rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

the assumed repeatmasker installation directory does not appear to be correct #187

Closed chelisa closed 1 year ago

chelisa commented 1 year ago

image

But the repeatmasker directory has a Library subdirectory?

image

rmhubley commented 1 year ago

You cannot provide a *.h5 file to RepeatMasker using the "-lib" option. Currently "-lib" only supports FASTA and HMM formatted files. Are you simply trying to run RepeatMasker with the default distributed set of libraries? I am not sure which organism "pelo_genomic" represents, but the typical command line (using the distributed libraries) would look something like this:

./RepeatMasker -species "my species" pelo_genomic.fa

If you have a custom library of repeats (in FASTA or HMM format), then you would use "-lib" instead of "-species":

./RepeatMasker -lib mylib.fa pelo_genomic.fa
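
Since "-lib" does not accept *.h5 files, a small pre-flight check can catch the mistake before RepeatMasker starts. The following is only a sketch: `check_lib` is a hypothetical helper, not part of RepeatMasker, and it inspects nothing but the file name, on the assumption that HDF5 libraries carry a `.h5` extension.

```shell
# Hypothetical helper (not part of RepeatMasker): reject library files
# that "-lib" cannot read. Only the file name is inspected; the ".h5"
# extension test is an assumption made for illustration.
check_lib() {
    case "$1" in
        *.h5)
            echo "error: $1 looks like an HDF5 file; -lib expects FASTA or HMM" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}

# Usage: run RepeatMasker only if the check passes.
# check_lib mylib.fa && ./RepeatMasker -lib mylib.fa pelo_genomic.fa
```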

Be careful with "-nolow": it doesn't simply exclude simple repeats from the output; it skips the searches for tandem/simple repeats entirely. Without those searches competing against the TE sequences, false positive matches to TE sequences increase greatly. For example, a poly-A tandem sequence might be incorrectly labeled as a LINE or SINE sequence in primates. For users who simply don't want to see simple/tandem annotations in the output, I recommend just filtering them out afterwards (e.g., cat results.fa.out | grep -v "Simple_repeat" > filtered_results.out).
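
The post-filtering step can also be written so that it matches only the class/family column rather than the whole line. This is a minimal sketch, assuming the standard RepeatMasker .out layout in which the repeat class/family is the 11th whitespace-separated field; `drop_simple_repeats` is a hypothetical function name, and the file names are the same placeholders used above.

```shell
# Hypothetical helper: print a RepeatMasker .out file without the rows
# whose class/family field (assumed to be column 11) is "Simple_repeat".
# Header and blank lines have a different or empty 11th field, so they
# pass through unchanged.
drop_simple_repeats() {
    awk '$11 != "Simple_repeat"' "$1"
}

# Usage:
# drop_simple_repeats results.fa.out > filtered_results.out
```

Matching on a single column avoids dropping a row just because the string happens to appear elsewhere on the line; for typical output the grep one-liner should give the same result.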

Let me know if you still have any questions.