Status: closed by ghost 5 years ago
The discrepancy is due to RepeatMasker's default assumption that this is a human genome. Please look through the other RepeatMasker-related issues in this repo; some of them contain solutions.
On Sat, Sep 21, 2019, 6:33 AM aderzelle notifications@github.com wrote:
Hi, here are the counts from the TE library genome.FLYE.sixLongest.fa.EDTA.TElib.fa:

    DNA/DTA         52
    DNA/DTC         50
    DNA/DTH        476
    DNA/DTM        654
    DNA/DTT       2722
    DNA/Helitron    15
    LTR/Gypsy       38
    LTR/unknown     20
    MITE/DTA        75
    MITE/DTC        10
    MITE/DTH        88
    MITE/DTM       104
    MITE/DTT       570

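For anyone wanting to reproduce per-class counts like these, here is a minimal sketch that tallies the classification part of each FASTA header. It assumes the library headers look like `>TE_00000001#DNA/DTA`, with the class after the `#`; that layout, the function name `count_te_classes`, and the example records are illustrative assumptions, not taken from the post.

```python
from collections import Counter

def count_te_classes(fasta_lines):
    """Tally library sequences by the classification after '#' in each header."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Assumed header layout: ">name#Class/Superfamily optional description"
        fields = line[1:].split()
        header = fields[0] if fields else ""
        _, sep, classification = header.partition("#")
        counts[classification if sep else "unclassified"] += 1
    return counts

# Hypothetical records, for illustration only
example = [
    ">TE_00000001#DNA/DTA",
    "ACGTACGT",
    ">TE_00000002#DNA/DTA",
    "TTGACA",
    ">TE_00000003#LTR/Gypsy",
    "GGCATT",
]

for name, n in sorted(count_te_classes(example).items()):
    print(name, n)
```

On a real library, the lines of the opened file would be passed in place of `example`. Note that this counts library sequences, not copies in the genome.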
Then I ran RepeatMasker:

    RepeatMasker genome.FLYE.sixLongest.fa -no_is -pa 8 -lib genome.FLYE.sixLongest.fa.EDTA.TElib.fa

Here is the summary:

    ==================================================
                     number of      length   percentage
                     elements*    occupied  of sequence
    Retroelements         1333       187637 bp    0.16 %
       SINEs:               20         1160 bp    0.00 %
       Penelope             63         3689 bp    0.00 %
       LINEs:              487        62803 bp    0.05 %
        CRE/SLACS            0            0 bp    0.00 %
         L2/CR1/Rex         12          561 bp    0.00 %
         R1/LOA/Jockey      23         2819 bp    0.00 %
         R2/R4/NeSL          0            0 bp    0.00 %
         RTE/Bov-B          50        23094 bp    0.02 %
         L1/CIN4           177        20812 bp    0.02 %
       LTR elements:       826       123674 bp    0.11 %
         BEL/Pao           105         7431 bp    0.01 %
         Ty1/Copia           2          131 bp    0.00 %
         Gypsy/DIRS1       256        55114 bp    0.05 %
           Retroviral      179        10844 bp    0.01 %

    DNA transposons       2314       176348 bp    0.15 %
       hobo-Activator      689        43072 bp    0.04 %
       Tc1-IS630-Pogo      167        54954 bp    0.05 %
       En-Spm                0            0 bp    0.00 %
       MuDR-IS905            0            0 bp    0.00 %
       PiggyBac             18         2279 bp    0.00 %
       Tourist/Harbinger   249        12509 bp    0.01 %
       Other (Mirage,       24         1231 bp    0.00 %
        P-element, Transib)

    Rolling-circles         77         8371 bp    0.01 %

    Unclassified:           51         3907 bp    0.00 %

    Total interspersed repeats:      367892 bp    0.32 %

    Small RNA:             431       137483 bp    0.12 %

    Satellites:            130         7935 bp    0.01 %
    Simple repeats:      48930      1869437 bp    1.61 %
    Low complexity:       9266       432567 bp    0.37 %

The numbers for the DNA transposons do not seem to match. For example, I have more DNA elements reported in the non-redundant EDTA output than in the RepeatMasker summary, but I would expect the opposite, since RepeatMasker should count every occurrence of each element. Or am I missing something?
Solutions can be found in #8.
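One way to cross-check the summary table without relying on its built-in categories is to tally the RepeatMasker `.out` annotation by the class/family column, which carries the labels from the custom library. The sketch below is only an illustration and not necessarily the approach described in #8. It assumes the standard whitespace-separated `.out` layout (query begin and end in columns 6 and 7, class/family in column 11); the function name `tally_repeatmasker_out` and the sample rows are made up for the example.

```python
from collections import defaultdict

def tally_repeatmasker_out(lines):
    """Return {class/family: [hits, bp]} from RepeatMasker .out lines."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score;
        # header and blank lines do not, so they are skipped.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        entry = totals[fields[10]]      # class/family label from the library
        entry[0] += 1                   # one more annotated hit
        entry[1] += end - begin + 1     # length of the hit on the query
    return dict(totals)

# Hypothetical .out content, for illustration only
sample = [
    "   SW   perc perc perc  query     position in query    matching  repeat       position in repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin end (left) ID",
    "",
    " 1320   5.2  0.0  0.0  contig_1   101   400  (9600) +  TE_00000001  DNA/DTA     1  300   (0)  1",
    "  890  10.1  1.2  0.4  contig_1  1001  1250  (8750) C  TE_00000002  DNA/DTA   (20)  250    1  2",
    "  450  12.0  0.0  0.0  contig_2    51   150  (4850) +  TE_00000003  LTR/Gypsy   1  100   (0)  3",
]

for name, (hits, bp) in sorted(tally_repeatmasker_out(sample).items()):
    print(name, hits, bp)
```

This counts every annotated hit separately, so fragments of one element are counted more than once and overlapping hits inflate the base-pair totals; treat the numbers as a rough comparison against the library counts rather than an exact census.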