LeeYEAH2 closed this issue 7 months ago.
```
>Chr01:100489068..100485858#LTR/unknown
>Chr01:100991415..100985473#LTR/Retrovirus
>Chr01:102009458..102011936#LTR/unknown
TGTGGGGGCGGAGCCCCAGAGAGTAGTTTCCAGGCTCTCGGCCTCACACAGACAGGTGCTGGCTCAGGTAGTAAATGGCCAACTGTGATTGCATGGCCATCAGCTGTGGCTAGTTGGCCGTCAGCTGTAACCAGTGAGCCATTGGCCACAATATAATTGCTGTGGCTAAGGAGAGAGAGAAAGAAGGATGGGGCTAGCAAGGAGATGGCGGCTGGGCTGGCAAGCGTGGATGGCGGTTTGCAGACAGTGTGTATCCAGCCTCCAGTGAGAGTATAGTGCCGCCAGAGAGAATAAAGTGGTATGACTCCCCTACCTATGGCTCCGTGGGTGTTCCTTTTTGGCCTCACCATATCCTGCGTTCTTGTGTGGGGAGCGGGACCGGAGACCCTGCAGGCCACCCCGCATGACACATGGCGCAGCGAGCAGGGTCCCCAACATGACACATGGCGTAGTCGGCAGGATATGGTGCCGGCCAAAGCTCTCCGAAGGGCGGTGGAGCAGTTTGTGTGTATGAACACTCAGTCTGAGGAAGACCAGGAGGAGCAGCTGCCGGAGAGCTGGACCCTCGTGGAGGGGTGGGAGGACGTGGACGGTTCTCCCACCAGCACAGGAAAAGCCATGCAGCTGCTGGAGAGATGGAGCCCCGTGGAGAGGTGGAAAGATGTGGACGGTTCCCCAACCAGCACAGGCCGGAGAAAGCGAAGGTCGTTGCAGCCCTGTGGGCCGGGGAAGTTCCTGCTCAGGCAGCCCGGATGCAGGACTTGTAGTCCCAGGAGGTAACGCTTGCTGAGTCCTCCGTGGGAGATGAGGGTGAGGTCAAGGTCGTCCCTCACCCCCAGGACAGCCCTGGTGAATGACTATGGACTATGGGGAATTGCCTTCCATCCCTAATTTAATGGACTGCTTGACTGTTTGTTTGGGAACTGTTGTTAGTGGAACTGGGGGATATTTGCTTTTGTCTCTTGACTGGCCGCCATTGAGAATATGTAAGCACCTTGATTGTGAGTCGCTGTTGTTCCAGCAGGGTACCCTGAGAGGCACAGAGAGAGTGGCAGTGCGCTGAGAGGTCTAGCTGTGCCCTGAGAAGCCCTGGCTGTGTCCAGGAAGTGTGGCTGTGCCCACAGAGAGTGGTGGTGCCCTGAGAAGCCCTGGCTGTGTCCAGGAAGTGTGGCTGTTCCCAGAGAAAGTGGCAGTGCCCTGAGAAATCCTAGCTGTGCCCGGGGAAACTAGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCAGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCAGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCTGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCAGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCTGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCAGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCAGTGCCCAGGAGTACTGGTGGTGCCCTGAGAAACCCTGGCTGTGTCCGGGAAGACAGGTGGTACCCTAAGAAGTCCAGGAAGTCTGGCCGTGCCCAGAAGCACTGGTGGTGCCCTGAGAAACCCTGGCTGTGCCCAGAGAGCCTGGCTGTGTCCAGGATATTGCATTCACCCC

CAGGCTCCTCGCACAGGTCCCCTCGCGGAAGACGCTTGTCGCGTAGATTGTGAGGCGAGAGCCTGTAGGGGTGGAGTGTGGGGGCGGAGCCCCAGAGAGTAGTTTCCAGGCTCTCGGCCTCACATAGACAGGTGCTGGCTCAGGTAGTAAATGGCCAACTGTGATTGCATGGCCATCAGCTGTGGCTAGTTGGCCGTCAGCTGTAACCAGTGAGCCATTGGCCACAATATAATTGCTGTGGCTAAGGAGAGAGAGAAAGAAGGATGGGGCTAGCAAGGAGATGGCGGCTGGGCTGGCAAGCGTGGATGGCGGTTTGCAGACAGTGTGTATCCAGCCTCCAGTGAGAGTATAGTGCCGCCAGAGAGAATAAAGTGGTATGACTCCCCTACCTATGGCTCCGTGGGTGTTCCTTTTTGGCCTCACCATATCCTGCGTTCTTGTGTGGGGAGCGGGACCGGAGACCCTGCAGGCCACCCCGCATGACACATGGCGCAGCGAGCAGGGTCCCCAACATGACA
```
Does get_full_seqs automatically filter out low-quality sequences? Or would it be appropriate for me to generate FASTA sequences from the two pass.list files myself and then classify them with TEsorter?
get_full_seqs does not filter out low-quality sequences, but it does discard sequences that share the same location (see the example below). You can certainly generate FASTA sequences from the two pass.list files yourself.
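If you go that route, here is a minimal Python sketch of building the FASTA yourself. It is not how get_full_seqs is implemented; it assumes that the first tab-separated column of each pass.list line is a location such as `Chr01:102009458..102011936`, that coordinates are 1-based and inclusive, and that a start larger than the end (as in `Chr01:100489068..100485858`) marks a minus-strand element. `parse_location`, `unique_locations`, and `extract_fasta` are hypothetical helpers. Like the behavior described above, it keeps only the first record for a repeated location. Headers carry only the location, so classification (for example with TEsorter) is left to a later step.

```python
import re

# Assumed location format: "<chrom>:<start>..<end>", 1-based inclusive.
LOCATION = re.compile(r"^(?P<chrom>[^:]+):(?P<a>\d+)\.\.(?P<b>\d+)$")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_location(loc):
    """Return (chrom, start, end, strand) with start <= end.

    Assumption: a location written high..low is a minus-strand element.
    """
    m = LOCATION.match(loc)
    if m is None:
        raise ValueError(f"unrecognized location: {loc!r}")
    a, b = int(m["a"]), int(m["b"])
    strand = "+" if a <= b else "-"
    return m["chrom"], min(a, b), max(a, b), strand


def unique_locations(pass_list_lines):
    """Yield first-column locations, skipping '#' header lines and repeats."""
    seen = set()
    for line in pass_list_lines:
        if line.startswith("#") or not line.strip():
            continue
        loc = line.split("\t", 1)[0].strip()
        if loc not in seen:  # same location: keep only the first record
            seen.add(loc)
            yield loc


def extract_fasta(genome, pass_list_lines):
    """Return FASTA text; `genome` maps chromosome name -> sequence string."""
    records = []
    for loc in unique_locations(pass_list_lines):
        chrom, start, end, strand = parse_location(loc)
        seq = genome[chrom][start - 1:end]  # 1-based inclusive -> Python slice
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        records.append(f">{loc}\n{seq}\n")
    return "".join(records)
```

For example, with the toy genome `{"chr1": "ACGTACGTAAGG"}` and pass.list lines for `chr1:1..4` (listed twice) and `chr1:12..9`, `extract_fasta` returns two records: `ACGT`, and `CCTT` (the reverse complement of `AAGG`). Holding a whole genome in a dict is only reasonable for a sketch; for real data an indexed FASTA reader would be the usual choice.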
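Here are the counts from one example. The two pass.list files contain 16,927 lines in total, but after removing lines with "#" in the first column and collapsing duplicate locations, 16,296 unique locations remain, which is exactly the number of sequences in intact_ltr.fa: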
```
$ wc -l *pass.list
622 genome.fasta.mod.nmtf.pass.list
16305 genome.fasta.mod.pass.list
16927 total
$ grep -c ">" intact_ltr.fa
16296
$ cat *pass.list | cut -f1 | grep -v "#" | sort |uniq | wc -l
16296
```
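The same comparison can be sketched in Python, which also makes it easy to list which pass.list locations have no matching FASTA header. This is a hedged sketch, not part of LTR_retriever: it assumes headers look like `>Chr01:102009458..102011936#LTR/unknown` (the location comes before the first `#`), and `pass_list_locations`, `fasta_header_locations`, and `compare` are hypothetical names. It roughly mirrors `cut -f1 | grep -v "#" | sort | uniq | wc -l` and `grep -c ">"`, except that blank lines are ignored.

```python
def pass_list_locations(lines):
    """Unique first-column values, dropping any that contain '#' (like grep -v)."""
    locs = set()
    for line in lines:
        first = line.rstrip("\n").split("\t", 1)[0]
        if first and "#" not in first:
            locs.add(first)
    return locs


def fasta_header_locations(lines):
    """Count header lines and collect the location part of each header.

    Assumption: the location is everything between '>' and the first '#'.
    """
    count, locs = 0, set()
    for line in lines:
        if line.startswith(">"):
            count += 1
            locs.add(line[1:].strip().split("#", 1)[0])
    return count, locs


def compare(pass_lines, fasta_lines):
    """Return (unique pass.list locations, FASTA headers, missing locations)."""
    expected = pass_list_locations(pass_lines)
    n_headers, found = fasta_header_locations(fasta_lines)
    missing = sorted(expected - found)
    return len(expected), n_headers, missing
```

For example, with pass.list lines for `chr1:1..4` (twice), `chr1:12..9`, and `chr2:5..50`, and a FASTA containing headers only for the first two locations, `compare` returns `(3, 2, ['chr2:5..50'])`. On real data, pass the concatenated lines of both pass.list files and the lines of intact_ltr.fa.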
Yep, the location of the empty sequence does overlap with an existing sequence; I guess that's the problem. Thanks a lot!
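An overlap like this can be checked directly by comparing intervals. The sketch below is a hypothetical helper, not an LTR_retriever function; it assumes locations in the `chrom:start..end` form (optionally followed by a `#class` suffix, without the leading `>`), 1-based inclusive coordinates, and that a high..low location covers the same bases as low..high. Sorting each chromosome's intervals by start lets the inner loop stop as soon as a later interval begins past the current one's end.

```python
import re
from collections import defaultdict

# Assumed format: "<chrom>:<a>..<b>" with an optional "#..." suffix.
LOCATION = re.compile(r"^([^:]+):(\d+)\.\.(\d+)")


def to_interval(loc):
    """Return (chrom, start, end) with start <= end."""
    m = LOCATION.match(loc)
    if m is None:
        raise ValueError(f"unrecognized location: {loc!r}")
    a, b = int(m.group(2)), int(m.group(3))
    return m.group(1), min(a, b), max(a, b)


def overlapping_pairs(locations):
    """Return sorted (loc1, loc2) pairs that share at least one base."""
    by_chrom = defaultdict(list)
    for loc in set(locations):
        chrom, start, end = to_interval(loc)
        by_chrom[chrom].append((start, end, loc))
    pairs = []
    for intervals in by_chrom.values():
        intervals.sort()
        for i, (start_i, end_i, loc_i) in enumerate(intervals):
            for start_j, end_j, loc_j in intervals[i + 1:]:
                if start_j > end_i:  # sorted by start: no later interval overlaps
                    break
                pairs.append((loc_i, loc_j))
    return sorted(pairs)
```

For `chr1:100..50`, `chr1:90..200`, `chr1:300..400`, and `chr2:95..120`, only the first two are reported, since they share bases 90 to 100.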
Hello again! I was updating my genome's ERV annotation because of a new version of the genome. But this time I found that get_full_seqs didn't generate all the sequences that LTR_retriever listed in the pass.list, and I'm pretty sure get_full_seqs detected all of them, as shown below: