rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

get L1 from the genome #125

Open XuexueLiu opened 2 years ago

XuexueLiu commented 2 years ago

Hi, I am using RepeatMasker to find L1 families in the reference genome ("RepeatMasker GCF.fa"). I got a lot of L1 hits in the .out file, ranging in size from 50 bp to 5 kb. I want to classify the L1 subfamilies with RepeatModeler, so I extracted the L1 FASTA sequences from the genome and ran it on them. I ended up with 100+ subfamilies, whose consensus sequences also range from 50 bp to 5 kb. My question is: how can I get the consensus L1 families, along with their FASTA sequences, from the whole genome?

Thank you in advance.

Best, Xue

jebrosen commented 2 years ago

I am not confident that I understand this analysis or your next goals. Can you include some more details so we can best answer your question?

The .out file should already include subfamily designations, so what is the purpose of the RepeatModeler run on the L1 instances? In the end, are you trying to obtain the consensus sequences of the L1 elements that are already known in RepeatMasker's libraries, or the L1 elements in a new genome?
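To illustrate the point about subfamily designations, here is a minimal sketch that tallies the repeat names of all LINE/L1 hits in a RepeatMasker .out file. It assumes the standard whitespace-separated .out layout (repeat name in column 10, class/family in column 11). The function name `l1_subfamily_counts` and the example rows are made up for illustration and are not part of RepeatMasker.

```python
from collections import Counter


def l1_subfamily_counts(out_lines):
    """Count hits per repeat name for rows whose class/family is LINE/L1."""
    counts = Counter()
    for line in out_lines:
        fields = line.split()
        # Data rows have at least 15 columns and start with an integer score;
        # this skips the header lines and blank lines.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        repeat_name, repeat_class = fields[9], fields[10]
        if repeat_class == "LINE/L1":
            counts[repeat_name] += 1
    return counts


# Hypothetical rows in .out layout (values invented for the example).
example_rows = """\
   SW   perc perc perc  query  position in query    matching repeat        position in repeat
score   div. del. ins.  sequence begin end (left)   repeat   class/family  begin end (left) ID

  463   1.3  0.6  1.7  chr1   100   450 (99550) +  L1M5    LINE/L1    2000 2350 (3800)  1
  239  29.4  1.9  1.0  chr1   600  1200 (98800) C  L1MB7   LINE/L1   (500) 5600 5000    2
  310  12.0  0.0  0.0  chr1  2000  2049 (97951) C  L1M5    LINE/L1   (100) 6000 5951    3
  250  10.5  0.0  0.0  chr1  3000  3300 (96700) +  AluSx   SINE/Alu      1  300 (12)    4
"""

if __name__ == "__main__":
    for name, n in l1_subfamily_counts(example_rows.splitlines()).most_common():
        print(name, n)
```

On a real file, the lines could be read with `open("GCF.fa.out")`, assuming RepeatMasker wrote its output next to the input under that name.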

XuexueLiu commented 2 years ago

Thank you for your reply. What I actually want is the subfamilies of L1M5. I first extracted all the L1M5 FASTA sequences from the reference genome (from NCBI), then used RepeatModeler to classify them into subfamilies and to build a consensus sequence for each subfamily. The problem I have now is that the length of each subfamily varies.

Thank you for your suggestions.

Best, Xue
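The extraction step described here could look roughly like the sketch below, which converts the hits for one repeat name into BED intervals and optionally drops short fragments. It assumes the standard .out column layout. The function name `hits_to_bed`, the `min_len` cutoff, and the sample rows are hypothetical, and the length filter is only one possible way to handle the varying hit lengths, not something recommended in this thread.

```python
def hits_to_bed(out_lines, repeat_name="L1M5", min_len=0):
    """Return BED6 lines for hits of `repeat_name` at least `min_len` bp long."""
    bed = []
    for line in out_lines:
        f = line.split()
        # Skip header and blank lines: data rows start with an integer score.
        if len(f) < 15 or not f[0].isdigit():
            continue
        if f[9] != repeat_name:
            continue
        # .out coordinates are 1-based inclusive; BED is 0-based half-open.
        start, end = int(f[5]) - 1, int(f[6])
        if end - start < min_len:
            continue
        # RepeatMasker marks reverse-strand hits with "C".
        strand = "-" if f[8] == "C" else "+"
        bed.append("\t".join([f[4], str(start), str(end), f[9], "0", strand]))
    return bed


# Hypothetical rows in .out layout (values invented for the example).
sample_rows = [
    "  463  1.3  0.6  1.7  chr1   100   450 (99550) +  L1M5   LINE/L1  2000 2350 (3800)  1",
    "  310 12.0  0.0  0.0  chr1  2000  2049 (97951) C  L1M5   LINE/L1  (100) 6000 5951   3",
    "  239 29.4  1.9  1.0  chr1   600  1200 (98800) C  L1MB7  LINE/L1  (500) 5600 5000   2",
]

if __name__ == "__main__":
    for row in hits_to_bed(sample_rows, "L1M5", min_len=100):
        print(row)
```

The resulting BED lines could then be passed to a tool such as `bedtools getfasta` (with `-s` to respect strand) to pull the sequences out of the genome FASTA.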