oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

The ProcessRepeats step takes a long time #253

Closed zhangzhiyangcs closed 2 years ago

zhangzhiyangcs commented 2 years ago

Hi, I am running the EDTA pipeline via Singularity, and it has been stuck at the ProcessRepeats step for several days. The command is "perl /opt/conda/share/RepeatMasker/ProcessRepeats -lib genome_chr.fasta.mod.EDTA.TElib.fa -orifile genome_chr.fasta.mod -maskSource genome_chr.fasta.mod genome_chr.fasta.mod.cat.gz". My genome is about 900 Mb, but the genome_chr.fasta.mod.cat.gz file is about 3.7 Gb, while the genome_chr.fasta.mod.EDTA.TElib.fa and genome_chr.fasta.mod files are 7.0 Mb and 940 Mb, respectively. Is something wrong? Do you have any advice for speeding it up? Thanks.

oushujun commented 2 years ago

Hi, it should not take this long, but it does happen. Please just wait; there is nothing I can do about it. You may want to start a new RepeatMasker test with these files to figure it out.

Best, Shujun
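
For anyone who wants to repeat this step on its own and see how long it really takes, here is a minimal Python sketch. It only rebuilds the ProcessRepeats command quoted in the original post and wraps it in a timer. The helper names `build_process_repeats_cmd` and `run_timed` are made up for illustration and are not part of EDTA or RepeatMasker, and the script path is simply the one from the Singularity container mentioned above, so adjust it for your own install.

```python
import shlex
import subprocess
import time

# Path copied from the command quoted in this thread (an assumption for
# any other install; change it to wherever ProcessRepeats lives for you).
PROCESS_REPEATS = "/opt/conda/share/RepeatMasker/ProcessRepeats"


def build_process_repeats_cmd(genome, te_lib, cat_file, script=PROCESS_REPEATS):
    """Return the argv list for the ProcessRepeats call quoted in the thread.

    Hypothetical helper: it only assembles the same arguments the original
    poster reported, in the same order.
    """
    return [
        "perl", script,
        "-lib", te_lib,
        "-orifile", genome,
        "-maskSource", genome,
        cat_file,
    ]


def run_timed(cmd):
    """Run cmd to completion and return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return result.returncode, time.perf_counter() - start


if __name__ == "__main__":
    cmd = build_process_repeats_cmd(
        "genome_chr.fasta.mod",
        "genome_chr.fasta.mod.EDTA.TElib.fa",
        "genome_chr.fasta.mod.cat.gz",
    )
    # Print the command first so it can be checked before launching.
    print(shlex.join(cmd))
    # Uncomment to actually run it; the thread reports this can take days.
    # code, seconds = run_timed(cmd)
    # print(f"exit={code} elapsed={seconds / 3600:.1f} h")
```

Running the step outside the pipeline this way does not make it faster; it only makes it easier to tell whether the process is still progressing and how long it takes to finish.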



zhangzhiyangcs commented 2 years ago

Thanks for your advice. I have started a new test with these files. Once I have the results, I'll let you know.

zhangzhiyangcs commented 2 years ago

Happy Spring Festival! I got my results after 5 days. The total interspersed repeat content is about 70%, and the CACTA repeat elements are about 24%. Thanks a lot.