oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0
176 stars 40 forks

RepeatMasker is not running properly #121

Closed: qingyun-mei closed this issue 1 year ago

qingyun-mei commented 2 years ago

Hello Shujun Ou,

When I run LTR_retriever, the following error occurs:

ERROR: RepeatMasker is not running properly! Please check the file Genome.fa.ltrTE.mask.lib and Genome.fa.ltrTE.trunc and test run:
RepeatMasker -e ncbi -q -pa 20 -no_is -norna -nolow -div 40 -lib Genome.fa.ltrTE.mask.lib -cutoff 225 Genome.fa.ltrTE.trunc

In addition, I checked the files and found that Genome.fa.ltrTE.trunc was empty. Could you give me some suggestions? Thanks.
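Editor's sketch: the error message names two input files, and the report says one of them is empty. Before test-running RepeatMasker, it can help to confirm that both files exist and are non-empty. The helper name `check_nonempty` is hypothetical and is not part of LTR_retriever.

```shell
# Report every given file that is missing or empty.
# Returns 0 only if all files exist and have a size greater than zero.
check_nonempty() {
    status=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            status=1
        elif [ ! -s "$f" ]; then
            echo "empty: $f"
            status=1
        fi
    done
    return "$status"
}

# Example with the file names from the error message:
# check_nonempty Genome.fa.ltrTE.mask.lib Genome.fa.ltrTE.trunc
```

If the function reports an empty file, the RepeatMasker test run is unlikely to be informative, and the step that writes that file is the better place to look.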

oushujun commented 2 years ago

Hello,

Can you run the following and report back:

grep trunc Genome.fa.mod.defalse | wc -l

Thanks, Shujun

qingyun-mei commented 2 years ago

Thanks for your help. The result is as follows:

grep trunc Genome.fa.defalse | wc -l
7322

oushujun commented 2 years ago

That looks good to me. Can you try to rerun it or list the size of files in the work folder?


qingyun-mei commented 2 years ago

Hello Shujun,

I have rerun it many times and the result is always the same. The sizes of the output files are as follows. Thanks very much!

Best regards!


oushujun commented 2 years ago

Hello @qingyun-mei,

I didn't see the file sizes. Can you share them again? Thanks.

Shujun

qingyun-mei commented 2 years ago

Sorry, there may have been a problem with the picture. The sizes are as follows:

2.1G Genome.fa
4.5M Genome.fa.finder.combine.scn
4.9M Genome.fa.retriever.scn
4.9M Genome.fa.retriever.scn.list
2.4M Genome.fa.retriever.scn.full
498M Genome.fa.ltrTE.fa
472M Genome.fa.ltrTE.stg1
149K Genome.fa.ltrTE.fa.cleanup
2.3M Genome.fa.retriever.scn.extend
477M Genome.fa.retriever.scn.extend.fa
965M Genome.fa.retriever.scn.extend.fa.aa
32M Genome.fa.retriever.scn.extend.fa.aa.tbl
84M Genome.fa.retriever.scn.extend.fa.aa.scn
4.8M Genome.fa.retriever.scn.extend.fa.aa.anno
17M Genome.fa.defalse
9.2M Genome.fa.retriever.scn.adj
2.8M Genome.fa.ltrTE.pass.list
2.9M Genome.fa.LTRID.list
190M Genome.fa.ltrTE.pass
190M Genome.fa.LTRlib.raw
308K Genome.fa.ltrTE.trunc.list
0 Genome.fa.ltrTE.stg2
3.6M Genome.fa.ltrTE.pass.clust.clstr
665K Genome.fa.retriever.scn.adj.list
31K Genome.fa.ltrTE.veryfalse.list
9.5K Genome.fa.ltrTE.veryfalse
0 Genome.fa.ltrTE.trunc
2.0M Genome.fa.ltrTE.veryfalse.fa
2.0M Genome.fa.ltrTE.mask.lib

Thanks!
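Editor's sketch: in the listing above, Genome.fa.ltrTE.trunc.list is 308K while Genome.fa.ltrTE.trunc is 0 bytes. A small shell check can make that kind of mismatch explicit by comparing the number of non-empty lines in a list file with the number of FASTA records in the matching sequence file. The function name `compare_list_to_fasta` is hypothetical, and the assumption that the list file holds one entry per line has not been confirmed in this thread.

```shell
# $1: list file (assumed: one entry per non-empty line)
# $2: FASTA file expected to hold the extracted sequences
# Both files must exist. Returns 1 when the list has entries
# but the FASTA file contains no records.
compare_list_to_fasta() {
    listed=$(grep -c . "$1" || true)        # non-empty lines in the list
    extracted=$(grep -c '^>' "$2" || true)  # FASTA header lines
    echo "listed=$listed extracted=$extracted"
    if [ "$listed" -gt 0 ] && [ "$extracted" -eq 0 ]; then
        echo "extraction produced no sequences"
        return 1
    fi
}

# Example with the file names from the listing:
# compare_list_to_fasta Genome.fa.ltrTE.trunc.list Genome.fa.ltrTE.trunc
```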


oushujun commented 2 years ago

Hello,

What version of LTR_retriever are you using? Can you try this:

cd-hit-est -i Genome.fa.ltrTE.pass -o Genome.fa.ltrTE.pass.clust -c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0 -T $threads

Thanks, Shujun

qingyun-mei commented 2 years ago

Hello Shujun,

Thanks for your reply. I ran versions 2.8.7 and 2.9.0. The result was the same whether LTR_retriever was installed via conda or via the standard installation. When I tried:

cd-hit-est -i Genome.fa.ltrTE.pass -o Genome.fa.ltrTE.pass.clust -c 0.8 -G 0.8 -s 0.9 -aL 0.9 -aS 0.9 -M 0 -T $threads

it seemed to run normally. The last lines of the output are copied below.

comparing sequences from 60292 to 60466 ...................---------- new table with 57 representatives
comparing sequences from 60466 to 60626 97.6%---------- new table with 2 representatives
comparing sequences from 60626 to 60773 ........---------- new table with 17 representatives
comparing sequences from 60773 to 60907 ...................---------- new table with 25 representatives
comparing sequences from 60907 to 61030 ...................---------- new table with 28 representatives
comparing sequences from 61030 to 61143 ...................---------- new table with 35 representatives
comparing sequences from 61143 to 61246 .................---------- new table with 29 representatives
comparing sequences from 61246 to 61341 ...................---------- new table with 18 representatives
comparing sequences from 61341 to 61428 ...............---------- new table with 16 representatives
comparing sequences from 61428 to 62390 .....................---------- new table with 130 representatives

62390  finished       8787  clusters

Approximated maximum memory consumption: 550M
writing new database
writing clustering information
program completed !

Total CPU time 12144.65

oushujun commented 2 years ago

You may want to test LTR_retriever with a small, simple FASTA file to make sure it's running properly.
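Editor's sketch: one way to get a small FASTA file for such a test is to keep only the first few records of an existing genome file. This is an assumption about how to build a test input, not a procedure given in this thread, and the function name `first_n_records` is hypothetical.

```shell
# Print the first N records of a FASTA file.
# $1: FASTA file; $2: number of records to keep
first_n_records() {
    awk -v n="$2" '
        /^>/ { seen++ }      # count header lines
        seen > n { exit }    # stop once record N+1 starts
        { print }
    ' "$1"
}

# Example: write the first two sequences to a new file
# first_n_records Genome.fa 2 > Genome.small.fa
```

A subset made this way keeps the original sequence names, so a problem tied to naming would still show up in the small test.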

wrengs commented 2 years ago

Dear Oushujun & Qingyun-mei,

Sorry to comment on this existing issue, but I am experiencing the same error.

I previously ran LTR_retriever successfully on an 833 Mb genome.

Next, I repeated the workflow on a different genome, 871.9 Mb in size (sequence names <12 characters).

grep trunc Moneyberg_Hifiasm+Canu.fasta.defalse | wc -l
1883

I got the following output & error:

Tue Apr 26 13:08:26 CEST 2022 Dependency checking: All passed!
Tue Apr 26 13:08:41 CEST 2022 LTR_retriever is starting from the Init step.
Tue Apr 26 13:08:44 CEST 2022 Start to convert inputs...
Total candidates: 19727
Total uniq candidates: 18673

Tue Apr 26 13:08:50 CEST 2022 Module 1: Start to clean up candidates...
Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
Sequences containing tandem repeats will be discarded.

Tue Apr 26 13:23:29 CEST 2022 17574 clean candidates remained

Tue Apr 26 13:23:29 CEST 2022 Modules 2-5: Start to analyze the structure of candidates...
The terminal motif, TSD, boundary, orientation, age, and superfamily will be identified in this step.

Tue Apr 26 14:14:45 CEST 2022 Intact LTR-RT found: 4722

Tue Apr 26 14:18:59 CEST 2022 Module 6: Start to analyze truncated LTR-RTs...
Truncated LTR-RTs without the intact version will be retained in the LTR-RT library.
Use -notrunc if you don't want to keep them.

Tue Apr 26 14:18:59 CEST 2022 1782 truncated LTR-RTs found
ERROR: RepeatMasker is not running properly! Please check the file Moneyberg_Hifiasm+Canu.fasta.ltrTE.mask.lib and Moneyberg_Hifiasm+Canu.fasta.ltrTE.trunc and test run:
RepeatMasker -e ncbi -q -pa 20 -no_is -norna -nolow -div 40 -lib Moneyberg_Hifiasm+Canu.fasta.ltrTE.mask.lib -cutoff 225 Moneyberg_Hifiasm+Canu.fasta.ltrTE.trunc
Please report errors to https://github.com/oushujun/LTR_retriever/issues
Program halt!

I test-ran RepeatMasker with the suggested settings (Search Engine: NCBI/RMBLAST [ 2.11.0+ ]) and it finished successfully in 170 batches (attached: 20220502_repeatmasker_nohup.txt).

Next, I tried re-running LTR_retriever, but the same error occurred.

Do you have any suggestions on where to start looking for a possible solution?

Many thanks in advance!

Kind regards, Willem

oushujun commented 2 years ago

Hi Willem,

Can you test your LTR_retriever version with this toy genome: https://github.com/oushujun/EDTA/blob/master/test/genome.fa

Shujun

wrengs commented 2 years ago

Hi Shujun,

Many thanks for the quick reply. I ran the LTR_retriever workflow (as suggested on https://github.com/oushujun/LTR_retriever#usage )

LTR_retriever ran through the 8 modules and finished without any errors. Downstream LAI (beta 3.2) also finished its analysis, although no LAI was produced because the LTR sequence content was too low.

Please find attached the nohup output file. LTR_retriever_nohup.txt

Does this match what is expected for the toy genome?

Kind regards, Willem

oushujun commented 2 years ago

Hi Willem,

Yes, if it finished without error on the toy genome, it should be working properly. You may want to check your genome for any potential issues.

Shujun
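Editor's sketch: the thread does not say which genome issues to look for. Since sequence-name length was mentioned earlier, one possible first pass is to scan the FASTA headers for long names and for characters outside a conservative set. Both checks are assumptions rather than confirmed causes of the error. The function name `check_fasta_headers` and the default limit of 12 are hypothetical.

```shell
# Flag FASTA sequence names that are long or contain unusual characters.
# $1: FASTA file; $2: maximum name length (hypothetical default: 12)
check_fasta_headers() {
    awk -v max="${2:-12}" '
        /^>/ {
            name = substr($1, 2)   # first word of the header, without ">"
            if (length(name) > max) printf "long\t%s\n", name
            if (name !~ /^[A-Za-z0-9_.-]+$/) printf "chars\t%s\n", name
        }
    ' "$1"
}

# Example:
# check_fasta_headers Moneyberg_Hifiasm+Canu.fasta
```

No output means every name passed both checks. It does not rule out other problems with the genome file.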
