oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Debug lines found in EDTA.pl #2

Closed: hsiaopei closed this issue 5 years ago

hsiaopei commented 5 years ago

We tried to run the pipeline on our genome assembly FASTA file, xxx.fa, and got the error message "xxx.fa.masked does not contain any sequences!" What's going on? It looks like "if (0){" at line 48 of EDTA.pl should be changed to "if (1){".
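One way to confirm whether a file such as xxx.fa.masked really is empty is to count its records and bases directly. The following is a minimal Python sketch, not part of EDTA; the function name `count_fasta_records` is hypothetical, and it assumes a plain-text (uncompressed) FASTA file.

```python
def count_fasta_records(path):
    """Return (number of records, total sequence length) for a FASTA file."""
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Each header line starts a new record.
                records += 1
            else:
                # Everything else is sequence; blank lines add zero.
                bases += len(line)
    return records, bases
```

If this returns `(0, 0)` for the masked file but not for the input genome, the masking step produced no output, which would be consistent with the step having been skipped rather than with a problem in the input.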

oushujun commented 5 years ago

Hi @hsiaopei ,

Thanks for testing EDTA! We are currently having some issues with TIR-Learner, so the first module, EDTA_raw.pl, as well as EDTA.pl, are in test mode. I am working with @weijiaweijia, the author of TIR-Learner and a co-author of the EDTA manuscript, to resolve this. It should be fixed within the week. Sorry for the inconvenience.

Best, Shujun

oushujun commented 5 years ago

Dear @hsiaopei, @gabyrech, and @philippbayer,

Thank you for using EDTA. I just finished a major upgrade of the program, including a fix for the TIR-Learner issue. Please reinstall the EDTA package and help test it with your genomes. For now, please ignore the FASTA-Reader errors (e.g. FASTA-Reader: Ignoring invalid residues at position(s): On line 252: 520-746) and the cp and rm errors; they do not affect the results.

Please let me know if you encounter any other errors. Thank you!

Best, Shujun

philippbayer commented 5 years ago

Thanks for the heads-up, Shujun. I just pulled it and started a run; it's currently running and will probably take a few days (plant genome) :)

philippbayer commented 5 years ago

I've now had it running for about 24 hours:

perl EDTA/EDTA.pl -genome ragoo.fasta -threads 15

Currently it's hanging at this step:

perl <snip>/EDTA/bin/LTR_retriever/bin/LTR.identifier.pl ragoo.fasta -list ragoo.fasta.retriever.scn -seq ragoo.fasta.retriever.scn.extend.fa -anno ragoo.fasta.retriever.scn.extend.fa.aa.anno -flanksim 60 -flankmiss 25 -flankaln 0.6 -minlen 100 -u 1.3e-8 -threads 15 -blastplus /ws/00089503/anaconda/envs/EDTA/bin/ -motif TCCA TGCT TACA TACT TGGA TATA TGTA TGCA > ragoo.fasta.defalse

It is not consuming memory or CPU, and it hasn't written any output in about 12 hours. The last file written was ragoo.fasta.defalse (my input is ragoo.fasta). Have you observed this before?

Last output in ragoo.fasta.defalse:

Chr0_RaGOO:14387060..14401218   false   motif:TGAA      TSD:TCAG        14387056..14387059      14401219..14401222      IN:14387558..14400719   0.9779  ?       unknown NA      862315
        Adjust: NO      lLTR: 498       rLTR: 499
        Alignment regions: 1, 498, 13661, 14158
        LTR coordinates: 14387060, 14387557, 14400720, 14401218
        TSD-LTR overlap: 0
        Boundary missing: 0

oushujun commented 5 years ago

Hi Philipp,

This doesn't sound right to me. If the defalse step had finished, you would be seeing a pass.list file; otherwise, LTR_retriever is stuck in the defalse step. This can happen when the specified CPUs are not exclusive to the job (other jobs are taking a lot of CPU resources), so that the multithreading module cannot allocate new threads to tasks. If this is the case, the only option is to restart LTR_retriever or EDTA.
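A stall like this can be detected by checking how long ago the output file was last modified. The sketch below is a minimal Python illustration, not part of EDTA or LTR_retriever; the function names and the idle threshold are assumptions chosen for the example.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds, now=None):
    """Return True if `path` has not been written to within the allowed idle time."""
    return seconds_since_modified(path, now) > max_idle_seconds
```

For example, `is_stalled("ragoo.fasta.defalse", 6 * 3600)` would report True once the file has gone six hours without a write. A long idle period alone does not prove a hang, so it is worth also confirming that the process is using no CPU before restarting.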

Best, Shujun

oushujun commented 5 years ago

Dear All,

Sorry for the delayed response. I just pushed a bulk update to EDTA and have tested it on different servers; it seems to work now. I have not tested it on macOS, though, so small differences there could cause problems.

For testing purposes, please use a small file, e.g. 20 Mb, for faster turnaround. Please let me know if there are any issues.
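A small test file can be cut from a larger assembly by copying sequence until a base limit is reached. This is a minimal Python sketch, not part of EDTA; the function name `subset_fasta` is hypothetical, the 20,000,000-base default is an assumed reading of "20 Mb", and the input is assumed to be an uncompressed FASTA file.

```python
def subset_fasta(in_path, out_path, max_bases=20_000_000):
    """Copy the start of a FASTA file until `max_bases` bases are written.

    The last record is truncated if needed. Returns the number of bases written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if written >= max_bases:
                break
            if line.startswith(">"):
                # Keep the header of every record that is at least partly copied.
                dst.write(line)
                continue
            seq = line.strip()
            if not seq:
                continue
            # Trim the final sequence line so the limit is not exceeded.
            seq = seq[: max_bases - written]
            dst.write(seq + "\n")
            written += len(seq)
    return written
```

Truncating mid-record means any element spanning the cut point is incomplete in the subset, which should not matter for a quick check that the pipeline runs end to end.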

Best, Shujun

oushujun commented 5 years ago

I consider this issue fixed. Please reopen it if the problem persists, or open a new issue if you find new problems.

Best, Shujun