Closed zmz1988 closed 2 years ago
Try to rename your sequences. Or reduce parameters one by one until you find which parameter is giving you issues.
Thanks! I tried using a shorter file name, but the error in the annotation step still occurred, even though the log showed that the final EDTA step had finished.
Since I already have the final TE library, I tried to rerun only the annotation step. If I use perl /home/EDTA/EDTA.pl --genome necat_ragtag.fasta --cds Araport11_genes.202106.cds.fasta --curatedlib TAIR10_TE.fasta --threads 10 --anno 1 --step anno, will EDTA automatically pick up the .mod.EDTA.TElib.fa file for annotation?
If you are working on an Arabidopsis genome, I suggest starting from scratch, either by removing all EDTA-generated files or by working in a new folder. For a genome this small, trying to recover from previous errors and then chasing whatever problems follow will only delay your analysis further.
Please use shorter sequence names, not file names.
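To make the distinction concrete: the sequence names are the header lines starting with ">" inside the FASTA file, not the name of the file itself. Below is a minimal Python sketch of one way to shorten them. The function name, the "Chr" prefix, and the example headers are all hypothetical and are not part of EDTA; the mapping is kept so the original names can be restored later.

```python
def shorten_fasta_names(lines, prefix="seq"):
    """Replace every FASTA header with a short sequential name.

    Returns (renamed_lines, mapping), where mapping maps each new short
    name to the original header text (without the leading '>').
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}{count}"
            mapping[new_name] = line[1:]
            renamed.append(f">{new_name}")
        else:
            # Sequence lines are passed through unchanged.
            renamed.append(line)
    return renamed, mapping


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    example = [
        ">contig_0001_RagTag some long description",
        "ACGTACGT",
        ">contig_0002_RagTag",
        "TTGGCCAA",
    ]
    renamed, mapping = shorten_fasta_names(example, prefix="Chr")
    print("\n".join(renamed))
```

In practice the lines would come from the genome file (for example, from open("necat_ragtag.fasta")), and the renamed lines would be written to a new file that is then passed to --genome.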
Thanks! I tried (1) shortening the sequence names to only three letters, (2) running EDTA on a smaller FASTA file containing only one chromosome, and (3) reducing the parameters to only --anno and --threads: perl /home/EDTA/EDTA.pl --genome necat_ragtag.fasta --cds Araport11_genes.202106.cds.fasta --curatedlib TAIR10_TE.fasta --threads 10 --anno 1. But the problem remains: EDTA still cannot find 'necat_ragtag.fasta.mod.EDTA.intact.fa.rename', and I get the same error message.
What could cause the .fasta.mod.EDTA.intact.fa.rename file to fail to build?
Can you share a small sequence sample that reproduces the issue?
Thanks! Shujun
Sure. I have already sent it to you by email. Could you please take a look? Thanks in advance!
Thanks for sharing your files. It appears that the curated library you provided is not a TE library but rather the full set of TEs annotated in the genome. This is not recommended and is not what this option was designed for: only non-redundant exemplar sequences should be provided via --curatedlib. Nevertheless, this is not the direct cause of your issue.
The real cause is the naming of these sequences, which do not follow the RepeatMasker naming convention. You can check the libraries included in EDTA/database and mimic their naming format. For example, the sequence >AT1TE52125|-|15827287|15838845|ATHILA2|LTR/Gypsy|11559 bp can be formatted as >AT1TE52125#LTR/Gypsy. If you have no classification information for a sequence (in which case you may prefer not to include it in the curated library at all), you can use something as ambiguous as Unknown_00001#unknown/unknown.
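A minimal Python sketch of that header conversion, for illustration only. It assumes every header uses the same "|"-separated layout as the AT1TE52125 example above, with the classification in the sixth field; the function names and the class_field parameter are hypothetical, and headers without a usable classification fall back to unknown/unknown.

```python
def to_repeatmasker_header(header, class_field=5):
    """Convert a '|'-separated header into the name#class/superfamily form.

    Assumes the classification sits at index `class_field` (0-based), as in
    >AT1TE52125|-|15827287|15838845|ATHILA2|LTR/Gypsy|11559 bp
    """
    fields = header.lstrip(">").strip().split("|")
    name = fields[0].strip()
    if len(fields) > class_field and fields[class_field].strip():
        classification = fields[class_field].strip()
    else:
        # No classification available; such sequences may be better left out.
        classification = "unknown/unknown"
    return f">{name}#{classification}"


def convert_library(lines):
    """Yield FASTA lines with every header rewritten; sequences are unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield to_repeatmasker_header(line)
        else:
            yield line


if __name__ == "__main__":
    header = ">AT1TE52125|-|15827287|15838845|ATHILA2|LTR/Gypsy|11559 bp"
    print(to_repeatmasker_header(header))
```

Renaming the headers does not make the file a non-redundant library, so redundant entries would still need to be reduced to exemplar sequences before passing the result to --curatedlib.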
Shujun
Thanks a lot, @oushujun! I should've checked the TE file more carefully!
I got the TE file from this question [https://github.com/oushujun/EDTA/issues/198] and thought I could probably use it directly. I will try to change the TE naming in this file as you suggested!
Many thanks again!!!
You may want to search for a true Ath TE database, since this file is not a TE library. I think there is one somewhere in TAIR or Araport, or at least the Repbase version is close enough.
Shujun
Dear developers,
Thanks for developing this awesome tool! I tried to run it recently and encountered this error message: 'ERROR: RepeatMasker results not found in necat_ragtag.fasta.mod.out!'
My command is: 'perl /home/EDTA/EDTA.pl --genome necat_ragtag.fasta --cds Araport11_genes.202106.cds.fasta --curatedlib TAIR10_TE.fasta --overwrite 1 --sensitive 1 --evaluate 1 --threads 10 --anno 1'
Please see below in part of the log file:
2021-09-10 04:21:53,060 -INFO- generating gene anntations
2021-09-10 04:21:56,808 -INFO- 648 sequences classified by HMM
2021-09-10 04:21:56,809 -INFO- see protein domain sequences in Araport11_genes.202106.cds.fasta.code.rexdb.dom.faa and annotation gff3 file in Araport11_genes.202106.cds.fasta.code.rexdb.dom.gff3
2021-09-10 04:21:56,809 -INFO- classifying the unclassified sequences by searching against the classified ones
2021-09-10 04:21:58,380 -INFO- using the 80-80-80 rule
2021-09-10 04:21:58,380 -INFO- run CMD:makeblastdb -in ./tmp/pass1_classified.fa -dbtype nucl
2021-09-10 04:21:58,529 -INFO- run CMD:blastn -query ./tmp/pass1_unclassified.fa -db ./tmp/pass1_classified.fa -out ./tmp/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 10
2021-09-10 04:22:09,237 -INFO- 111 sequences classified in pass 2
2021-09-10 04:22:09,237 -INFO- total 759 sequences classified.
2021-09-10 04:22:09,237 -INFO- see classified sequences in Araport11_genes.202106.cds.fasta.code.rexdb.cls.tsv
2021-09-10 04:22:09,237 -INFO- writing library for RepeatMasker in Araport11_genes.202106.cds.fasta.code.rexdb.cls.lib
2021-09-10 04:22:10,929 -INFO- writing classified protein domains in Araport11_genes.202106.cds.fasta.code.rexdb.cls.pep
2021-09-10 04:22:11,425 -INFO- Summary of classifications:
Order     Superfamily    # of Sequences  # of Clade Sequences  # of Clades  # of full Domains
LTR       Bel-Pao        7               0                     0            0
LTR       Copia          97              75                    13           0
LTR       Gypsy          167             125                   16           0
LTR       Retrovirus     1               0                     0            0
LTR       mixture        6               0                     0            0
Penelope  unknown        4               0                     0            0
LINE      unknown        119             0                     0            0
TIR       EnSpm_CACTA    2               0                     0            0
TIR       MuDR_Mutator   64              0                     0            0
TIR       PIF_Harbinger  17              0                     0            0
TIR       hAT            43              0                     0            0
Helitron  unknown        3               0                     0            0
Maverick  unknown        224             0                     0            0
mixture   mixture        5               0                     0            0
2021-09-10 04:22:11,427 -INFO- Pipeline done.
2021-09-10 04:22:11,427 -INFO- cleaning the temporary directory ./tmp
mv: cannot stat 'necat_ragtag.fasta.mod.EDTA.intact.fa.rename': No such file or directory
Fri 10 Sep 04:31:23 BST 2021 Homology-based annotation of TEs using necat_ragtag.fasta.mod.EDTA.TElib.fa from scratch.
ERROR: RepeatMasker results not found in necat_ragtag.fasta.mod.out!
I ran the test file in the EDTA folder, and everything went well and produced all the annotation outputs. But the run with my own data seems to be stuck at the annotation step, even though I have RepeatMasker and RepeatModeler installed. I'm not sure what went wrong. Could you please help?
Thanks!