oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

can't find LTR and NO SINE on Split Genome #510

Open rr9002 opened 1 week ago

rr9002 commented 1 week ago

First of all, thank you for providing such an excellent tool for TE annotation. I’m currently using EDTA v2.2.1 to annotate transposable elements for a large genome, GCA_014155895.2 (~16G). Due to its size, I’ve split the genome by chromosomes into 9 parts, each processed separately with EDTA. Below is the command I’m using for each split:

perl ../EDTA/EDTA.pl --genome new.part_002.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1

I encountered the following issues:

  1. Issue with LTR detection: When running EDTA on chromosome 2 (new.part_002.fasta, 2.0G), I received an error, and no LTR elements were detected. Could you please advise on why this may be happening for this specific chromosome?
The start time is: 2024-10-09 21:29:56 
My job ID is: 15283037 
The total cores is: 64 
The hosts is: 
i05r3n18:64

#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou (shujun.ou.1@gmail.com)             #####
#########################################################

Parameters: --genome new.part_002.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1

Wed Oct  9 21:30:14 CST 2024    Dependency checking:
                All passed!

Wed Oct  9 21:31:50 CST 2024    The longest sequence ID in the genome contains 87 characters, which is longer than the limit (13)
                Trying to reformat seq IDs...
                Attempt 1...
Wed Oct  9 21:32:24 CST 2024    Seq ID conversion successful!

Wed Oct  9 21:32:24 CST 2024    Obtain raw TE libraries using various structure-based programs: 

Wed Oct  9 21:32:24 CST 2024    EDTA_raw: Check dependencies, prepare working directories.

Wed Oct  9 21:32:49 CST 2024    Start to find LTR candidates.

Wed Oct  9 21:32:49 CST 2024    Identify LTR retrotransposon candidates from scratch.

Out of memory!
Out of memory!
cat: new.part_002.fasta.mod.harvest.combine.scn: No such file or directory
cat: new.part_002.fasta.mod.finder.combine.scn: No such file or directory
grep: new.part_002.fasta.mod.retriever.scn: No such file or directory
Argument "" isn't numeric in numeric gt (>) at /work/home/acbirxa1yd/miniconda3/envs/EDTA2/share/LTR_retriever/LTR_retriever line 380.

ERROR: No candidate is found in the file(s) you specified.

awk: fatal: cannot open file `new.part_002.fasta.mod.pass.list' for reading: No such file or directory
Warning: LOC list - is empty.

    perl rename_LTR_skim.pl target_sequence.fa LTR_retriever.defalse

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
    Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
    Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'new.part_002.fasta.mod.LTR.intact.fa.ori.dusted.cln.cln': No such file or directory
mv: cannot stat 'new.part_002.fasta.mod.LTR.intact.fa.ori.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'new.part_002.fasta.mod.LTR.intact.raw.fa.anno.list': No such file or directory
ERROR: No such file or directory at /work/home/acbirxa1yd/renhongbin/EDTA/util/output_by_list.pl line 39.

    perl filter_gff3.pl file.gff3 file.list > new.gff3

Wed Oct  9 21:35:32 CST 2024    Warning: The LTR result file has 0 bp!

Wed Oct  9 21:35:32 CST 2024    Start to find SINE candidates.

Thu Oct 10 03:26:20 CST 2024    Finish finding SINE candidates.

Thu Oct 10 03:26:20 CST 2024    Start to find LINE candidates.

Thu Oct 10 03:26:20 CST 2024    Existing result file new.part_002.fasta.mod-families.fa found!
                Will keep this file without rerunning this module.
                Please specify --overwrite 1 if you want to rerun this module.

Thu Oct 10 03:26:30 CST 2024    Finish finding LINE candidates.

Thu Oct 10 03:26:30 CST 2024    Start to find TIR candidates.

Thu Oct 10 03:26:30 CST 2024    Identify TIR candidates from scratch.

Species: others
Thu Oct 10 16:34:43 CST 2024    Finish finding TIR candidates.

Thu Oct 10 16:34:43 CST 2024    Start to find Helitron candidates.

Thu Oct 10 16:34:43 CST 2024    Existing result file new.part_002.fasta.mod.Helitron.intact.raw.fa found!
                Will keep this file without rerunning this module.
                Please specify --overwrite 1 if you want to rerun this module.

Thu Oct 10 16:34:43 CST 2024    Finish finding Helitron candidates.

Thu Oct 10 16:34:43 CST 2024    Execution of EDTA_raw.pl is finished!

Thu Oct 10 16:34:43 CST 2024    Obtain raw TE libraries finished.
                All intact TEs found by EDTA: 
                    new.part_002.fasta.mod.EDTA.intact.raw.fa 
                    new.part_002.fasta.mod.EDTA.intact.raw.gff3

Thu Oct 10 16:34:43 CST 2024    Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

Thu Oct 10 16:35:50 CST 2024    EDTA advance filtering finished.

Thu Oct 10 16:35:50 CST 2024    Perform EDTA final steps to generate a non-redundant comprehensive TE library.

                Skipping the RepeatModeler results (--sensitive 0).
                Run EDTA.pl --step final --sensitive 1 if you want to add RepeatModeler results.

                Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

Thu Oct 10 16:37:06 CST 2024    EDTA final stage finished! You may check out:
                The final EDTA TE library: new.part_002.fasta.mod.EDTA.TElib.fa
The end time is: 2024-10-10 16:37:06

Warning: No sequences were masked
  2. Issue with SINE detection: For other chromosome parts, while LTR elements were detected, no SINE elements were found during the annotation process. Is there something that could be affecting SINE detection across these chromosomes?
The start time is: 2024-09-25 16:01:12 
My job ID is: 14944128 
The total cores is: 32 
The hosts is: 
g06r4n15:32

#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou (shujun.ou.1@gmail.com)             #####
#########################################################

Parameters: --genome new.part_006.fasta --species others --step all --overwrite 0 --sensitive 0 --anno 0 --evaluate 0 --u 1.3e-8 --threads 32 --force 1

Wed Sep 25 16:01:14 CST 2024    Dependency checking:
                All passed!

Wed Sep 25 16:02:34 CST 2024    The longest sequence ID in the genome contains 61 characters, which is longer than the limit (13)
                Trying to reformat seq IDs...
                Attempt 1...
Wed Sep 25 16:03:01 CST 2024    Seq ID conversion successful!

Wed Sep 25 16:03:01 CST 2024    Obtain raw TE libraries using various structure-based programs: 

Wed Sep 25 16:03:01 CST 2024    EDTA_raw: Check dependencies, prepare working directories.

Wed Sep 25 16:03:22 CST 2024    Start to find LTR candidates.

Wed Sep 25 16:03:22 CST 2024    Identify LTR retrotransposon candidates from scratch.

Thu Sep 26 09:26:36 CST 2024    Finish finding LTR candidates.

Thu Sep 26 09:26:36 CST 2024    Start to find SINE candidates.

cp: cannot stat 'new.part_006.fasta.mod.SINE.raw.fa': No such file or directory
Error: SINE results not found!

cat: new.part_006.fasta.mod.TIR.intact.raw.bed: No such file or directory
cat: new.part_006.fasta.mod.Helitron.intact.raw.bed: No such file or directory
cp: cannot stat '../new.part_006.fasta.mod.EDTA.raw/new.part_006.fasta.mod.RM2.fa': No such file or directory

Thu Sep 26 09:26:37 CST 2024    Obtain raw TE libraries finished.
                All intact TEs found by EDTA: 
                    new.part_006.fasta.mod.EDTA.intact.raw.fa 
                    new.part_006.fasta.mod.EDTA.intact.raw.gff3

Thu Sep 26 09:26:37 CST 2024    Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

Thu Sep 26 09:34:02 CST 2024    EDTA advance filtering finished.

Thu Sep 26 09:34:02 CST 2024    Perform EDTA final steps to generate a non-redundant comprehensive TE library.

                Skipping the RepeatModeler results (--sensitive 0).
                Run EDTA.pl --step final --sensitive 1 if you want to add RepeatModeler results.

                Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

Thu Sep 26 10:28:10 CST 2024    EDTA final stage finished! You may check out:
                The final EDTA TE library: new.part_006.fasta.mod.EDTA.TElib.fa
The end time is: 2024-09-26 10:28:10

If you need further information or logs, I’d be happy to provide them. I appreciate your time and help with these issues.

Thank you again for your continued support and for developing such a valuable tool!

Best regards, rr

oushujun commented 5 days ago

Looks like you have an error:

Out of memory!

You may also want to rename sequence IDs before running EDTA.

Shujun
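
For the renaming suggestion above, a minimal sketch of one way to shorten FASTA sequence IDs before running EDTA. The 13-character limit is taken from the log message in this thread; the function name `rename_fasta_ids`, the `chr` prefix, and the demo headers are hypothetical and not part of EDTA. It does not address the "Out of memory!" error, which is a separate problem.

```python
import io


def rename_fasta_ids(src, dst, prefix="chr", max_len=13):
    """Copy FASTA records from src to dst, replacing each header with a short ID.

    Returns a list of (new_id, original_header) pairs so that annotation
    results can be mapped back to the original sequence names afterwards.
    """
    mapping = []
    for line in src:
        if line.startswith(">"):
            # Number the records in file order: chr1, chr2, ...
            new_id = f"{prefix}{len(mapping) + 1}"
            if len(new_id) > max_len:
                raise ValueError(f"{new_id!r} exceeds {max_len} characters")
            mapping.append((new_id, line[1:].rstrip("\n")))
            dst.write(f">{new_id}\n")
        else:
            # Sequence lines are copied unchanged.
            dst.write(line)
    return mapping


# In-memory demonstration with made-up headers (not from the real assembly).
demo_in = io.StringIO(
    ">example_long_header_1 some very long description of chromosome 2\n"
    "ACGT\nTTGA\n"
    ">example_long_header_2 another long description\n"
    "GGCC\n"
)
demo_out = io.StringIO()
id_map = rename_fasta_ids(demo_in, demo_out)

for new_id, original in id_map:
    print(new_id, original, sep="\t")
```

With real files, `src` and `dst` would be handles from `open("new.part_002.fasta")` and `open("renamed.fasta", "w")`. Saving the printed mapping as a tab-separated file makes it possible to restore the original names later.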
