oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Empty sum file, gff2bed errors #382

Closed by SwiftSeal 10 months ago

SwiftSeal commented 1 year ago

Hi Shujun,

I am having issues running the latest GitHub clone of the pipeline with the provided conda environment. The annotation files are populated, but the .sum file is empty, as some other users have reported. The log is massive, but here is the top and bottom of it:

Wed  6 Sep 11:11:10 BST 2023    EDTA_raw: Check dependencies, prepare working directories.

Wed  6 Sep 11:11:15 BST 2023    Start to find LTR candidates.

Wed  6 Sep 11:11:15 BST 2023    Identify LTR retrotransposon candidates from scratch.

Wed  6 Sep 13:17:39 BST 2023    Finish finding LTR candidates.

Wed  6 Sep 13:17:39 BST 2023    Start to find TIR candidates.

Wed  6 Sep 13:17:39 BST 2023    Identify TIR candidates from scratch.

Species: others
Wed  6 Sep 18:32:31 BST 2023    Finish finding TIR candidates.

Wed  6 Sep 18:32:31 BST 2023    Start to find Helitron candidates.

Wed  6 Sep 18:32:31 BST 2023    Identify Helitron candidates from scratch.

Thu  7 Sep 00:51:48 BST 2023    Finish finding Helitron candidates.

Thu  7 Sep 00:51:48 BST 2023    Execution of EDTA_raw.pl is finished!

Thu  7 Sep 02:23:01 BST 2023    Homology-based annotation of TEs using solanum_verrucosum.fa.mod.EDTA.TElib.fa from scratch.

Use of uninitialized value $TE_class in pattern match (m//) at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 106, <GFF> line 5.
Use of uninitialized value $TE_class in concatenation (.) or string at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 110, <GFF> line 5.
Use of uninitialized value $method in concatenation (.) or string at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 110, <GFF> line 5.
Use of uninitialized value $TE_class in concatenation (.) or string at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 110, <GFF> line 7.
Use of uninitialized value $method in concatenation (.) or string at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 110, <GFF> line 7.
Use of uninitialized value $method in pattern match (m//) at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 107, <GFF> line 8.
Use of uninitialized value $TE_class in concatenation (.) or string at /mnt/shared/scratch/msmith/solanum_verrucosum/EDTA/util/gff2bed.pl line 110, <GFF> line 8.
...lines removed...
Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

Warning: 0.9901 not found in the TE_SO database, will use the general term 'repeat_region       SO:0000657' to replace it.

Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

Warning: 0.9834 not found in the TE_SO database, will use the general term 'repeat_region       SO:0000657' to replace it.

Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

Warning: 0.9843 not found in the TE_SO database, will use the general term 'repeat_region       SO:0000657' to replace it.

Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

Warning: 0.9887 not found in the TE_SO database, will use the general term 'repeat_region       SO:0000657' to replace it.

Warning: NA not found in the TE_SO database, will use the general term 'repeat_region   SO:0000657' to replace it.

I tried fixing this with the patch you mentioned (https://github.com/oushujun/EDTA/issues/372) but no luck - it still had the same issues.

I also tried switching to the latest Singularity release, but that failed as well:

Wed Sep  6 09:15:58 BST 2023    EDTA_raw: Check dependencies, prepare working directories.

Wed Sep  6 09:16:03 BST 2023    Start to find LTR candidates.

Wed Sep  6 09:16:03 BST 2023    Identify LTR retrotransposon candidates from scratch.

Wed Sep  6 10:53:13 BST 2023    Finish finding LTR candidates.

Wed Sep  6 10:53:13 BST 2023    Start to find TIR candidates.

Wed Sep  6 10:53:13 BST 2023    Identify TIR candidates from scratch.

Species: others
Illegal option --
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /opt/conda/share/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list solanum_verrucosum.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequenceCan't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Wed Sep  6 10:53:28 BST 2023    Start to find Helitron candidates.

Wed Sep  6 10:53:28 BST 2023    Identify Helitron candidates from scratch.

Any idea what could be causing this? I also tried the conda distribution, but it produced the same uninitialized-value errors as the GitHub version.

I am running EDTA as part of a Snakemake pipeline with the following shell block:

        mkdir -p results/edta
        cp {input.genome} results/edta/
        cd results/edta
        ../../EDTA/EDTA.pl --genome genome.fa --threads {threads} --anno 1 --overwrite 1

EDTA is included as a git submodule.

Thanks in advance! Moray

SwiftSeal commented 1 year ago

Apologies, to clarify: those error messages were produced with the patches mentioned in https://github.com/oushujun/EDTA/issues/372 already applied. It looks like one of the problematic GFF lines is line 5:

chr01   EDTA    repeat_region   345398  354039  .       ?       .       ID=repeat_region1
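
One way to see how many records look like the one above is to list every GFF3 row whose attribute column lacks the fields the converter appears to expect. This is only a sketch under stated assumptions: it assumes a tab-delimited GFF3 file, and it assumes `Classification=` and `Method=` are the relevant attribute keys (inferred from the `$TE_class` and `$method` warnings, not checked against gff2bed.pl). `find_incomplete_gff_records` is a hypothetical helper name, not part of EDTA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report GFF3 records whose 9th column lacks a
# Classification= or Method= key (assumed key names, see the note above).
find_incomplete_gff_records() {
    local gff="$1"
    awk -F'\t' '
        /^#/ || NF < 9 { next }    # skip comment lines and rows without 9 tab-separated columns
        $9 !~ /(^|;)Classification=/ || $9 !~ /(^|;)Method=/ {
            printf "%d\t%s\n", NR, $0    # file line number, then the full record
        }
    ' "$gff"
}
```

Calling `find_incomplete_gff_records your_annotation.gff3` (a placeholder filename) prints the line number and the full record for each hit, which makes it easier to tell whether the bare `ID=`-only records are isolated or widespread.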

oushujun commented 1 year ago

Hi Moray,

Something is wrong with your execution. You need to check solanum_verrucosum.fa.mod.EDTA.TElib.fa and see if there are any abnormalities. I don't understand how the run could jump from the raw step directly to the annotation step. Maybe it is just a copy-paste mistake. The Singularity and Docker versions are not up to date. The conda installation + the GitHub version is your best bet. Let me know if you find out more.

Thanks, Shujun

SwiftSeal commented 10 months ago

Hi Shujun,

I've revisited this. I'm not sure exactly why, but it seems the issues were caused by parallel EDTA runs executing in the same directory. Everything works correctly now that they run in separate directories! This was with the conda env + GitHub repo. Closing now, thanks for your suggestions :)
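
For anyone hitting the same problem, the separate-directory setup can be sketched roughly as follows. This is a minimal illustration rather than the exact change made in the pipeline: `prepare_edta_workdir` is a hypothetical helper, the directory layout is an assumption, and the path to EDTA.pl in the usage comment is a placeholder. The EDTA options are the ones from the original shell block.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EDTA): create one working directory per
# genome so that concurrent runs never share intermediate files.
prepare_edta_workdir() {
    local genome="$1" outroot="$2"
    local name workdir
    name="$(basename "$genome")"
    name="${name%.*}"              # strip the final extension, e.g. .fa
    workdir="$outroot/$name"
    mkdir -p "$workdir"
    cp "$genome" "$workdir/"
    printf '%s\n' "$workdir"       # print the directory so the caller can cd into it
}

# Example usage (commented out because it needs EDTA and a real genome):
# workdir="$(prepare_edta_workdir genomes/solanum_verrucosum.fa results/edta)"
# (cd "$workdir" && /path/to/EDTA/EDTA.pl --genome solanum_verrucosum.fa --threads 8 --anno 1 --overwrite 1)
```

The directory name is derived from the genome file name, so two inputs with the same base name would still end up in the same directory; in that case a different naming scheme is needed.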

oushujun commented 9 months ago

Did you run multiple EDTA instances on the same genome or on different genomes? The latter should be fine.