oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Testing output #429

Open · G-Thomson opened this issue 4 months ago

G-Thomson commented 4 months ago

Hi

This looks like a very useful tool.

What is the expected output from the testing data?

I installed EDTA v2.2.0 via conda, because when I try to use the yml file I get CUDA-related errors while installing tensorflow:

conda create -n edta python=3.10.12 edta=2.2.0 numpy pandas matplotlib jupyter cudatoolkit=11.8.0

When I run the test data, I get an error from LTR.identifier.pl, and the SINE and LINE results have 0 bp. Is this expected?

Alternatively, would it be possible to make a Docker/Singularity container for EDTA v2.2.0? The most recent one is for EDTA v2.0.0.

Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib ../database/rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --threads 10

Thu Feb  8 10:29:29 EST 2024    Dependency checking:

                All passed!

    A custom library ../database/rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.

    A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

    A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.

Thu Feb  8 10:29:33 EST 2024    Obtain raw TE libraries using various structure-based programs: 
Thu Feb  8 10:29:33 EST 2024    EDTA_raw: Check dependencies, prepare working directories.

Thu Feb  8 10:29:34 EST 2024    Start to find LTR candidates.

Thu Feb  8 10:29:34 EST 2024    Identify LTR retrotransposon candidates from scratch.

Invalid value for shared scalar at /home/got3/.conda/envs/edta2_2attempt2/share/LTR_retriever/bin/LTR.identifier.pl line 114, <ANNO> line 11.
cp: cannot stat 'genome.fa.mod.retriever.scn.adj': No such file or directory
awk: fatal: cannot open file `genome.fa.mod.pass.list' for reading: No such file or directory
Warning: LOC list - is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
    Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
    Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019

mv: cannot stat 'genome.fa.mod.LTR.intact.fa.ori.dusted.cln.cln': No such file or directory
mv: cannot stat 'genome.fa.mod.LTR.intact.fa.ori.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.LTR.intact.raw.fa.anno.list': No such file or directory
ERROR: No such file or directory at /vast/palmer/scratch/jacob/got3/EDTA/util/output_by_list.pl line 39.

    perl filter_gff3.pl file.gff3 file.list > new.gff3

Thu Feb  8 10:29:49 EST 2024    Warning: The LTR result file has 0 bp!

Thu Feb  8 10:29:49 EST 2024    Start to find SINE candidates.

Thu Feb  8 10:31:15 EST 2024    Warning: The SINE result file has 0 bp!

Thu Feb  8 10:31:15 EST 2024    Start to find LINE candidates.

Thu Feb  8 10:31:15 EST 2024    Identify LINE retrotransposon candidates from scratch.

Thu Feb  8 10:33:16 EST 2024    Warning: The LINE result file has 0 bp!

Thu Feb  8 10:33:16 EST 2024    Start to find TIR candidates.

Thu Feb  8 10:33:16 EST 2024    Identify TIR candidates from scratch.

Species: others
Thu Feb  8 10:35:19 EST 2024    Finish finding TIR candidates.

Thu Feb  8 10:35:19 EST 2024    Start to find Helitron candidates.

Thu Feb  8 10:35:19 EST 2024    Identify Helitron candidates from scratch.

Thu Feb  8 10:35:52 EST 2024    Finish finding Helitron candidates.

Thu Feb  8 10:35:52 EST 2024    Execution of EDTA_raw.pl is finished!

ERROR: Raw LTR results not found in genome.fa.mod.EDTA.raw/genome.fa.mod.LTR.raw.fa and genome.fa.mod.EDTA.raw/genome.fa.mod.LTR.intact.raw.fa
    If you believe the program is working properly, this may be caused by the lack of intact LTRs in your genome. Consider to use the --force 1 parameter to overwrite this check
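For anyone trying to tell whether a test run really failed, the two symptoms in the log above (the "result file has 0 bp" warnings and the missing raw LTR files) can be checked with a short script. This is a minimal sketch in Python, not part of EDTA: it assumes the log was saved to a file, that the warnings keep the exact wording shown above, and that the run used genome.fa, so the raw LTR files sit at the two paths named in the error message. The script name check_edta_run.py and the function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Matches the warning EDTA prints when a raw TE library is empty, e.g.
# "Warning: The LTR result file has 0 bp!" (wording taken from the log above).
ZERO_BP = re.compile(r"Warning: The (\w+) result file has 0 bp!")


def empty_te_classes(log_text: str) -> list[str]:
    """Return the TE classes (LTR, SINE, LINE, ...) reported as 0 bp in an EDTA log."""
    return ZERO_BP.findall(log_text)


def missing_raw_ltr(run_dir: str, genome: str = "genome.fa") -> list[str]:
    """Return the raw LTR files that are missing or empty.

    Hypothetical helper: the file names mirror the paths printed in the EDTA
    error message; run_dir is assumed to be the directory EDTA was run from.
    """
    raw_dir = Path(run_dir) / f"{genome}.mod.EDTA.raw"
    expected = [
        raw_dir / f"{genome}.mod.LTR.raw.fa",
        raw_dir / f"{genome}.mod.LTR.intact.raw.fa",
    ]
    # A file counts as "missing" if it does not exist or has size 0.
    return [str(p) for p in expected if not p.is_file() or p.stat().st_size == 0]


if __name__ == "__main__" and len(sys.argv) > 1:
    log = Path(sys.argv[1]).read_text()
    run_dir = sys.argv[2] if len(sys.argv) > 2 else "."
    print("TE classes with 0 bp:", empty_te_classes(log))
    print("Missing or empty raw LTR files:", missing_raw_ltr(run_dir))
```

Usage (hypothetical file names): python check_edta_run.py edta_test.log . An empty list from missing_raw_ltr means both raw LTR files exist and are non-empty, so the "Raw LTR results not found" check would not be triggered.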

oushujun commented 4 months ago

Hello,

Sorry for the delayed response. Please update your repo and try again with the updated yml file; it should now install without the CUDA dependency.

Best, Shujun

rpetroll commented 4 months ago

Hi Shujun,

I am also getting the same error as described above. I tried the updated yml file, but I still get the same error and 0 bp files for LINEs and SINEs. I also tried the container 2.2.0--hdfd78af_1 and got the same error there.

I would be very thankful for any advice. Thank you! Romy

oushujun commented 3 months ago

You can use the following mamba command to install EDTA:

mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow-cpu==2.11 tesorter samtools bedtools grep

The bioconda and Docker/Singularity versions are not yet updated. As EDTA2 is actively being developed, it's a good idea to pull from GitHub every so often to get the latest version.

Thanks! Shujun

Mosy135 commented 1 month ago

I encountered the same error, and the mamba-installed version failed in testing. Cloning the git repository and running it with this environment works fine:

mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython cd-hit coreutils genericrepeatfinder genometools-genometools glob2 h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow-cpu==2.11 tesorter samtools bedtools grep