nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

CMD Error in funannotate train at PASA step with specific samples. #999

Open LucasvdGouw opened 4 months ago

LucasvdGouw commented 4 months ago

Are you using the latest release? Yes, version 1.8.16 as installed per https://funannotate.readthedocs.io/en/latest/install.html via conda.

Describe the bug Funannotate train generates a CMD error at the PASA step. Thing is, this only happened to 2 out of 50+ samples. The others go through this process without issue.

What command did you issue? We are using funannotate in a Snakemake pipeline. These are the commands Snakemake ran.

funannotate train -i annotation/output/Nspnov4/39038786/01.funannotate/fun_39038786.cleaned.sorted.masked.fa -o annotation/output/Nspnov4/39038786/01.funannotate/trained_39038786_and_SpNov4_out --left annotation/output/Nspnov4/39038786/02.fastp_RNA/RNA_SpNov4_trim_R1.fq.gz --right annotation/output/Nspnov4/39038786/02.fastp_RNA/RNA_SpNov4_trim_R2.fq.gz --stranded RF --jaccard_clip --cpus 16

funannotate train -i annotation/output/Nspnov4/39038794/01.funannotate/fun_39038794.cleaned.sorted.masked.fa -o annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out --left annotation/output/Nspnov4/39038794/02.fastp_RNA/RNA_SpNov4_trim_R1.fq.gz --right annotation/output/Nspnov4/39038794/02.fastp_RNA/RNA_SpNov4_trim_R2.fq.gz --stranded RF --jaccard_clip --cpus 16

Logfiles

Both files generate a similar error:

Activating conda environment: .snakemake/conda/6b59cb2bdbc485a8475ebcfb0cdac154_

[Jan 16 09:06 AM]: OS: Ubuntu 22.04, 16 cores, ~ 132 GB RAM. Python: 3.8.15
[Jan 16 09:06 AM]: Running 1.8.15
[Jan 16 09:06 AM]: Adapter and Quality trimming PE reads with Trimmomatic
[Jan 16 09:09 AM]: Running read normalization with Trinity
[Jan 16 09:23 AM]: Building Hisat2 genome index
[Jan 16 09:23 AM]: Aligning reads to genome using Hisat2
[Jan 16 09:24 AM]: Running genome-guided Trinity, logfile: annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/Trinity-gg.log
[Jan 16 09:24 AM]: Clustering of reads from BAM and preparing assembly commands
[Jan 16 09:29 AM]: Assembling 12,853 Trinity clusters using 15 CPUs
Progress: 12853 complete, 0 failed, 0 remaining
[Jan 16 10:37 AM]: 21,459 transcripts derived from Trinity
[Jan 16 10:37 AM]: Running StringTie on Hisat2 coordsorted BAM
[Jan 16 10:37 AM]: Removing poly-A sequences from trinity transcripts using seqclean
[Jan 16 10:37 AM]: Converting transcript alignments to GFF3 format
[Jan 16 10:37 AM]: Converting Trinity transcript alignments to GFF3 format
[Jan 16 10:37 AM]: Running PASA alignment step using 21,459 transcripts
[Jan 16 02:46 PM]: CMD ERROR: /data/2022.molbio.009-main/.snakemake/conda/6b59cb2bdbc485a8475ebcfb0cdac154
/opt/pasa-2.5.2/Launch_PASA_pipeline.pl -c /data/2022.molbio.009-main/annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/pasa/alignAssembly.txt -r -C -R -g /data/2022.molbio.009-main/annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /data/2022.molbio.009-main/annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/trinity.alignments.gff3 -T -t /data/2022.molbio.009-main/annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/trinity.fasta.clean -u /data/2022.molbio.009-main/annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 16 --ALIGNERS blat --transcribed_is_aligned_orient --trans_gtf /data/2022.molbio.009-main/annotation/output/Nspnov4/39038794/01.funannotate/trained_39038794_and_SpNov4_out/training/funannotate_train.stringtie.gtf

OS/Install Information /data/2022.molbio.009-main/.snakemake/conda/6b59cb2bdbc485a8475ebcfb0cdac154/bin/python: not found (which python returns /data/2022.molbio.009-main/.snakemake/conda/6b59cb2bdbc485a8475ebcfb0cdac154/bin/python)

bpeacock44 commented 3 months ago

I am also having this issue - 2 out of my 11 genomes get "CMD ERROR" at that step. Not sure how to troubleshoot. The other 9 ran perfectly.

bpeacock44 commented 3 months ago

Just as an update, I was able to find more detailed errors in the file training/pasa/pasa-assembly.log for each failed directory. This helped me to troubleshoot and all ran successfully!
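
For anyone else hitting the same CMD ERROR across many samples, here is a minimal sketch for pulling the last lines of every training/pasa/pasa-assembly.log under an output root. The function name `tail_pasa_logs` and the `annotation/output` root are assumptions for illustration (the root mirrors the paths in the commands above); only the log's relative location comes from this thread. It prints logs for successful runs too, so you still need to pick out the failed directories yourself.

```python
from collections import deque
from pathlib import Path


def tail_pasa_logs(root, n_lines=20):
    """Return {log_path: last n_lines lines} for each PASA assembly log under root.

    Hypothetical helper: it only assumes each funannotate train output folder
    contains training/pasa/pasa-assembly.log, as reported in this thread.
    """
    tails = {}
    pattern = "**/training/pasa/pasa-assembly.log"
    for log in sorted(Path(root).glob(pattern)):
        # deque with maxlen keeps only the final n_lines lines of the file.
        with log.open(errors="replace") as handle:
            tails[str(log)] = list(deque(handle, maxlen=n_lines))
    return tails


if __name__ == "__main__":
    # Assumed output root; adjust to your own directory layout.
    for path, lines in tail_pasa_logs("annotation/output").items():
        print(f"==> {path} <==")
        print("".join(lines), end="")
```

The underlying error is usually near the end of the log, so a short tail per sample is often enough to see which PASA sub-step failed before opening the full file.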