nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate update fails with a previously PASA trained dataset (but worked before) #260

Closed estolle closed 5 years ago

estolle commented 5 years ago

Hi there

I managed to use funannotate train and predict, and now wanted to run funannotate update, but it fails with this error message:

Traceback (most recent call last):
  File "/opt/funnotate/funannotate-1.5.1/bin/funannotate-update.py", line 1866, in <module>
    if lib.which('stringtie') and lib.checkannotations(shortBAM):
NameError: name 'shortBAM' is not defined

Do you know where this might come from? A different run (where I did not use funannotate train, but the data are the same) worked using the very same funannotate update command. In the logfile I can't find anything useful that could point me in the right direction to fix this error. Any ideas?

my command:

( funannotate update -i funannotate.train1 --cpus 100 \
  --left /scratch/ek/reads/euglossa/euglossa.R1.fastq.gz \
  --right /scratch/ek/reads/euglossa/euglossa.R2.fastq.gz \
  --memory 200G --stranded no --pasa_db mysql \
  --species "Euglossa viridissima2" --max_intronlen 5000 --pasa_alignment_overlap 30.0 --coverage 30 \
  --trinity /scratch/ek/euglossa/euglossa.annotation.phil.brand/dil_vir_merged_transcriptome.fa )

[12:19 AM]: OS: linux2, 112 cores, ~ 528 GB RAM. Python: 2.7.12
[12:19 AM]: Running funannotate v1.5.0
[12:19 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[12:19 AM]: Found relevent files in funannotate.train1/training, will re-use them:
    Forward reads: funannotate.train1/training/left.fq.gz
    Reverse reads: funannotate.train1/training/right.fq.gz
    Trinity results: funannotate.train1/training/funannotate_train.trinity-GG.fasta
    PASA config file: funannotate.train1/training/pasa/alignAssembly.txt
[12:20 AM]: Reannotating Euglossa viridissima2, NCBI accession: None
[12:20 AM]: Previous annotation consists of: 35,607 protein coding gene models and 128 non-coding gene models
[12:20 AM]: Trimmomatic will be skipped
[12:20 AM]: Read normalization will be skipped
Traceback (most recent call last):
  File "/opt/funnotate/funannotate-1.5.1/bin/funannotate-update.py", line 1866, in <module>
    if lib.which('stringtie') and lib.checkannotations(shortBAM):
NameError: name 'shortBAM' is not defined

nextgenusfs commented 5 years ago

Looks like the BAM alignment file wasn't set up to be picked up from the training dataset; not sure how this slipped through. Can you try the latest commit and see if that fixes it? https://github.com/nextgenusfs/funannotate/commit/b854ac200003aef8bce74979c76355a6540fd9ed

estolle commented 5 years ago

Awesome! The issue is now fixed. Thanks so much!

It's still running, but the terminal output now looks like this (it proceeds past the point where it got stuck before):

[11:27 AM]: OS: linux2, 112 cores, ~ 528 GB RAM. Python: 2.7.12
[11:27 AM]: Running funannotate v1.5.0
[11:27 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[11:27 AM]: Found relevent files in funannotate.train1/training, will re-use them:
    Forward reads: funannotate.train1/training/left.fq.gz
    Reverse reads: funannotate.train1/training/right.fq.gz
    Trinity results: funannotate.train1/training/funannotate_train.trinity-GG.fasta
    PASA config file: funannotate.train1/training/pasa/alignAssembly.txt
    BAM alignments: funannotate.train1/training/funannotate_train.coordSorted.bam
[11:28 AM]: Reannotating Euglossa viridissima2, NCBI accession: None
[11:28 AM]: Previous annotation consists of: 35,607 protein coding gene models and 128 non-coding gene models
[11:28 AM]: Trimmomatic will be skipped
[11:28 AM]: Read normalization will be skipped
[11:28 AM]: StringTie installed, running StringTie on Hisat2 coordsorted BAM
[11:29 AM]: Converting transcript alignments to GFF3 format
[11:29 AM]: Converting Trinity transcript alignments to GFF3 format
[11:30 AM]: Existing PASA database contains 85,141 gene models, validated FASTA headers match
[11:30 AM]: Running PASA annotation comparison step 1
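
For readers hitting a similar NameError: it usually means a variable is assigned on only one code path (here, apparently only when alignments are generated fresh) and then read on another (when training files are re-used). The sketch below shows one way to avoid that pattern. It is not the actual funannotate code or the contents of the linked commit; the function names find_training_files and should_run_stringtie are hypothetical, the file names are taken from the log output above, and it is written for Python 3 (shutil.which is not available in the Python 2.7 shown in the logs).

```python
import os
import shutil


def find_training_files(training_dir):
    """Collect reusable files from a previous training run.

    Hypothetical helper: the file names mirror the log output in this
    thread, not funannotate's real lookup logic.
    """
    candidates = {
        "left": "left.fq.gz",
        "right": "right.fq.gz",
        "bam": "funannotate_train.coordSorted.bam",
    }
    found = {}
    for key, name in candidates.items():
        path = os.path.join(training_dir, name)
        # Store None for anything missing, so every key is always defined
        # and later code never reads an unassigned name.
        found[key] = path if os.path.isfile(path) else None
    return found


def should_run_stringtie(short_bam):
    """Return True only if stringtie is on PATH and a non-empty BAM exists."""
    if shutil.which("stringtie") is None:
        return False
    return (
        bool(short_bam)
        and os.path.isfile(short_bam)
        and os.path.getsize(short_bam) > 0
    )
```

With this shape, a missing BAM file results in the StringTie step being skipped rather than a crash, because the "bam" entry is always present and simply holds None when nothing was found.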