nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate annotate ValueError: invalid literal for int() with base 10: 'ver' #943

Closed: lichen-fungus closed this issue 1 year ago

lichen-fungus commented 1 year ago

Are you using the latest release? yes

funannotate -version 
funannotate v1.8.15

Describe the bug: When running funannotate annotate with external Phobius, SignalP, eggNOG-mapper, antiSMASH and InterProScan results, the program crashes after reading in the Phobius/SignalP6 results. No logfile funannotate-annotate.log is written.

Dropping --no-progress, --isolate "A1", or the 2>&1 | tee -a logfiles/won_spe_logfile redirection makes no difference.

I have previously annotated numerous other genomes with the same command, differing only in the input file. As the installation info below shows, gmap is not installed. Despite numerous attempts, I have not been able to install it into the current Anaconda environment; nevertheless, funannotate ran fine until today and produced meaningful output.

What command did you issue?

funannotate annotate -i ./won_spe_pred/ --cpus 10 \
    -d /data/scratch/memyself/software/funannotate/analysis/funannotate_db/ \
    --no-progress \
    --busco_db /data/scratch/memyself/software/funannotate/analysis/funannotate_db/ascomycota \
    --phobius /data/scratch/memyself/software/funannotate/analysis/won_spe_pred/annotate_misc/Wonderful_species_phobius.results.txt \
    --antismash /data/scratch/memyself/software/funannotate/analysis/won_spe_pred/annotate_misc/Wonderful_species_antismash.gbk \
    --iprscan /data/scratch/memyself/software/funannotate/analysis/won_spe_pred/annotate_misc/Wonderful_species_iprscan.xml \
    -s "Wonderful species" \
    --signalp /data/scratch/memyself/software/funannotate/analysis/won_spe_pred/annotate_misc/Wonderful_species_signalp6_prediction_results.txt \
    --eggnog /data/scratch/memyself/software/funannotate/analysis/won_spe_pred/annotate_misc/Wonderful_species_eggnog.emapper.annotations \
    --isolate "A1" 2>&1 | tee -a logfiles/won_spe_logfile

Logfiles

[Aug 01 02:54 PM]: OS: Ubuntu 18.04, 80 cores, ~ 791 GB RAM. Python: 3.8.13
[Aug 01 02:54 PM]: Running 1.8.15
[Aug 01 02:54 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Aug 01 02:54 PM]: Found existing output directory ./won_spe_pred. Warning, will re-use any intermediate files found.
[Aug 01 02:54 PM]: Parsing input files
[Aug 01 02:54 PM]: Existing tbl found: ./won_spe_pred/predict_results/Wonderful_species.tbl
[Aug 01 02:54 PM]: Adding Functional Annotation to Wonderful species, NCBI accession: None
[Aug 01 02:54 PM]: Annotation consists of: 9,045 gene models
[Aug 01 02:54 PM]: 8,951 protein records loaded
[Aug 01 02:54 PM]: Running HMMer search of PFAM version 35.0
[Aug 01 03:01 PM]: 10,155 annotations added
[Aug 01 03:01 PM]: Running Diamond blastp search of UniProt DB version 2023_02
[Aug 01 03:01 PM]: 539 valid gene/product annotations from 817 total
[Aug 01 03:01 PM]: Existing Eggnog-mapper results found: ./won_spe_pred/annotate_misc/eggnog.emapper.annotations
[Aug 01 03:01 PM]: Parsing EggNog Annotations
[Aug 01 03:01 PM]: EggNog version parsed as 2.1.10
[Aug 01 03:01 PM]: 16,567  COG and EggNog annotations added
[Aug 01 03:01 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.88
[Aug 01 03:01 PM]: 2,476 gene name and product description annotations added
[Aug 01 03:01 PM]: Running Diamond blastp search of MEROPS version 12.0
[Aug 01 03:02 PM]: 346 annotations added
[Aug 01 03:02 PM]: Annotating CAZYmes using HMMer search of dbCAN version 11.0
[Aug 01 03:02 PM]: 356 annotations added
[Aug 01 03:02 PM]: Annotating proteins with BUSCO /data/scratch/memyself/software/funannotate/analysis/funannotate_db/ascomycota models
[Aug 01 03:03 PM]: 1,212 annotations added
[Aug 01 03:03 PM]: Existing Phobius results found: ./won_spe_pred/annotate_misc/phobius.results.txt
[Aug 01 03:03 PM]: Existing SignalP results found: ./won_spe_pred/annotate_misc/signalp.results.txt
-------------------------------------------------------
Traceback (most recent call last):
  File "/home/admin/anaconda3/envs/funannotate_1.8/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/home/admin/anaconda3/envs/funannotate_1.8/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/admin/anaconda3/envs/funannotate_1.8/lib/python3.8/site-packages/funannotate/annotate.py", line 1337, in main
    lib.parsePhobiusSignalP(
  File "/home/admin/anaconda3/envs/funannotate_1.8/lib/python3.8/site-packages/funannotate/library.py", line 7340, in parsePhobiusSignalP
    if int(cols[1]) > 0:  # then found TM domain
ValueError: invalid literal for int() with base 10: 'ver'

OS/Install Information

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/data/scratch/software/funannotate/analysis/funannotate_db/
$PASAHOME=/home/admin/anaconda3/envs/funannotate_1.8/opt/pasa-2.5.2
$TRINITY_HOME=/home/admin/anaconda3/envs/funannotate_1.8/opt/trinity-2.8.5
$EVM_HOME=/home/admin/anaconda3/envs/funannotate_1.8/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/admin/anaconda3/envs/funannotate_1.8/config/
$GENEMARK_PATH=/home/admin/bin/gmes_linux_64_4
All 6 environmental variables are set

Checking external dependencies...
ERROR: gmap found but error running gmap
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmes_petap.pl: 4.35
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.508 (2022/Sep/07)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.4
proteinortho: 6.1.7
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmap not installed
ERROR: signalp not installed

hyphaltip commented 1 year ago

Did you try deleting the SignalP file and rerunning? Or checking whether there is an error in the SignalP results file?

nextgenusfs commented 1 year ago

I recall a bug in one of the signalp6 versions where, very rarely, a line is missing a value and trips up the parser, which seems to be what's happening.

nextgenusfs commented 1 year ago

https://github.com/nextgenusfs/funannotate/issues/650

lichen-fungus commented 1 year ago

I will try to re-run signalp6, thanks for the insight!

lichen-fungus commented 1 year ago

> I recall a bug in one of the signalp6 versions where, very rarely, a line is missing a value and trips up the parser, which seems to be what's happening.

I looked at the uppermost and lowermost 2/3 of the signalp6 output and it looked normal, but I probably wouldn't have noticed a single scrambled line.
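Spotting one scrambled line by eye is unreliable, so a quick scripted scan can help. This is a minimal sketch, assuming the SignalP6 results file is tab-separated and its comment/header lines start with `#`; `find_irregular_rows` is a hypothetical helper, not part of funannotate or SignalP. It only flags rows whose field count differs from the file's most common count, so a value that is empty but still tab-delimited would not be caught, and a SignalP version that legitimately varies the trailing column could produce false positives.

```python
from collections import Counter


def find_irregular_rows(lines, comment_prefix="#"):
    """Hypothetical helper: return (line_number, field_count) for data rows
    whose tab-separated field count differs from the most common count."""
    counts = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and comment/header lines (assumed to start with '#')
        if not line.strip() or line.startswith(comment_prefix):
            continue
        counts.append((lineno, len(line.split("\t"))))
    if not counts:
        return []
    # Treat the most frequent field count as the expected row shape
    expected, _ = Counter(n for _, n in counts).most_common(1)[0]
    return [(lineno, n) for lineno, n in counts if n != expected]
```

To use it, open the results file (for example `Wonderful_species_signalp6_prediction_results.txt`) and pass the file handle; each returned pair is a line number and field count worth inspecting.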

> Did you try deleting the SignalP file and rerunning? Or checking whether there is an error in the SignalP results file?

Yes. Unfortunately, rerunning signalp6 and phobius before rerunning funannotate annotate didn't help. I deleted all outputs generated by the first run of funannotate annotate, but it throws the same error message (see below). Does funannotate annotate store external phobius/signalp6 results somewhere other than the annotate_misc folder, which it checks when rerunning?

Should I do a complete rerun from scratch, deleting the predict_results, predict_misc and annotate_misc directories?

I am not sure what to make of #650.

-------------------------------------------------------
Traceback (most recent call last):
  File "/home/admin/anaconda3/envs/funannotate_1.8/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/home/admin/anaconda3/envs/funannotate_1.8/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main   
    mod.main(arguments)
  File "/home/admin/anaconda3/envs/funannotate_1.8/lib/python3.8/site-packages/funannotate/annotate.py", line 1337, in main
    lib.parsePhobiusSignalP(
  File "/home/admin/anaconda3/envs/funannotate_1.8/lib/python3.8/site-packages/funannotate/library.py", line 7340, in parsePhobiusSignalP
    if int(cols[1]) > 0:  # then found TM domain
ValueError: invalid literal for int() with base 10: 'ver'

nextgenusfs commented 1 year ago

Oh, sorry, I didn't read the error correctly. It's actually failing while parsing the Phobius output. If you want to send the SignalP and Phobius files, I can take a look, but it looks like the parser is stumbling on column two, where it expects an integer and finds the value "ver", so maybe this is a version or header line? You can probably just delete it and it will work. Let me know what the line looks like.
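The diagnosis can be reproduced in a few lines of Python. This is a minimal sketch, not funannotate's code: it assumes the parser splits each Phobius line on whitespace before evaluating `int(cols[1])`, the failing expression shown in the traceback. `tm_count_naive` and `parse_phobius_short` are hypothetical names, and the tolerant parser assumes Phobius short-format data rows have exactly four fields (ID, TM, SP, PREDICTION) with SP given as `0` or `Y`.

```python
def tm_count_naive(line):
    # Hypothetical stand-in for the failing expression in the traceback:
    # split on whitespace (assumption) and read column two as an integer.
    cols = line.split()
    return int(cols[1])


def parse_phobius_short(lines):
    """Hypothetical tolerant parser for Phobius short-format output.

    Returns {seq_id: (tm_domain_count, has_signal_peptide)}. Assumes data rows
    have exactly four whitespace-separated fields: ID, TM, SP, PREDICTION,
    with SP being "0" or "Y".
    """
    results = {}
    for line in lines:
        cols = line.split()
        # Banner lines, the 'SEQENCE ID' header and blank lines fail this check
        if len(cols) != 4 or not cols[1].isdigit() or cols[2] not in ("0", "Y"):
            continue
        results[cols[0]] = (int(cols[1]), cols[2] == "Y")
    return results
```

For the banner line `Phobius ver 1.01`, `line.split()` yields `["Phobius", "ver", "1.01"]`, so `cols[1]` is `"ver"` and `int()` raises the same ValueError as in the report. Silently skipping lines has a cost: a genuinely malformed data row would be dropped rather than reported, so a real fix might prefer to log what it skips.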

lichen-fungus commented 1 year ago

Yes, you are right, it's messed up: the Phobius version banner was somehow echoed at the beginning of the file:

Phobius ver 1.01
(c) 2004 Lukas Kall, Anders Krogh, Erik Sonnhammer

SEQENCE ID                     TM SP PREDICTION
UYJD01_000001-T1                0  0 o

funannotate annotate finished fine after I deleted the problematic lines in nano.
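Deleting the banner in nano works; a scripted version of the same fix may be handy when several files are affected. This is a minimal sketch, assuming (as in the file shown above) that the real Phobius output starts at the line beginning with `SEQENCE ID`; everything before it is dropped and the original is kept as a `.bak` copy. `strip_phobius_banner` and `clean_phobius_file` are hypothetical helpers, and the sketch makes no assumption about how funannotate itself treats that header line.

```python
from pathlib import Path

# Matches the header spelling shown in the Phobius file above
HEADER_PREFIX = "SEQENCE ID"


def strip_phobius_banner(text):
    """Drop everything before the 'SEQENCE ID' header line.
    If no header is found, return the text unchanged."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(HEADER_PREFIX):
            return "".join(lines[i:])
    return text


def clean_phobius_file(path):
    """Rewrite the file in place if a banner was found; keep a .bak backup.
    Returns True if the file was changed."""
    p = Path(path)
    original = p.read_text()
    cleaned = strip_phobius_banner(original)
    if cleaned == original:
        return False
    p.with_name(p.name + ".bak").write_text(original)
    p.write_text(cleaned)
    return True
```

Because the function returns early when nothing changes, running it twice on the same file leaves the first backup untouched.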

Thank you so much for your help!