nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

smcluster.MIBiG.blast.txt not found #336

Closed: metalichen closed this issue 4 years ago

metalichen commented 5 years ago

Hello,

I got an error while running 'funannotate annotate'. Since 'funannotate remote' did not produce any results, I ran antiSMASH on the antiSMASH web server instead. When I tried to feed the resulting gbk file back to funannotate, it failed:

funannotate annotate -i cry_ne_preds/ --sbt /path/template.sbt.txt --antismash cry_ne_preds/cry_ne_2.region001.gbk --eggnog cry_ne_preds/eggnog_results.emapper.annotations --cpus 14
-------------------------------------------------------
[05:31 PM]: OS: linux2, 16 cores, ~ 198 GB RAM. Python: 2.7.15
[05:31 PM]: Running funannotate v1.5.3
[05:31 PM]: Output directory cry_ne_preds already exists, will use any existing data.  If this is not what you want, exit, and provide a unique name for output folder
[05:31 PM]: Parsing input files
[05:31 PM]: Existing tbl found: cry_ne_preds/predict_results/Cryptococcus_neoformans.tbl
[05:31 PM]: Adding Functional Annotation to Cryptococcus neoformans, NCBI accession: None
[05:31 PM]: Annotation consists of: 6,725 gene models
[05:31 PM]: 6,595 protein records loaded
[05:31 PM]: Existing Pfam-A results found: cry_ne_preds/annotate_misc/annotations.pfam.txt
[05:31 PM]: 7,776 annotations added
[05:31 PM]: Running Diamond blastp search of UniProt DB version 2019_02
[05:31 PM]: 587 valid gene/product annotations from 716 total
[05:31 PM]: Existing Eggnog-mapper results found: cry_ne_preds/annotate_misc/eggnog.emapper.annotations
[05:31 PM]: Parsing EggNog Annotations
[05:31 PM]: 11,378 COG and EggNog annotations added
[05:31 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.32
[05:31 PM]: 1,623 gene name and product description annotations added
[05:31 PM]: Existing MEROPS results found: cry_ne_preds/annotate_misc/annotations.merops.txt
[05:31 PM]: 187 annotations added
[05:31 PM]: Existing CAZYme results found: cry_ne_preds/annotate_misc/annotations.dbCAN.txt
[05:31 PM]: 296 annotations added
[05:31 PM]: Existing BUSCO2 results found: cry_ne_preds/annotate_misc/annotations.busco.txt
[05:31 PM]: 1,192 annotations added
[05:31 PM]: Existing Phobius results found: cry_ne_preds/annotate_misc/phobius.results.txt
[05:31 PM]: Existing SignalP results found: cry_ne_preds/annotate_misc/signalp.results.txt
[05:31 PM]: 309 secretome and 1,464 transmembane annotations added
[05:31 PM]: Now parsing antiSMASH results, finding SM clusters
[05:31 PM]: Found 0 clusters, 0 biosynthetic enyzmes, and 0 smCOGs predicted by antiSMASH
[05:31 PM]: Found 0 duplicated annotations, adding 52,642 valid annotations
[05:31 PM]: Converting to final Genbank format, good luck!
[05:32 PM]: Creating AGP file and corresponding contigs file
[05:32 PM]: Cross referencing SM cluster hits with MIBiG database version 1.4
Traceback (most recent call last):
  File "/path/funannotate-1.5.3/bin/funannotate-functional.py", line 1101, in <module>
    with open(mibig_blast, 'rU') as input:
IOError: [Errno 2] No such file or directory: 'cry_ne_preds/annotate_misc/antismash/smcluster.MIBiG.blast.txt'

The cry_ne_preds/annotate_misc/antismash/ folder contains three files, clusters.bed, secmet.clusters.txt, and smcluster.proteins.fasta, but all of them are empty.

If I do not use the --antismash flag, everything runs smoothly.

nextgenusfs commented 5 years ago

How did you run antiSMASH? Did you use the GBK file from funannotate predict (or update)? What version of antiSMASH? Typically the proper antiSMASH result GBK file is named something like first_contig.final.gbk.

metalichen commented 5 years ago

Sorry, I accidentally copied the wrong command. I used a file named "Cryptococcus_neoformans.gbk", which has the same name as the file I submitted to the web server but contains the antiSMASH annotations. Here is a fragment:

 gene            complement(1670137..1671994)
                     /locus_tag="crynepredictedgene_002970"
     CDS             complement(join(1670137..1670385,1670453..1671205,
                     1671262..1671376,1671432..1671994))
                     /codon_start=1
                     /locus_tag="crynepredictedgene_002970"
                     /product="hypothetical protein"
                     /protein_id="ncbi_crynepredictedgene_002970-T1"
                     /transl_table=1
                     /translation="MGPGQGYRPPPYGLPTRPSLDTPASAVSQSQRQGFPQPQPYMPQT
                     QQGFYPGYGYGYQPNLSGGYPSGGFHPMYAAPAPSFGQSLFQSPVAVNPEGYSYSTTYL
                     SSQYNPQVSAPNPPMKRQRPNNSNVMTGGVLSAKPWRNCSHPGCKFVGPGDQVEIHEED
                     RHLIYAPGKVPQRSEEEERFAKRKGPLPPIQGTNITLNTPEDIEKWIAERKSRWPTAKR
                     VLEKEEERQAAIARGEVPAKQRKGKGRRNDPASRAEEWGREVKDEEADIPRVFGGERGR
                     GRGRGRGSVRGRGGRAGNEGRSDGVAPVHSIVQTSTQSQSRQSSEVNPTADSLIGLGGY
                     DTPAESASASDSSDTESSTESDVGSDSSSDSSSGSEDDHAQLKPAEASTSSPATTTTTK
                     PSISTLSKPICKFFAQQGRCKFNDRCRFAHIAPDGSSVDTSAQGENRKPAPQQEKKRQP
                     RQPSARKLNPFERPSMLGALLANPIQNTLSQISQTIRFLVANDMLQNVEIRPGQVEEEE
                     KARNKVVLLDGSSKDNNGATEDNLNMEGGDDIIQELKETEGE"
     protocluster    1671730..1696243
                     /aStool="rule-based-clusters"
                     /contig_edge="False"
                     /core_location="join{[1681729:1681757](+),
                     [1681807:1681821](+), [1681886:1682072](+),
                     [1682138:1682347](+), [1682403:1682986](+),
                     [1683034:1683190](+), [1683251:1683590](+),
                     [1683935:1684044](+), [1684122:1684341](+),
                     [1684397:1684404](+), [1684464:1684481](+),
                     [1684539:1684596](+), [1684645:1684776](+),
                     [1684832:1685305](+), [1685358:1685416](+),
                     [1685464:1685654](+), [1685707:1685781](+),
                     [1685836:1685907](+), [1685972:1686072](+),
                     [1686141:1686243](+)}"
                     /cutoff="20000"
                     /detection_rule="(Terpene_synth or Terpene_synth_C or
                     phytoene_synt or Lycopene_cycl or terpene_cyclase or NapT7
                     or fung_ggpps or fung_ggpps2 or trichodiene_synth or TRI5)"
                     /neighbourhood="10000"
                     /product="terpene"
                     /protocluster_number="1"
                     /tool="antismash"
     proto_core      join(1681730..1681757,1681808..1681821,1681887..1682072,
                     1682139..1682347,1682404..1682986,1683035..1683190,
                     1683252..1683590,1683936..1684044,1684123..1684341,
                     1684398..1684404,1684465..1684481,1684540..1684596,
                     1684646..1684776,1684833..1685305,1685359..1685416,
                     1685465..1685654,1685708..1685781,1685837..1685907,
                     1685973..1686072,1686142..1686243)
                     /aStool="rule-based-clusters"
                     /tool="antismash"
                     /cutoff="20000"
                     /detection_rule="(Terpene_synth or Terpene_synth_C or
                     phytoene_synt or Lycopene_cycl or terpene_cyclase or NapT7
                     or fung_ggpps or fung_ggpps2 or trichodiene_synth or TRI5)"
                     /neighbourhood="10000"
                     /product="terpene"
                     /protocluster_number="1"
     cand_cluster    1671730..1696243
                     /candidate_cluster_number="1"
                     /contig_edge="False"
                     /detection_rules="(Terpene_synth or Terpene_synth_C or
                     phytoene_synt or Lycopene_cycl or terpene_cyclase or NapT7
                     or fung_ggpps or fung_ggpps2 or trichodiene_synth or TRI5)"
                     /kind="single"
                     /product="terpene"
                     /protoclusters="1"
                     /tool="antismash"

I used antiSMASH version 5.0.0 on the web server (https://fungismash.secondarymetabolites.org/#!/start).
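
The fragment above contains `protocluster`, `proto_core` and `cand_cluster` features. To my knowledge antiSMASH 4 wrote plain `cluster` features instead, so a parser that only looks for `cluster` would report zero clusters on a version 5 file. A quick way to see which feature keys a GBK file actually holds is to count them. The following is a minimal sketch that does not use funannotate or Biopython; the function name `count_feature_keys` and the sample text are made up for illustration, and the parser assumes the standard GenBank layout with feature keys indented by five spaces.

```python
import re
from collections import Counter

# In a GenBank feature table the feature key starts after five spaces;
# qualifiers and continuation lines are indented further, so they do not match.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def count_feature_keys(gbk_text):
    """Count feature keys (gene, CDS, protocluster, ...) in GenBank text."""
    counts = Counter()
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if in_features:
            match = FEATURE_KEY.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Hypothetical, shortened sample modelled on the fragment in this thread.
sample = """FEATURES             Location/Qualifiers
     gene            complement(1670137..1671994)
                     /locus_tag="crynepredictedgene_002970"
     protocluster    1671730..1696243
                     /product="terpene"
     proto_core      join(1681730..1681757,1681808..1681821)
     cand_cluster    1671730..1696243
ORIGIN
"""

counts = count_feature_keys(sample)
for key in ("cluster", "protocluster", "cand_cluster", "region"):
    print(key, counts[key])
```

If `cluster` comes back as 0 while `protocluster` is non-zero, the file is in the newer layout and needs a funannotate release that understands it.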

nextgenusfs commented 5 years ago

Does the logfile have any more information than what you posted from the terminal output?

metalichen commented 5 years ago

Here it is:

[10/09/19 19:52:42]: 309 secretome and 1,464 transmembane annotations added
[10/09/19 19:52:42]: Now parsing antiSMASH results, finding SM clusters
[10/09/19 19:52:46]: Found 0 clusters, 0 biosynthetic enyzmes, and 0 smCOGs predicted by antiSMASH
[10/09/19 19:52:46]: bedtools intersect -wo -a cry_ne_preds/annotate_misc/antismash/clusters.bed -b /path/cryptococcus/cry_ne_preds/predict_results/Cryptococcus_neoformans.gff3
[10/09/19 19:52:46]: Found 0 duplicated annotations, adding 52,642 valid annotations
[10/09/19 19:52:46]: Parsing tbl file: /path/cryptococcus/cry_ne_preds/annotate_misc/genome.tbl
[10/09/19 19:52:47]: Converting to final Genbank format, good luck!
[10/09/19 19:52:47]: tbl2asn -y "Annotated using funannotate v1.5.3" -N 1 -t ../bryoria_tortuosa/template.sbt.txt -M n -j "[organism=Cryptococcus neoformans]" -V b -c fx -T -a r10u -l paired-ends -Z cry_ne_preds/annotate_misc/tbl2asn/1/discrepency.report.txt -p cry_ne_preds/annotate_misc/tbl2asn/1
[10/09/19 19:53:23]: [tbl2asn] Flatfile genome

[tbl2asn] Validating genome

[10/09/19 19:53:33]: Creating AGP file and corresponding contigs file
[10/09/19 19:53:33]: perl /home/gulnara_tagirdzhanova/downloads/funannotate-1.5.3/util/fasta2agp.pl Cryptococcus_neoformans.scaffolds.fa
[10/09/19 19:53:34]: Cross referencing SM cluster hits with MIBiG database version 1.4
[10/09/19 19:53:34]: diamond blastp --sensitive --query cry_ne_preds/annotate_misc/antismash/smcluster.proteins.fasta --threads 14 --out cry_ne_preds/annotate_misc/antismash/smcluster.MIBiG.blast.txt --db /scratch/1/gulnara/funannotate_db/mibig.dmnd --max-hsps 1 --evalue 0.001 --max-target-seqs 1 --outfmt 6
[10/09/19 19:53:34]: diamond v0.9.24.125 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 14
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: cry_ne_preds/annotate_misc/antismash
Opening the database...  [7.1e-05s]
#Target sequences to report alignments for: 1
Opening the input file...  [0.000102s]
Error: Error detecting input file format. First line seems to be blank.

It seems that the problem is in parsing the antiSMASH output file.
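
Putting the two logs together, the chain of events looks like this: no clusters are parsed, so smcluster.proteins.fasta is written empty; diamond aborts on the empty query, so smcluster.MIBiG.blast.txt is never created; the script then fails when it tries to open that file. A check before the search step would surface the real cause earlier. This is only a sketch of such a guard, not funannotate's actual code; `fasta_has_records` is a hypothetical helper and the file names in the demo are placeholders.

```python
import os
import tempfile


def fasta_has_records(path):
    """Return True if path exists and holds at least one '>' header line."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path) as handle:
        return any(line.startswith(">") for line in handle)


# Demo on throwaway files: an empty FASTA, a one-record FASTA, a missing file.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "smcluster.proteins.fasta")
    open(empty, "w").close()

    filled = os.path.join(tmp, "filled.fasta")
    with open(filled, "w") as handle:
        handle.write(">prot1\nMGPGQ\n")

    missing = os.path.join(tmp, "missing.fasta")

    empty_ok = fasta_has_records(empty)
    filled_ok = fasta_has_records(filled)
    missing_ok = fasta_has_records(missing)

print(empty_ok, filled_ok, missing_ok)
```

A caller could skip the MIBiG search and log a warning whenever the check returns False, instead of running diamond on an empty query.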

nextgenusfs commented 5 years ago

Yes. I just noticed that you have an old version of funannotate. Please update to the newest version and see if the problem persists. We addressed some of these differences between antiSMASH v4.5 and v5 in the last few months.