MIBiG results antiSMASH

AnotherSimon commented 6 years ago

Hi John,

In v1.3.0 I'm having trouble getting funannotate annotate --antismash to incorporate results from funannotate remote. It appears that the antiSMASH results are not parsed properly, or that a file path is not set correctly. Example output:

...
[Mar 23 12:11 AM]: Found phobius pre-computed results
[Mar 23 12:11 AM]: Predicting secreted proteins with SignalP
[Mar 23 12:14 AM]: 171 secretome and 786 transmembane annotations added
[Mar 23 12:14 AM]: Parsing InterProScan5 XML file
[Mar 23 12:14 AM]: Now parsing antiSMASH results, finding SM clusters
[Mar 23 12:14 AM]: Found 5 clusters, 165 biosynthetic enyzmes, and 17 smCOGs predicted by antiSMASH
[Mar 23 12:14 AM]: Found 0 duplicated annotations, adding 33,894 valid annotations
[Mar 23 12:14 AM]: Converting to final Genbank format, good luck!
[Mar 23 12:14 AM]: Creating AGP file and corresponding contigs file
[Mar 23 12:14 AM]: Cross referencing SM cluster hits with MIBiG database version 1.3
Traceback (most recent call last):
  File "/home/simon/software/funannotate/bin/funannotate-functional.py", line 1055, in <module>
    with open(mibig_blast, 'rU') as input:
IOError: [Errno 2] No such file or directory: '../MyBug/annotate_misc/antismash/smcluster.MIBiG.blast.txt'

I'm aware of issue #121, but this doesn't seem to be the same bug.

nextgenusfs commented 6 years ago

I thought I had addressed that with commit https://github.com/nextgenusfs/funannotate/commit/e65305a42ec4a3f4f76b13825d86af7fbd53eab8. I'm not sure whether the current tip is stable; I'm away from the office doing field work, so it's difficult for me to check full functionality on my laptop. But if you are using a version newer than that commit, then it obviously isn't fixed.

Do the annotations from antiSMASH have the correct mRNA IDs? They should end with -T1, etc. (you can check the text files in annotate_misc that correspond to the antiSMASH results). Again, this was a bug introduced when adding support for multiple transcripts.
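The ID check suggested above can be scripted rather than done by eye. This is a minimal sketch, not part of funannotate: it assumes annotations.antismash.txt is tab-separated with the transcript ID in the first column (as the sample further down this thread suggests), and the ids_missing_suffix function and the -T<number> pattern are illustrative assumptions.

```python
import re
import sys

# Assumed transcript ID shape: a gene ID followed by "-T" and a number,
# e.g. FUN_001646-T1.
TRANSCRIPT_SUFFIX = re.compile(r"-T\d+$")


def ids_missing_suffix(lines):
    """Return first-column IDs that lack a -T<number> suffix.

    Assumes each non-empty line is tab-separated as: ID, type, value.
    """
    bad = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        feature_id = line.split("\t", 1)[0]
        if not TRANSCRIPT_SUFFIX.search(feature_id):
            bad.append(feature_id)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python check_ids.py MyBug/annotate_misc/annotations.antismash.txt
    with open(sys.argv[1]) as handle:
        missing = ids_missing_suffix(handle)
    print(f"{len(missing)} IDs without a -T suffix")
    for feature_id in missing[:10]:
        print(feature_id)
```

An empty result means every antiSMASH annotation carries a transcript suffix, so the problem lies elsewhere.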

AnotherSimon commented 6 years ago

My version is definitely post e65305a. Sample from MyBug/annotate_misc/annotations.antismash.txt:

...
FUN_001646-T1 product Nonribosomal Peptide Synthase (NRPS)
FUN_002750-T1 product terpene cyclase
FUN_001649-T1 note SMCOG1227:ribosome_biogenesis_GTP-binding_protein_YsxC
FUN_001162-T1 note SMCOG1248:methyltransferase
FUN_001642-T1 note SMCOG1173:WD-40_repeat-containing_protein
...

PS: None of the entries end in "-T2".

gamcil commented 6 years ago

I was having this issue with local fungiSMASH results at commit https://github.com/nextgenusfs/funannotate/commit/e1048c06ab55e74fcc682877197cd5d059686683, so I don't think it comes down to a difference in output between antiSMASH versions.

I found the problem in funannotate-functional.py: gene names from the predict_results proteins.fa are stripped of the -T1 suffix, but those in the set of SM cluster proteins are not, so the check fails. As a result, writing smcluster.proteins.fasta fails, the diamond search fails, and smcluster.MIBiG.blast.txt is never created.
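The mismatch described above can be reproduced with a toy example. This is a hedged sketch, not funannotate's actual code or the change made in the pull request: the strip_transcript_suffix and cluster_proteins helpers and the sample IDs are hypothetical, and it assumes the suffix being stripped is -T followed by digits.

```python
import re


def strip_transcript_suffix(name):
    """Drop a trailing -T<number>, e.g. FUN_001646-T1 -> FUN_001646."""
    return re.sub(r"-T\d+$", "", name)


def cluster_proteins(protein_ids, cluster_ids, normalize):
    """Return cluster IDs found among the protein IDs, optionally normalizing them first."""
    if normalize:
        cluster_ids = {strip_transcript_suffix(n) for n in cluster_ids}
    return sorted(protein_ids & cluster_ids)


# Hypothetical inputs: IDs from proteins.fa have already been stripped,
# while IDs collected from the antiSMASH cluster annotations have not.
protein_ids = {strip_transcript_suffix(n)
               for n in ["FUN_001646-T1", "FUN_001649-T1", "FUN_002750-T1"]}
cluster_ids = {"FUN_001646-T1", "FUN_001649-T1"}

# Without normalization nothing matches, so no cluster proteins would be written.
print(cluster_proteins(protein_ids, cluster_ids, normalize=False))
# With both sides in the same form, the cluster proteins are found.
print(cluster_proteins(protein_ids, cluster_ids, normalize=True))
```

The key point is that both ID sets must use the same form. Keeping the -T suffix on both sides would work equally well, and it avoids collapsing -T1 and -T2 transcripts of the same gene into one ID.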

I'll try to create a pull request to fix this (https://github.com/nextgenusfs/funannotate/pull/169).