nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Potential Typo in Antismash parser #299

Closed PlantDr430 closed 5 years ago

PlantDr430 commented 5 years ago

Currently using funannotate v1.6.0-ad5c0de as I had to re-run one sample prior to NCBI submission.

When I ran my sample through, I got a MIBiG error where the program wasn't finding the correct file. I also noticed that the parser found SM clusters and reported them to standard output, but it only created the cluster.bed and none of the other files associated with antiSMASH (i.e. secmet.clusters.txt was left blank). The .bed file looked like this:

contig_5      0       37056    Cluster_5.1     0   +
contig_16     <57     8796     Cluster_16.1    0   +
contig_16     56302   127287   Cluster_16.2    0   +
contig_39     926     53245    Cluster_39.1    0   +
contig_39     926     53245    Cluster_39.2    0   +
contig_151    0       28522    Cluster_151.1   0   +
contig_172    0       44595    Cluster_172.1   0   +
contig_199    0       38485    Cluster_199.1   0   +
contig_205    19193   45936    Cluster_205.1   0   +
contig_215    0       27391    Cluster_215.1   0   +
contig_222    3292    44079    Cluster_222.1   0   +
contig_230    0       24777    Cluster_230.1   0   +

I noticed that for the start of the second cluster a "<" was inserted into the integer. I believe this caused an error somewhere, and the cluster.txt file was not created because of it. I checked my antismash.gbk file and noticed that it contained the "<" in the features, such as:

COMMENT     'Annotated using funannotate v1.5.1'.
            ##antiSMASH-Data-START##
            Version      :: 5.0.0
            Run date     :: 2019-06-29 16:58:09
            ##antiSMASH-Data-END##
FEATURES             Location/Qualifiers
     source          1..129268
                     /db_xref="taxon:83212"
                     /mol_type="genomic DNA"
                     /organism="Claviceps africana"
                     /strain="CCC489"
     mRNA            complement(join(<58..1256,1476..1642,1720..2899))
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
     gene            complement(<58..2899)
                     /locus_tag="E4U42_001851"
     CDS             complement(join(<58..1256,1476..1642,1720..2899))
                     /codon_start=1
                     /gene_functions="transport (smcogs) SMCOG1288:ABC
                     transporter related protein (Score: 134.9; E-value: 6e-41)"
                     /gene_kind="transport"
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
                     /protein_id="ncbi_E4U42_001851-T1"
                     /transl_table=1
                     /translation="MAAAQALTQILPQMIAVSKAMAAAQNLFSTIDRVSNMDTLSEDGI
                     EPADFQGHIRLQGVGFSYPARPNTPVLQDVNLEIRPNQVTAIVGASGSGKSTIFGLIER
                     WYAYSSGEMTLDGHRLESIKLRWLRTKIRLVQQEPTLFSGSIYQNVMDGLAGCDDGLSD
                     GEKKHRVVAACKAVLMHDFIAELPRGYDSCIGERGASLSGGQRQRLVIARAIVSDPKVL
                     LLDEATSALDAHAEKAVQAALNNIARGRTVVVIAHRLSTVRDSDNIIVLGKGGRVMESG
                     THARLVALGGAYASLARTQDLAENMPDPVEGEEGSVASGEEEERAVAAPDVDSAQTPTA
                     RRGSGSGSGKKGESRRHGTLSSYGLLHGLFLIIKEQRTLWRPLSVTLVCCTAGGLLSSS
                     MAVVVANSLEVYRGADFDKARFFAIMFFAIGLCSILVYATIGWISNVIAQTIIRFYRRD
                     ILDNTLRQDMAFFDRPENNTGALVARLASEPLSLQELLSFNVSLVVISIVNAVCGCTVA
                     VISGWKLGLAMCLGAMPVIVGAGYLRIRLEVRFEQDTARSFASSSAVAAEAVMGIRTVC
                     SLALEEAVVERYSQSLQDLVRDSIGGLGVKAFLYALSQSASLLVMGLGFWYGGRLVSTG
                     EYTLRQFYVVYMVVIYSGGATAALFQHTTSISKACTAINYILGLRQTRVLLDDDDAEED
                     EDHDPGAAVARPVDEKGPGLEAGLERVHFAYPLRPKQKVLRGIDMSIRPGQMTALVGAS
                     GCGKSTLIGLLERFYDPSSGTVWVRDDGRRRDIRTLHRRRHRRDVALVQQEPVLYQGSI
                     LDNVALGIEHDRLRPADPPEARIEAACRAAHIWDFIA"
     protocluster    <58..8796
                     /aStool="rule-based-clusters"

When I removed the "<" from the .gbk file and re-ran the program, everything ran smoothly. I am not sure why antiSMASH inserted the "<", but I did also see it on the website.
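A quick way to confirm this kind of problem is to scan the BED file for start or end values that are not plain integers. The sketch below is only an illustration: `find_bad_bed_coords` is a hypothetical helper, not part of funannotate, and it assumes whitespace-separated columns in the order contig, start, end, name, score, strand.

```python
def find_bad_bed_coords(lines):
    """Return (line_number, column_name, value) for each non-integer start/end.

    Hypothetical helper for illustration; not a funannotate function.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        for column_name, value in (("start", fields[1]), ("end", fields[2])):
            # str.isdigit() is False for values such as "<57"
            if not value.isdigit():
                problems.append((line_number, column_name, value))
    return problems


# Three rows taken from the cluster.bed shown in this issue
bed_lines = [
    "contig_5\t0\t37056\tCluster_5.1\t0\t+",
    "contig_16\t<57\t8796\tCluster_16.1\t0\t+",
    "contig_16\t56302\t127287\tCluster_16.2\t0\t+",
]

print(find_bad_bed_coords(bed_lines))  # [(2, 'start', '<57')]
```

Any code that later calls `int()` on such a field would raise a `ValueError`, which is consistent with the downstream files being left empty.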


nextgenusfs commented 5 years ago

Those carets (<>) in GenBank format refer to a partial gene model. So are you saying that in the GenBank file you submitted to antiSMASH, the caret was not in the file for this gene?
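To make the partial notation concrete, here is a minimal sketch that pulls the outer bounds out of a GenBank location string and reports whether either end is marked as partial. `location_bounds` is a hypothetical helper written for this illustration. It uses only the standard library and simply assumes that any "<" marks an open start and any ">" marks an open end. For real work a GenBank parser is a safer choice.

```python
import re


def location_bounds(location):
    """Return (start, end, start_open, end_open) for a GenBank location string.

    Hypothetical helper for illustration. Coordinates are 1-based, as in GenBank.
    Assumes "<" only ever marks the outer start and ">" the outer end.
    """
    positions = [int(digits) for digits in re.findall(r"\d+", location)]
    if not positions:
        raise ValueError(f"no coordinates found in {location!r}")
    return min(positions), max(positions), "<" in location, ">" in location


# Locations copied from the antismash.gbk excerpt in this issue
print(location_bounds("complement(join(<58..1256,1476..1642,1720..2899))"))
# (58, 2899, True, False)
print(location_bounds("<58..8796"))
# (58, 8796, True, False)
```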

PlantDr430 commented 5 years ago

I will have to check the input .gbk I used for antiSMASH. However, I think that when partial gene models are parsed by your antiSMASH parser and incorporated into the cluster.bed file, the downstream analysis is terminated because there are carets in the start or stop position of the bed file. I might be wrong, but when I removed the caret I was able to finish the annotate command without any errors.


nextgenusfs commented 5 years ago

Okay will have a look today at the code.

nextgenusfs commented 5 years ago

This commit (https://github.com/nextgenusfs/funannotate/commit/2fe943ef85b736eb29fecbd203b8f10c6bfd3b4a) should strip the partial notation from the coordinates written to the bed file, which, as you found out, is parsed further downstream.
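For readers who hit this on an older version, the general idea of stripping the partial notation can be sketched as follows. This is not the code from the commit. `strip_partial` and `format_bed_row` are hypothetical names, and the column layout is assumed from the cluster.bed shown earlier in this issue.

```python
def strip_partial(coord):
    """Drop a leading "<" or ">" from a coordinate and return it as an int."""
    return int(str(coord).lstrip("<>"))


def format_bed_row(contig, start, end, name, score=0, strand="+"):
    """Build one tab-separated BED row with plain integer coordinates.

    Hypothetical helper for illustration; not a funannotate function.
    """
    fields = [
        contig,
        str(strip_partial(start)),
        str(strip_partial(end)),
        name,
        str(score),
        strand,
    ]
    return "\t".join(fields)


# The problematic row from this issue, with the partial marker removed
print(format_bed_row("contig_16", "<57", "8796", "Cluster_16.1"))
```

Cleaning the coordinates where the BED row is written means every later step that reads the file can keep treating the start and end columns as integers.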

PlantDr430 commented 5 years ago

Cool, sounds good. I think I am all done with Funannotate for this project as NCBI has accepted my other annotations so far. As a side note, I know you are busy, but do you have a timeline for potential publication of the program? I probably won't be writing my manuscript for another couple of months, but didn't know if there is a manuscript in preparation or submission for Funannotate yet.


hyphaltip commented 5 years ago

Let’s say in prep. I promised Jon I would get a draft started this summer.

Jason Stajich, PhD