annotate with gff3 and fasta

nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline

http://funannotate.readthedocs.io

BSD 2-Clause "Simplified" License

annotate with gff3 and fasta #318

Closed PlantDr430 closed 4 years ago

PlantDr430 commented 5 years ago

I am trying to use the "annotate" function with the latest version (from git clone), supplying only a gff3 and a genome fasta file. Because of some conflicts I had to revert to some old reference genomes, so these two files are all I have to work with. I want to run the same annotation steps on these references as I did on my other Funannotate annotations, to make downstream analysis easier.

Genome - Clfusi_contigs.fasta.gz

Gff3 - Clfusi_sort.gff3.txt

This is the error I am getting:

stephenwyka@bspmgenomics:/data/wyka/final_comparative_analysis/references/Clfusi$ /data/wyka/FUNANNOTATE/funannotate/funannotate.py annotate --gff gff3sort/Clfusi_sort.gff3 --fasta Clfusi_contigs.fasta -s "Claviceps fusiformis" -o Clfusi_fun_output --antismash Clfusi_antismash.gbk --iprscan Clfusi_og_ipr.xml --phobius Clfusi_og_phobius.txt --busco_db sordariomycetes --cpus 16
[03:58 PM]: OS: linux2, 24 cores, ~ 264 GB RAM. Python: 2.7.16
[03:58 PM]: Running funannotate v1.6.0-df4262f
[03:58 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[03:58 PM]: Output directory Clfusi_fun_output already exists, will use any existing data.  If this is not what you want, exit, and provide a unique name for output folder
[03:58 PM]: Parsing annotation and preparing annotation files.
Traceback (most recent call last):
  File "/data/wyka/FUNANNOTATE/funannotate/bin/funannotate-functional.py", line 469, in <module>
    GeneCounts = lib.convertgff2tbl(GFF, prefix, Scaffolds, Proteins, Transcripts, annotTBL)
  File "/data/wyka/FUNANNOTATE/funannotate/lib/library.py", line 1672, in convertgff2tbl
    Genes = gff2dict(gff, fasta, Genes)
  File "/data/wyka/FUNANNOTATE/funannotate/lib/library.py", line 3489, in gff2dict
    codon_start = int(v['phase'][i][indexStart[0]]) + 1
IndexError: list index out of range

It seems to be a problem with the Gff3 file. I did alter it from the original, which wasn't in any particular order (not sorted by contig or by start position). I tried dissecting your code to work out how else I could alter the Gff3 to work with Funannotate, but so far I have not succeeded. Can you take a look to see if you can identify the problem?

nextgenusfs commented 5 years ago

Sure -- from the error it's likely the phase of the CDS is missing. I'll maybe add a gff-check to the funannotate util menu or something. Let me see if I can figure out what is wrong first though.

nextgenusfs commented 5 years ago

Okay, so this gene model doesn't have a CDS which is causing the error.

cdsID=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1 phase=[] CDS=[] CDSidxStart=[]
ERROR: unable to determine phase for gene=fgenesh_masked-contig03018-processed-gene-0.1 type=mRNA cdsID=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1

(funannotate) jon@Jons-MacBook-Pro:~/Downloads/gff_parsing$ grep 'fgenesh_masked-contig03018-processed-gene-0.1' Clfusi_sort.gff3.txt 
contig03018 maker   gene    3501    3503    .   -   .   ID=fgenesh_masked-contig03018-processed-gene-0.1;Name=fgenesh_masked-contig03018-processed-gene-0.1
contig03018 maker   mRNA    3501    3503    .   -   .   ID=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1;Parent=fgenesh_masked-contig03018-processed-gene-0.1;Name=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|1|1|3|0
contig03018 maker   exon    3501    3503    .   -   .   ID=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1:exon:1513;Parent=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1
contig03018 maker   three_prime_UTR 3501    3503    .   -   .   ID=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1:three_prime_utr;Parent=fgenesh_masked-contig03018-processed-gene-0.1-mRNA-1

Note there are probably more. This clearly shouldn't be a gene model, it's only 3 nucleotides...
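To list every such model up front, a minimal Python sketch follows. It assumes a tab-delimited GFF3 in which transcripts are typed `mRNA` and each CDS row points at its transcript through `Parent=`, as in the MAKER rows above. The function names are hypothetical and this is not funannotate's own check.

```python
from collections import defaultdict  # noqa: F401 (kept for easy extension)


def parse_attributes(column9):
    """Turn a GFF3 attribute string (ID=...;Parent=...) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def mrnas_without_cds(gff_lines):
    """Return IDs of mRNA features that have no CDS child, in file order."""
    mrna_ids = []
    cds_parents = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows and any embedded FASTA
        attrs = parse_attributes(cols[8])
        if cols[2] == "mRNA" and "ID" in attrs:
            mrna_ids.append(attrs["ID"])
        elif cols[2] == "CDS":
            # a CDS may list several parents separated by commas
            cds_parents.update(attrs.get("Parent", "").split(","))
    return [m for m in mrna_ids if m not in cds_parents]
```

Calling `mrnas_without_cds(open("Clfusi_sort.gff3"))` would return the transcript IDs to inspect or remove before running annotate.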

PlantDr430 commented 5 years ago

Okay. And yeah, I know. This reference didn't have any cutoff for protein length, so it has some proteins that are seriously 1 aa long. I actually thought I was using a version that I had applied a cutoff to, but I realize now that it wasn't that file.

nextgenusfs commented 5 years ago

You can take the output of maker and run it through funannotate predict -- it grabs the gene models and then runs EVidenceModeler to build consensus gene models. But it will change some of the predictions. I assume you'd like to rename the gene models as well? I can update the function to just warn the user about a bad annotation, remove it, and then keep going. Is that the preferred functionality?

nextgenusfs commented 5 years ago

Looks like there aren't too many:

$ funannotate util gff2tbl -f Clfusi_contigs.fasta -g Clfusi_sort.gff3.txt > test.tbl
ERROR: ID=fgenesh_masked-contig03018-processed-gene-0.1 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig05747-processed-gene-0.0 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig06137-processed-gene-0.0 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig01492-processed-gene-0.1 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig03074-processed-gene-0.1 has no CDS features, removing gene model

PlantDr430 commented 5 years ago

So, I can't do that. All I have for three species are a genome file, a protein file, and a gff3. I was planning on re-annotating the old versions with Funannotate, but the original primary author didn't give me permission to update his annotations as he's still working on them as well. Anyway, I'm stuck with these three files and I am trying to get them formatted so I can push them through Funannotate annotate and standardize them better for comparison with my other genomes.

PlantDr430 commented 5 years ago

Ah, that is useful. Thanks!

nextgenusfs commented 5 years ago

Okay, try the latest commit https://github.com/nextgenusfs/funannotate/commit/f93ee193d389f1093bf6009223fe89b8a7d1e491. Running okay for me:

$ funannotate annotate --gff Clfusi_sort.gff3.txt --fasta Clfusi_contigs.fasta -s "Cluster fuchichus" -o test_annotate --rename CFUCH
-------------------------------------------------------
[Aug 20 06:08 PM]: OS: MacOSX 10.14.6, 8 cores, ~ 17 GB RAM. Python: 2.7.15
[Aug 20 06:08 PM]: Running funannotate v1.6.0-df4262f
[Aug 20 06:08 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Aug 20 06:08 PM]: Parsing annotation and preparing annotation files.
ERROR: ID=fgenesh_masked-contig03018-processed-gene-0.1 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig05747-processed-gene-0.0 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig06137-processed-gene-0.0 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig01492-processed-gene-0.1 has no CDS features, removing gene model
ERROR: ID=fgenesh_masked-contig03074-processed-gene-0.1 has no CDS features, removing gene model
[Aug 20 06:08 PM]: Adding Functional Annotation to Cluster fuchichus, NCBI accession: None
[Aug 20 06:08 PM]: Annotation consists of: 9,779 gene models
[Aug 20 06:08 PM]: 9,929 protein records loaded
[Aug 20 06:08 PM]: Running HMMer search of PFAM version 31.0
[Aug 20 06:17 PM]: 9,240 annotations added
[Aug 20 06:17 PM]: Running Diamond blastp search of UniProt DB version 2018_07
[Aug 20 06:21 PM]: 799 valid gene/product annotations from 1,048 total

PlantDr430 commented 5 years ago

Will do, I will let you know how it goes. Thanks.

PlantDr430 commented 5 years ago

I was able to reproduce your results, but I have since had to wait for further information on the species I was using, as I noticed discrepancies between the protein file and the Gff3 file I was provided. Anyway, I had one sample with correct files, so I switched to working with that species.

To be consistent with my other reference species I used the Genome and Gff3 file to run Antismash. The .gbk output looks exactly the same as the .gbk produced when running Antismash with a .gbk file instead of a Genome + Gff3 file. However, I am getting an error with the Antismash parser for v5. It appears to be related to the partial gene error I had before when writing to the bed file.

[12:00 AM]: OS: linux2, 24 cores, ~ 264 GB RAM. Python: 2.7.16
[12:00 AM]: Running funannotate v1.6.0-f93ee19
[12:00 AM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[12:00 AM]: Output directory CPURP_fun_output already exists, will use any existing data.  If this is not what you want, exit, and provide a unique name for output folder
[12:00 AM]: Parsing annotation and preparing annotation files.
[12:00 AM]: Adding Functional Annotation to C purp, NCBI accession: None
[12:00 AM]: Annotation consists of: 17,406 gene models
[12:00 AM]: 8,703 protein records loaded
[12:00 AM]: Running HMMer search of PFAM version 32.0
[12:08 AM]: 9,822 annotations added
[12:08 AM]: Running Diamond blastp search of UniProt DB version 2019_07
[12:22 AM]: 740 valid gene/product annotations from 1,002 total
[12:22 AM]: Running Eggnog-mapper
[12:22 AM]: No Eggnog-mapper results found.
[12:22 AM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.45
[12:22 AM]: 740 gene name and product description annotations added
[12:22 AM]: Running Diamond blastp search of MEROPS version 12.0
[12:23 AM]: 255 annotations added
[12:23 AM]: Annotating CAZYmes using HMMer search of dbCAN version 8.0
[12:24 AM]: 357 annotations added
[12:24 AM]: Annotating proteins with BUSCO sordariomycetes models
[12:28 AM]: 3,556 annotations added
[12:28 AM]: Existing Phobius results found: CPURP_fun_output/annotate_misc/phobius.results.txt
[12:28 AM]: Predicting secreted proteins with SignalP
[12:38 AM]: 0 secretome and 1,565 transmembane annotations added
[12:38 AM]: Parsing InterProScan5 XML file
[12:38 AM]: Now parsing antiSMASH v5 results, finding SM clusters
Traceback (most recent call last):
  File "/data/wyka/FUNANNOTATE/funannotate/bin/funannotate-functional.py", line 871, in <module>
    lib.ParseAntiSmash(antismash_input, AntiSmashFolder, AntiSmashBed, AntiSmash_annotations) #results in several global dictionaries
  File "/data/wyka/FUNANNOTATE/funannotate/lib/library.py", line 5733, in ParseAntiSmash
    if '<' in start:
TypeError: argument of type 'ExactPosition' is not iterable
stephenwyka@bspmgenomics:/data/wyka/final_comparative_analysis/references/CPURP$

Don't mind the 0 secretome results, I know what that problem is and it's just a naming error with my phobius file.

nextgenusfs commented 5 years ago

I added this because we had problems with 'fuzzy starts' in a different GenBank file, and I tried to make it ignore those. Let me know if this works: https://github.com/nextgenusfs/funannotate/commit/24f34f6e5c7ac7073c238d4ab03b2e4c807909d6
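For context on the traceback: `'<' in start` only works when `start` is a string, and here it was a position object. A hedged sketch of one way to tolerate both cases is to go through `str()` first. It assumes the position object prints as the number with an optional leading `<` or `>`, which is my understanding of how Biopython positions render. `StandInPosition` is a hypothetical stand-in used only for the demonstration, and none of this is necessarily what the linked commit does.

```python
class StandInPosition(int):
    """Hypothetical stand-in for a position object (not Biopython's class)."""

    def __new__(cls, value, fuzzy=""):
        obj = int.__new__(cls, value)
        obj.fuzzy = fuzzy  # '', '<' or '>'
        return obj

    def __str__(self):
        return "%s%d" % (self.fuzzy, int(self))


def is_fuzzy(position):
    """True when the coordinate carries a '<' or '>' marker."""
    text = str(position)
    return text.startswith("<") or text.startswith(">")


def position_to_int(position):
    """Convert a coordinate to int, ignoring any '<' / '>' fuzzy marker.

    Accepts plain strings ('<58'), ints, and objects whose str() looks
    like one of those.
    """
    text = str(position).strip()
    return int(text.lstrip("<>"))
```

With this, `position_to_int("<58")` and `position_to_int(StandInPosition(58, "<"))` both give 58, whereas `"<" in StandInPosition(58)` raises the same kind of TypeError as in the traceback.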

PlantDr430 commented 5 years ago

Yes, that worked for this genbank file. But I know the other version was successful in working with genbank files that have 'fuzzy starts'.

nextgenusfs commented 5 years ago

Could you test a few others so we know this is robust? This change should still work if the coords are fuzzy, well at least theoretically.

PlantDr430 commented 5 years ago

Yea, one second. I'm having problems with numpy now for the genbank creation step. I need to figure that out first.

nextgenusfs commented 5 years ago

Does antismash take a GFF3 format now? I guess I'd be careful/skeptical that it will get converted to GBK in exactly the same way as funannotate does it -- because GFF3 isn't very standardized there are perhaps multiple "interpretations" of those data.

You can convert GFF3 + FASTA to gbk in the util menu. Currently it's two steps.

funannotate util gff2tbl -g yourannot.gff3 -f genome.fasta > genome.tbl
funannotate util tbl2gbk -i genome.tbl -f genome.fasta -s "Species genus" --isolate XYHA
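If this conversion needs to be repeated for several references, the two commands can be wrapped in a small Python helper. This is only a sketch: the function names and the default `genome.tbl` path are assumptions, the flags are exactly those quoted above, and `funannotate` is assumed to be on the PATH.

```python
import subprocess


def build_gff_to_gbk_commands(gff3, fasta, species, isolate, tbl_path="genome.tbl"):
    """Return the two funannotate util commands quoted above as argv lists."""
    gff2tbl = ["funannotate", "util", "gff2tbl", "-g", gff3, "-f", fasta]
    tbl2gbk = ["funannotate", "util", "tbl2gbk", "-i", tbl_path,
               "-f", fasta, "-s", species, "--isolate", isolate]
    return gff2tbl, tbl2gbk


def run_gff_to_gbk(gff3, fasta, species, isolate, tbl_path="genome.tbl"):
    """Run both steps; gff2tbl writes to stdout, so redirect it to tbl_path."""
    gff2tbl, tbl2gbk = build_gff_to_gbk_commands(
        gff3, fasta, species, isolate, tbl_path)
    with open(tbl_path, "w") as tbl:
        # check_call raises CalledProcessError on a non-zero exit status
        subprocess.check_call(gff2tbl, stdout=tbl)
    subprocess.check_call(tbl2gbk)
```

Passing argument lists rather than a shell string keeps a species name containing spaces as a single argument.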

PlantDr430 commented 5 years ago

Antismash does take Gff3 + Genome file now. I looked over the resulting .gbk file from a Gff3+ Genome submission and compared it to a resulting .gbk from a .gbk submission and I didn't see any major differences. I will provide the result .gbk from my gff+genome submission later today. I will also try your method to first convert to .gbk using the funannotate utils in the meantime.

PlantDr430 commented 5 years ago

Oh man, that numpy problem was a pain. I just became server admin and our server is a complete mess; my CS friend was appalled. Anyway, I got numpy working, so I can run some tests if you want.

Also, here is the Antismash results from the Gff3+Genome submission.

CPURP_antismash.gbk.gz

PlantDr430 commented 5 years ago

Alright, I did some testing. Might be a long-ish post.

So I ran 4 versions of antismash.gbk files to see if I could get past the antismash parser.

The first version was from an antismash run that used a .gbk file as input and contained carets ('<', marking partial starts).

COMMENT     'Annotated using funannotate v1.5.1'.
            ##antiSMASH-Data-START##
            Version      :: 5.0.0
            Run date     :: 2019-06-29 16:58:09
            ##antiSMASH-Data-END##
FEATURES             Location/Qualifiers
     source          1..129268
                     /db_xref="taxon:83212"
                     /mol_type="genomic DNA"
                     /organism="Claviceps africana"
                     /strain="CCC489"
     mRNA            complement(join(<58..1256,1476..1642,1720..2899))
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
     gene            complement(<58..2899)
                     /locus_tag="E4U42_001851"
     CDS             complement(join(<58..1256,1476..1642,1720..2899))
                     /codon_start=1
                     /gene_functions="transport (smcogs) SMCOG1288:ABC
                     transporter related protein (Score: 134.9; E-value: 6e-41)"
                     /gene_kind="transport"
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
                     /protein_id="ncbi_E4U42_001851-T1"
                     /transl_table=1
                     /translation="MAAAQALTQILPQMIAVSKAMAAAQNLFSTIDRVSNMDTLSEDGI
                     EPADFQGHIRLQGVGFSYPARPNTPVLQDVNLEIRPNQVTAIVGASGSGKSTIFGLIER
                     WYAYSSGEMTLDGHRLESIKLRWLRTKIRLVQQEPTLFSGSIYQNVMDGLAGCDDGLSD
                     GEKKHRVVAACKAVLMHDFIAELPRGYDSCIGERGASLSGGQRQRLVIARAIVSDPKVL
                     LLDEATSALDAHAEKAVQAALNNIARGRTVVVIAHRLSTVRDSDNIIVLGKGGRVMESG
                     THARLVALGGAYASLARTQDLAENMPDPVEGEEGSVASGEEEERAVAAPDVDSAQTPTA
                     RRGSGSGSGKKGESRRHGTLSSYGLLHGLFLIIKEQRTLWRPLSVTLVCCTAGGLLSSS
                     MAVVVANSLEVYRGADFDKARFFAIMFFAIGLCSILVYATIGWISNVIAQTIIRFYRRD
                     ILDNTLRQDMAFFDRPENNTGALVARLASEPLSLQELLSFNVSLVVISIVNAVCGCTVA
                     VISGWKLGLAMCLGAMPVIVGAGYLRIRLEVRFEQDTARSFASSSAVAAEAVMGIRTVC
                     SLALEEAVVERYSQSLQDLVRDSIGGLGVKAFLYALSQSASLLVMGLGFWYGGRLVSTG
                     EYTLRQFYVVYMVVIYSGGATAALFQHTTSISKACTAINYILGLRQTRVLLDDDDAEED
                     EDHDPGAAVARPVDEKGPGLEAGLERVHFAYPLRPKQKVLRGIDMSIRPGQMTALVGAS
                     GCGKSTLIGLLERFYDPSSGTVWVRDDGRRRDIRTLHRRRHRRDVALVQQEPVLYQGSI
                     LDNVALGIEHDRLRPADPPEARIEAACRAAHIWDFIA"
     protocluster    <58..8796
                     /aStool="rule-based-clusters"

The second version was version no. 1 without carets. So in brief:

     mRNA            complement(join(58..1256,1476..1642,1720..2899))
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
     gene            complement(58..2899)
                     /locus_tag="E4U42_001851"
     CDS             complement(join(58..1256,1476..1642,1720..2899))
     protocluster    58..8796

The third version was from an antismash run that used .gff3 + .fasta files as input, with carets.


COMMENT     ##antiSMASH-Data-START##
            Version      :: 5.0.0
            Run date     :: 2019-08-21 05:42:35
            ##antiSMASH-Data-END##
FEATURES             Location/Qualifiers
     CDS             complement(join(<3315..3760,3855..3897))
                     /Dbxref="NCBI_GP:CCE30170.1"
                     /ID="cds5101-mRNA"
                     /Name="CCE30170.1"
                     /Note="CP_04018.1"
                     /gbkey="CDS"
                     /gene="gene5161-mRNA"
                     /product="uncharacterized protein"
                     /protein_id="CCE30170.1"
                     /source="EMBL"
                     /transl_table=1
                     /translation="MARFRPSLVSEEVVGNDLATRQTTKPQSQPVFKLQTSLNGLQPRS
                     RRREVIPWLHAAVRAAAEPLNRFSGSELTGFESYELDTANPSPRRPPKRSPRRRLLTIW
                     GRRSRGTTNLEETEEHYRRPRRRQRRDSMVIPNTYEDYRLAGSSLQATTTNIEAGKVD"
     protocluster    <3315..71902

And the last version was no. 3 without carets. In brief:

     CDS             complement(join(3315..3760,3855..3897))
     protocluster    3315..71902

All of these were able to get past the antismash parser and continue onto the GenBank creation step.
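For anyone repeating this test, the caret-free variants can be generated rather than edited by hand. The Python sketch below removes `<` and `>` only from feature location text (including wrapped locations) and leaves qualifiers, comments and sequence untouched. It assumes the standard GenBank layout with feature keys indented 5 spaces and qualifiers starting with `/`. The function name is hypothetical.

```python
def strip_fuzzy_markers(gbk_lines):
    """Yield GenBank lines with '<' and '>' removed from feature locations only."""
    in_features = False   # inside a FEATURES table
    in_location = False   # on a feature's location (possibly wrapped)
    for line in gbk_lines:
        if line.strip() and not line.startswith(" "):
            # top-level keyword such as LOCUS, FEATURES, ORIGIN or '//'
            in_features = line.startswith("FEATURES")
            in_location = False
        elif in_features:
            if line.startswith("     ") and line[5:6].strip():
                in_location = True    # new feature key line (key at column 6)
            elif line.lstrip().startswith("/"):
                in_location = False   # first qualifier ends the location
            if in_location:
                line = line.replace("<", "").replace(">", "")
        yield line
```

A typical use would be to read the antiSMASH .gbk, pass its lines through `strip_fuzzy_markers`, and write the result to a new file with `writelines`.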

nextgenusfs commented 4 years ago

I think the antismash v5 parsing is fixed; reopen if there are still issues.