Closed jonathancrabtree closed 4 years ago
I came to post an issue, but I think I'm having the same problem noted above, so I'll just add a concrete example of why this is a problem. I am trying to extract 16S sequences that are annotated in a GenBank file (example). The only thing that identifies a gene as a 16S sequence is the product name in the GenBank file:
gene 517900..517988
/locus_tag="SAMN05444282_102329"
rRNA 517900..517988
/locus_tag="SAMN05444282_102329"
/product="16S ribosomal RNA . Bacterial SSU"
However, the product name doesn't make it into the GFF3 file, so it is impossible to select the 16S sequences downstream separately from other rRNAs:
FNQD01000002 GenBank gene 517900 517988 . + . ID=SAMN05444282_102329;locus_tag=SAMN05444282_102329
FNQD01000002 GenBank rRNA 517900 517988 . + . ID=SAMN05444282_102329.rRNA.1;Parent=SAMN05444282_102329
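To make the downstream use case concrete, here is a minimal sketch of how the 16S features could be selected if the product name were carried into column 9. It assumes the converter would write it as a `product=` attribute on the rRNA row, which is my guess at the output and not something shown above; `parse_attributes` and `select_16s` are hypothetical helper names.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of key -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def select_16s(gff3_lines):
    """Yield (seqid, start, end, strand) for rRNA rows whose product mentions 16S.

    Assumes the product is stored under a "product" attribute key.
    """
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # the feature table ends where the embedded sequence begins
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "rRNA":
            continue
        product = parse_attributes(cols[8]).get("product", "")
        if "16S" in product:
            yield cols[0], int(cols[3]), int(cols[4]), cols[6]
```

With the two rows above as input, nothing is yielded, because neither row has a product attribute to match on. That is the problem in a nutshell.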
I'll see if I can get this added tonight.
Last night has shifted into today.
@mikemc Is it possible to attach your GBK file so I can test with it, or is it private?
@jorvis The example I gave is from this GenBank file
@mikemc - The current version of the code should fix your issue. The tRNAs now export with anticodon reported and rRNAs with product. I'm not closing this ticket yet, as what @jonathancrabtree reported is actually the reverse conversion, going from GFF3 -> GBK.
Closing. I've now confirmed that tRNA and rRNA annotation is retained when a source GenBank flat file is converted to GFF3 and then converted back to GenBank.
Great, thanks @jorvis! I haven't had a chance to test yet but sounds like this covers my issue.
If convert_gff3_to_gbk.py finds a tRNA, rRNA, or other non-protein-coding gene in the input GFF3, it writes the parent "gene" feature to the output GenBank file but nothing else. Only protein-coding genes with an mRNA feature below the parent gene appear to be converted fully. It looks like biocode.genbank.print_biogene needs to be generalized to handle all gene types, or at least all those that currently have a corresponding representation in the biocode.things module.
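As a rough illustration of what "generalized" could mean here, the sketch below writes a gene and whatever child features it has (rRNA, tRNA, mRNA, and so on) through one code path, without special-casing mRNA. It is not biocode's code: the `Feature` class and both functions are hypothetical stand-ins, and it ignores joined locations and the wrapping of long qualifier lines.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a gene or one of its child features."""
    key: str                      # feature key, e.g. "gene", "rRNA", "tRNA"
    start: int
    end: int
    strand: str = "+"
    qualifiers: dict = field(default_factory=dict)


def format_feature(feat):
    """Format one feature as GenBank feature-table lines."""
    loc = f"{feat.start}..{feat.end}"
    if feat.strand == "-":
        loc = f"complement({loc})"
    # Feature key starts in column 6, location in column 22.
    lines = [f"     {feat.key:<16}{loc}"]
    for name, value in feat.qualifiers.items():
        # Qualifiers start in column 22.
        lines.append(f'{" " * 21}/{name}="{value}"')
    return "\n".join(lines)


def format_gene(gene, children):
    """Format the parent gene followed by every child, whatever its type."""
    return "\n".join(format_feature(f) for f in [gene, *children])
```

Called with a gene and a single rRNA child carrying `locus_tag` and `product` qualifiers, this would emit the rRNA block that is currently missing from the output.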