IMG Parsing

Hey,

First off, thanks for developing this tool! I've been wanting to import IMG annotations into my anvi'o contigs databases but didn't have a method of doing so until now.

However, I ran into an issue when using the IMG source option. I downloaded a test genome from my own data a couple of weeks ago, and IMG seems to have changed the source column: every row now contains only "img_core_v400". This differs from the IMG example in your test directory, where the source column contains values such as "Prodigal v2.6.3" or "GeneMark.hmm-2 v1.05". The new format caused this error:

Traceback (most recent call last):
  File "gff_parser.py", line 64, in <module>
    source, version = feature.source.split(SEP, 1)
ValueError: not enough values to unpack (expected 2, got 1)

The split appears to fail because the source and the version are now separated by '_' rather than ' '. As a temporary fix, I changed the code to match my specific case:

source = "_".join(feature.source.split("_")[:2]) # results in img_core
version = feature.source.split("_")[2] # results in v400

This gave source = "img_core" and version = "v400" for all my rows, as expected.
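For reference, here is a rough sketch of a split that would accept both the old and the new source format. The function name split_source_version and the "unknown" fallback are my own assumptions, not part of gff_parser, and I have only checked it against the three source strings mentioned above:

```python
import re

# Hypothetical helper, not part of gff_parser.
# Assumes the version is the last token, starts with "v" followed by a digit,
# and is separated from the source name by a space or an underscore.
_SOURCE_RE = re.compile(r"(?P<source>.+?)[ _](?P<version>v\d[\w.]*)")


def split_source_version(raw):
    """Return (source, version) for strings like 'Prodigal v2.6.3' or 'img_core_v400'."""
    match = _SOURCE_RE.fullmatch(raw.strip())
    if match is None:
        # Assumption: keep the whole string as the source when no version is found.
        return raw.strip(), "unknown"
    return match.group("source"), match.group("version")


if __name__ == "__main__":
    for example in ("img_core_v400", "Prodigal v2.6.3", "GeneMark.hmm-2 v1.05"):
        print(example, "->", split_source_version(example))
```

Unlike my hard-coded fix, this does not depend on the source name containing exactly one underscore, but it would still need adjusting if IMG uses version strings that do not start with "v".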

With this change the script ran to completion, but it printed: Done. All 1699 have been processed succesfully. There were 0 coding sequences, 0 RNAs, and 0 unknown features.

I am not sure where this second problem lies, since other analyses show that the genome should contain plenty of complete coding sequences. I also don't know whether my fix had side effects downstream that could have led to this result.
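To narrow this down, one thing I can check is which feature types my GFF file actually contains. The sketch below tallies the third column (the feature type in GFF) of every feature line. The function name count_feature_types is made up for this post, and I am only assuming that the parser decides what counts as a coding sequence from this column; I have not confirmed that in the code:

```python
from collections import Counter


def count_feature_types(lines):
    """Tally the feature type (column 3) of each feature line in a GFF file."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A GFF feature line has nine tab-separated columns.
        if len(cols) < 9:
            continue
        counts[cols[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows for illustration only, not taken from my IMG download.
    sample = [
        "##gff-version 3",
        "contig_1\timg_core_v400\tCDS\t1\t300\t.\t+\t0\tID=gene_1",
        "contig_1\timg_core_v400\ttRNA\t400\t470\t.\t-\t.\tID=gene_2",
    ]
    print(count_feature_types(sample))
```

Running it on the real file with something like count_feature_types(open("my_genome.gff")), where the file name is a placeholder, should show whether the rows are labelled "CDS" or something else.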

Any suggestions on what I might be doing wrong would be greatly appreciated! Thanks! Oscar

karkman / gff_parser

IMG Parsing #12