merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

External Gene Calls Doesn't Read First Column #407

Closed: TheOneHyer closed this issue 7 years ago

TheOneHyer commented 7 years ago

The lab I work in uses software independent of anvi'o for gene calling and annotation. I made an external gene calls file, but when I try to use it to generate a contigs database with anvi-gen-contigs-database and the --external-gene-calls flag, I get the following error:

          The file 'anvio/QV11.mg.gene_locations.tsv' does not contain the right type of
          header. It was expected to have these: 'gene_callers_id, contig, start, stop, 
          direction, partial, source, version', however it had these: 'c_000000000001,  
          419, 781, f, 0, Prodigal, 2.6'

The first few lines of my external gene calls file are:

          1 c_000000000001 419 781 f 0 Prodigal 2.6
          2 c_000000000001 2808 3059 f 0 Prodigal 2.6
          3 c_000000000001 3056 3310 f 0 Prodigal 2.6

As you can see, all eight columns are present, but anvi'o appears to import only the last seven. I do not have a header row because adding one also throws an error, claiming that "gene_callers_id" is not an int. In other words, anvi'o appears to use the first column when checking that the data types are valid but to ignore it when checking the number of fields.

I am running anvi'o 2.02, installed from PyPI.
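One reading consistent with the quoted error is that the first line of the file is always taken as the header, and the first column is held back as a row key, so the leading "1" is simply not echoed in the message rather than lost. The Python sketch below illustrates that reading under those assumptions. `header_check_message` and `EXPECTED` are hypothetical names, and this is not anvi'o's actual implementation.

```python
import csv
import io

# Column names as listed in the error message quoted above.
EXPECTED = ["gene_callers_id", "contig", "start", "stop",
            "direction", "partial", "source", "version"]


def header_check_message(tsv_text):
    """Hypothetical header check, NOT anvi'o's code.

    Assumes line 1 is always the header and that column 1 is reserved as the
    row key, so only the remaining columns are reported back.
    Returns an error string, or None when the header matches.
    """
    first_line = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    reported = first_line[1:]  # the key column is not listed in the message
    if first_line != EXPECTED:
        return ("It was expected to have these: '%s', however it had these: '%s'"
                % (", ".join(EXPECTED), ", ".join(reported)))
    return None


# A headerless file: its first data row is (wrongly) treated as the header.
headerless = "1\tc_000000000001\t419\t781\tf\t0\tProdigal\t2.6\n"
print(header_check_message(headerless))
```

Under this assumption, the seven values in the message are the first data row minus its key column, which matches the error text quoted in the report.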

meren commented 7 years ago

Hi Alex,

The first line of the external gene calls file should contain the headers mentioned in the error. Here is an example external gene calls file:

https://github.com/meren/anvio/blob/master/tests/sandbox/example_external_gene_calls.txt

Your example seems to be missing the very first header line. Please let me know if I'm wrong.
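Following that advice, a headerless file like the one in the original report can be fixed by prepending the header line. The Python sketch below does that and runs a few sanity checks on the data rows. `with_header` and `HEADER` are hypothetical names, the column names come from the error message above, and the checks (eight columns, integer gene_callers_id, start, and stop) are assumptions about a well-formed file, not anvi'o's own validation.

```python
import csv
import io

# Header names taken from the anvi'o error message quoted in this issue.
HEADER = ["gene_callers_id", "contig", "start", "stop",
          "direction", "partial", "source", "version"]


def with_header(tsv_text):
    """Return TAB-delimited gene calls with HEADER as the first line.

    Hypothetical helper, not part of anvi'o: prepends the header if it is
    missing and checks that every data row has eight columns and integer
    gene_callers_id, start, and stop values (raises ValueError otherwise).
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if rows and rows[0] == HEADER:
        rows = rows[1:]  # header already present; validate data rows only
    for line_no, row in enumerate(rows, start=1):
        if len(row) != len(HEADER):
            raise ValueError("data row %d has %d columns, expected %d"
                             % (line_no, len(row), len(HEADER)))
        for name in ("gene_callers_id", "start", "stop"):
            int(row[HEADER.index(name)])  # raises ValueError if not an integer
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return out.getvalue()
```

For example, `with_header("1\tc_000000000001\t419\t781\tf\t0\tProdigal\t2.6\n")` returns the same row preceded by the `gene_callers_id ... version` header line. Calling it on a file that already starts with the header leaves the file unchanged, so it is safe to run more than once.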

Best,