mattb112885 / clusterDbAnalysis

ITEP - Integrated Toolkit for Exploration of microbial Pan-genomes

Using my own data from JGI #82

Closed marciomagrini closed 6 years ago

marciomagrini commented 6 years ago

Hi everyone!

My question is simple: can I use an annotated genome from JGI-IMG as input for my pangenome analysis? JGI now provides .gbk files as output, but my genomes are not published on NCBI yet, so when I run convertGenbank2table.py I get an error at the TaxID verification step and cannot continue.

Thank you in advance!

JosephRyanPeterson commented 6 years ago

Hi Marco,

ITEP uses the taxon ID to generate the gene aliases within the database, so it is required. To identify the taxon ID, it first searches NCBI for the GenInfo Identifier (gi). If that lookup fails, it falls back to the "taxon" entry in the "db_xref" tag of the source.
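To make the fallback step concrete, here is a minimal sketch of how a taxon ID could be read from the "db_xref" tag. It is not ITEP's actual code: the helper name `taxon_from_genbank_text` is hypothetical, the NCBI gi lookup is left out, and it assumes the taxon cross-reference appears only in the source section, as it normally does.

```python
import re

# Hypothetical helper for illustration; not taken from ITEP's source code.
# Matches a qualifier such as /db_xref="taxon:243232" and captures the digits.
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def taxon_from_genbank_text(genbank_text):
    """Return the first taxon ID found in a /db_xref qualifier, or None."""
    match = TAXON_RE.search(genbank_text)
    return match.group(1) if match else None


example = '''     source          1..1664970
                     /organism="Methanocaldococcus jannaschii DSM 2661"
                     /db_xref="taxon:243232"
'''

print(taxon_from_genbank_text(example))  # prints 243232
```

If this returns None for your file, the source section has no taxon entry and one needs to be added by hand.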

You can trick ITEP into using your own taxon ID by adding a section to your "source" that looks like this:

     source          1..1664970
                     /organism="Methanocaldococcus jannaschii DSM 2661"
                     /mol_type="genomic DNA"
                     /strain="DSM 2661"
                     /db_xref="taxon:243232"

You will want to make sure that the taxon ID is unique to your organism (or at least among the organisms that you are examining).
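If you have several JGI files to patch, the edit can be scripted. The following is a rough sketch, not part of ITEP: the function name `add_taxon_xref` and the placeholder ID 999999 are made up for illustration, and it assumes the file is indented with spaces and that the "source" line is indented by one to five spaces, as in standard GenBank files. It leaves any source section that already has a taxon entry untouched.

```python
import re

# A feature line such as "     source          1..1664970".
# Assumption: the feature key is indented by 1 to 5 spaces, so wrapped
# qualifier text that happens to start with "source" is not matched.
SOURCE_LINE = re.compile(r'^( {1,5})source( +)\S')


def add_taxon_xref(genbank_text, taxon_id):
    """Append /db_xref="taxon:<id>" to every source section lacking one."""
    lines = genbank_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        m = SOURCE_LINE.match(line)
        if not m:
            continue
        # Qualifiers are aligned with the column where the location starts.
        indent = " " * (len(m.group(1)) + len("source") + len(m.group(2)))
        block = []
        while i < len(lines) and lines[i].startswith(indent) and lines[i].strip():
            block.append(lines[i])
            i += 1
        out.extend(block)
        if not any('/db_xref="taxon:' in q for q in block):
            out.append('%s/db_xref="taxon:%s"' % (indent, taxon_id))
    return "\n".join(out) + "\n"


sample = (
    '     source          1..100\n'
    '                     /organism="Example organism"\n'
    '     gene            1..50\n'
)
# 999999 is an arbitrary placeholder; pick an ID unique among your organisms.
patched = add_taxon_xref(sample, 999999)
print(patched)
```

To use it, read a .gbk file into a string, pass it through `add_taxon_xref` with a different ID for each organism, and write the result to a new file before running convertGenbank2table.py.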

Hope that helps, and let us know if that works for you or if you need additional help.

JosephRyanPeterson commented 6 years ago

And by source, I mean the "source" section of your GenBank file.

marciomagrini commented 6 years ago

Hi Joseph,

It works perfectly! Thank you so much for your help!!

All the best!