ohnosequences / mg7

Configurable and scalable 16S metagenomics data analysis
https://goo.gl/y3rZFD
GNU Affero General Public License v3.0

Taxonomic name is always empty #28

Closed by laughedelic 8 years ago

laughedelic commented 8 years ago

I need to check how it works with just the NCBITaxonomy Bio4j distribution.

laughedelic commented 8 years ago

Looking at the relevant Bio4j code, I don't see the name property being set anywhere, and in bio4j-titan the property type isn't even created.

What I don't understand is why the scientific name is empty as well.

@eparejatobes WDYT?

laughedelic commented 8 years ago

Solving this in https://github.com/bio4j/bio4j/pull/103 and https://github.com/bio4j/bio4j-titan/pull/75

laughedelic commented 8 years ago

Reimported NCBITaxonomy (on m3.large):

Statistics for program ImportNCBITaxonomy:
Input file: nodes.dmp
There were 1426530 taxonomic units inserted.
The elapsed time was: 0h 7m 53s

Then I tested it on a few random ids:

> import com.thinkaurelius.titan.core, com.bio4j.titan.model.ncbiTaxonomy._, com.bio4j.titan.util.DefaultTitanGraph
> val taxGraph: TitanNCBITaxonomyGraph = new TitanNCBITaxonomyGraph( new DefaultTitanGraph(core.TitanFactory.open("berkeleyje:/home/ec2-user/bio4j/") ) )
> def getName(id: Int): String = taxGraph.nCBITaxonIdIndex.getVertex(id.toString).get.name

> getName(2)
res2: String = Bacteria

> getName(54)
res3: String = Nannocystis exedens

> getName(132)
res4: String = Tuberoidobacter mutans
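
A spot check like the one above could also be run over a whole batch of ids. The following is only a minimal sketch, not part of mg7 or Bio4j: `NameCheck`, `idsWithoutName`, `lookup`, and the stub map are hypothetical names introduced for illustration. `lookup` stands in for whatever resolves a taxon id to its name (for example, a wrapper around `getName`), and id 999 is a made-up entry with an empty name to show a failing case.

```scala
object NameCheck {
  // Hypothetical helper: returns the ids for which `lookup` yields
  // no name at all (None) or a blank one (empty or whitespace only).
  def idsWithoutName(ids: Seq[Int], lookup: Int => Option[String]): Seq[Int] =
    ids.filter { id => lookup(id).forall(_.trim.isEmpty) }

  // Stub data: the first three entries come from the session above;
  // 999 -> "" is invented to simulate the "always empty" bug.
  val stub: Map[Int, String] = Map(
    2   -> "Bacteria",
    54  -> "Nannocystis exedens",
    132 -> "Tuberoidobacter mutans",
    999 -> ""
  )
}

// Ids 999 (blank name) and 1000 (absent from the stub) are reported.
val missing = NameCheck.idsWithoutName(Seq(2, 54, 132, 999, 1000), NameCheck.stub.get)
```

Taking the lookup as a plain function keeps the check independent of the graph backend, so it can be tried against a stub before pointing it at the real database.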

The new database is uploaded to s3://resources.ohnosequences.com/16s/bio4j-taxonomy/ (the old one is in 16s/bio4j/).

eparejatobes commented 8 years ago

:clap: :clap: :clap:

laughedelic commented 8 years ago

:shipit:

Approved with PullApprove

marina-manrique commented 8 years ago

:tada: :dancer: