vrmarcelino / CCMetagen

Microbiome classification pipeline
GNU General Public License v3.0

ValueError for taxid when using custom database #53

Closed brambloemen closed 1 year ago

brambloemen commented 1 year ago

When using our custom, recently compiled KMA database, I get the following error when running CCMetagen.py (version 1.4.0):

Traceback (most recent call last):
  File "/usr/local/bin/lmod/ccmetagen/1.4.0/venv/bin/CCMetagen.py", line 274, in <module>
    df = fParseKMA.populate_w_tax(df, ref_database, st, gt, ft, ot, ct, pt)
  File "/usr/local/bin/lmod/ccmetagen/1.4.0/venv/lib/python3.6/site-packages/ccmetagen/fParseKMA.py", line 104, in populate_w_tax
    match_info = fNCBItax.lineage_extractor(match_info.TaxId, match_info, taxfile)
  File "/usr/local/bin/lmod/ccmetagen/1.4.0/venv/lib/python3.6/site-packages/ccmetagen/fNCBItax.py", line 24, in lineage_extractor
    lineage = ncbi.get_lineage(query_taxid)
  File "/usr/local/bin/lmod/ccmetagen/1.4.0/venv/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 241, in get_lineage
    raise ValueError("%s taxid not found" %taxid)
ValueError: 2990471 taxid not found

I've noticed that this concerns only recent additions to the taxonomic database. Removing them manually resolves the error, but I was wondering whether it is possible to handle the error inside the pipeline.

vrmarcelino commented 1 year ago

Hi, I believe this can be fixed by updating your ete3 database:

python
from ete3 import NCBITaxa
ncbi = NCBITaxa()
ncbi.update_taxonomy_database()
quit()

Could you give this a try? Thanks!

brambloemen commented 1 year ago

Thanks, this indeed resolved the issue!
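
On the original question of handling the error inside the pipeline: the traceback shows that ete3's `get_lineage` raises a `ValueError` for a taxid missing from the local taxonomy database, so one option would be to catch that exception and collect the unresolved taxids instead of aborting. The sketch below is only an illustration, not CCMetagen's actual code; the helper names `lineage_or_none` and `split_known_unknown` are hypothetical, and the lookup function is passed in as a parameter so the example runs without ete3 installed.

```python
def lineage_or_none(get_lineage, taxid):
    """Return the lineage for `taxid`, or None if the lookup fails.

    `get_lineage` is any callable that behaves like ete3's
    NCBITaxa.get_lineage, i.e. raises ValueError for an unknown taxid.
    """
    try:
        return get_lineage(taxid)
    except ValueError:
        # Raised as "<taxid> taxid not found" in the traceback above.
        return None


def split_known_unknown(get_lineage, taxids):
    """Separate taxids into resolved lineages and a list of unresolved IDs."""
    known, unknown = {}, []
    for taxid in taxids:
        lineage = lineage_or_none(get_lineage, taxid)
        if lineage is None:
            unknown.append(taxid)
        else:
            known[taxid] = lineage
    return known, unknown


# Assumed usage with ete3 (not executed here):
#   from ete3 import NCBITaxa
#   known, unknown = split_known_unknown(NCBITaxa().get_lineage, taxids)
```

A non-empty `unknown` list would be a hint that the local ete3 taxonomy is older than the reference database and needs updating, as suggested above.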