Closed Somebodyatthdoor closed 1 year ago
Can you please share how your Species_gtdb_lineages.txt file is formatted? By default ncbi-gtdb_map.py assumes that the column containing the queries (taxonomies) is the first column in the table. Also, are you using --no-prefix?
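A quick way to answer the formatting question is to look at what actually sits in the first column. The following is only a minimal sketch, not part of gtdb_to_taxdump: the function name `summarize_first_column` and the two example rows are made up for illustration, and it assumes a tab-delimited table. It previews the first-column values and counts how many semicolon-separated fields each one has.

```python
from collections import Counter


def summarize_first_column(lines, delimiter="\t", preview_size=5):
    """Preview column 1 and count semicolon-separated fields per query."""
    preview = []
    field_counts = Counter()
    for line in lines:
        # Take only the first column, since that is where queries are expected.
        query = line.rstrip("\n").split(delimiter)[0].strip()
        if not query:
            continue  # skip blank rows
        if len(preview) < preview_size:
            preview.append(query)
        field_counts[len(query.split(";"))] += 1
    return preview, field_counts


# Illustrative in-memory rows (hypothetical); for a real file, pass an open
# file handle, e.g. open("Species_gtdb_lineages.txt").
example = [
    "d__Bacteria;p__Proteobacteria;s__Escherichia coli\tsample1\n",
    "s__Escherichia coli\tsample2\n",
]
preview, field_counts = summarize_first_column(example)
print(preview)
print(dict(field_counts))
```

A count above 1 means the first column holds several taxonomic levels joined by semicolons rather than a single name.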
Hi, the file is a single column (no header) containing the full GTDB classifications: Species_gtdb_lineages.txt
I have tried running the command both with and without the --no-prefix flag. But I always get no hits:
2023-02-09 08:58:23,147 - Loading: ar122_metadata_r95.tsv
2023-02-09 08:58:23,388 - Entries lacking an NCBI taxonomy: 153
2023-02-09 08:58:23,388 - Completeness-filtered entries: 1
2023-02-09 08:58:23,388 - Contamination-filtered entries: 69
2023-02-09 08:58:23,389 - Entries used: 2850
2023-02-09 08:58:23,389 - Loading: bac120_metadata_r95.tsv
2023-02-09 08:58:38,645 - Entries lacking an NCBI taxonomy: 0
2023-02-09 08:58:38,645 - Completeness-filtered entries: 17
2023-02-09 08:58:38,646 - Contamination-filtered entries: 2253
2023-02-09 08:58:38,646 - Entries used: 189257
2023-02-09 08:58:38,646 - Reading in queries: Species_gtdb_lineages.txt
2023-02-09 08:58:38,649 - No. of queries: 1790
2023-02-09 08:58:38,649 - No. of de-rep queries: 628
2023-02-09 08:58:38,649 - Batching queries...
2023-02-09 08:58:38,649 - No. of batches: 1
2023-02-09 08:58:38,650 - Queries per batch: 628
2023-02-09 08:58:38,650 - Querying taxonomies...
2023-02-09 08:58:38,652 - PID27975: Finished! Queries=628, Hits=0, No-Hits=628
2023-02-09 08:58:38,655 - File written: taxonomies/taxonomy_map_summary.tsv
Cheers, Laura
You need to use just one taxonomic level for the queries. See https://github.com/nick-youngblut/gtdb_to_taxdump/blob/master/tests/data/ncbi-gtdb/ncbi_tax_queries.txt for an example. I'll update the docs to clarify.
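For a file of full lineages, one way to get single-level queries is to keep only the species field of each line. This is a rough sketch rather than anything shipped with gtdb_to_taxdump: the helper names `to_single_level` and `convert` are hypothetical, and it assumes the lineages are semicolon-delimited with GTDB-style rank prefixes such as `s__`. The prefix is kept as-is; whether it should stay depends on how the queries are matched and on the --no-prefix setting.

```python
def to_single_level(lineage, rank_prefix="s__"):
    """Return the lineage field that starts with rank_prefix, or None if absent or empty."""
    for field in lineage.strip().split(";"):
        field = field.strip()
        if field.startswith(rank_prefix) and len(field) > len(rank_prefix):
            return field
    return None


def convert(lines, rank_prefix="s__"):
    """Map full lineages to unique single-level queries, preserving first-seen order."""
    seen = set()
    queries = []
    for line in lines:
        query = to_single_level(line, rank_prefix)
        if query is not None and query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


# Illustrative lineages (assumed format: seven semicolon-delimited ranks).
example = [
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n",
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n",
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__\n",
]
queries = convert(example)
print("\n".join(queries))
```

Writing the resulting queries one per line to a new file (any name will do) and passing that file to ncbi-gtdb_map.py in place of the full-lineage file gives the tool one taxonomic level per query.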
Brilliant, thanks, that's solved it.
Hi,
I am having a problem converting GTDB taxonomies to NCBI taxonomies. When I run this command I only get NA results in my output file.
I first run wget to get the necessary GTDB files (I have to do this because of security-related restrictions on the server I am on):
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release95/95.0/ar122_metadata_r95.tar.gz
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release95/95.0/bac120_metadata_r95.tar.gz
tar -xvzf ar122_metadata_r95.tar.gz
tar -xvzf bac120_metadata_r95.tar.gz
ncbi-gtdb_map.py -q gtdb_taxonomy -o taxonomies Species_gtdb_lineages.txt ar122_metadata_r95.tsv bac120_metadata_r95.tsv
Input file: Species_gtdb_lineages.txt
Output file: taxonomy_map_summary.txt
The GTDB IDs that I have in my input file are present in the metadata files, so I am unsure what it is I am doing wrong.
Thank you very much for making this tool, Laura