Open wwood opened 7 months ago
There doesn't appear to be any *.sqldb available, now we should just use the taxonomy CSV?
There are some instructions above that need to be run - see "Let’s index the taxonomy database using SQLite, for faster access later on:".
sourmash tax prepare -t gtdb-rs207.taxonomy.csv \
-o gtdb-rs207.taxonomy.sqldb -F sql
That having been said, you can use the taxonomy CSV too! It'll just take longer to load each time.
"I only need to species reps" I think, so I'll just download the first one.
Right! It needs to match the content of the database you're searching, which (in this case) is all of the GTDB genomes, not just the species-level representatives. We'll fix the tutorial to make this clear!
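To illustrate the matching requirement, here is a minimal sketch of a sanity check (not part of sourmash itself). It assumes the gather output has a `name` column whose first whitespace-separated token is the genome identifier, and that the taxonomy CSV has an `ident` column. The helper `missing_idents` and the inline sample rows are hypothetical and only stand in for real files.

```python
import csv
import io


def missing_idents(gather_file, taxonomy_file):
    """Return identifiers from the gather results that have no taxonomy row.

    Assumption: the identifier is the first whitespace-separated token of
    the gather 'name' column, and the taxonomy CSV has an 'ident' column.
    """
    tax_idents = {row["ident"] for row in csv.DictReader(taxonomy_file)}
    missing = []
    for row in csv.DictReader(gather_file):
        ident = row["name"].split()[0]
        if ident not in tax_idents:
            missing.append(ident)
    return missing


# Made-up sample data, for illustration only.
GATHER = 'name\n"GCF_000005845.2 Escherichia coli str. K-12"\n'
TAX_GENOMES = "ident,species\nGCF_000005845.2,s__Escherichia coli\n"
TAX_SPECIES = "ident,species\ns__Escherichia_coli,s__Escherichia coli\n"

# Taxonomy keyed by genome accession: every match is covered.
print(missing_idents(io.StringIO(GATHER), io.StringIO(TAX_GENOMES)))
# Taxonomy keyed by species-style idents: the accession is not found.
print(missing_idents(io.StringIO(GATHER), io.StringIO(TAX_SPECIES)))
```

With real files you would pass open file handles instead of the `io.StringIO` stand-ins; an empty list would suggest the taxonomy covers everything the search matched.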
The download link is in the tutorial, under "We also want to download the accompanying taxonomy spreadsheet:"
But that fails:
Well, and our error message certainly needs some help... we'll fix, thanks!
I'm a bit confused why the species one has ident entries along the lines of s__Escherichia_coli when sketch doesn't generate IDs of this type. Maybe I'm missing something.
Oh dear, that does look incorrect to me - I wonder why we did that... I'll see if I can fix. Thank you very much for reporting all of this!
Fixing link to species database here: https://github.com/sourmash-bio/sourmash/pull/3119
Thanks for the quick response @ctb - makes sense - fine by me to close this issue.
Hi there,
I've been having some trouble getting R207 databases to work with sourmash tax metagenome. I'm using 4.8.8 from conda.
After running sketch, the instructions at https://sourmash.readthedocs.io/en/latest/tutorial-lemonade.html#id7 say
There doesn't appear to be any *.sqldb available, now we should just use the taxonomy CSV?
OK, so
"I only need to species reps" I think, so I'll just download the first one. But that fails:
The genome one worked, so I got there in the end.
I'm a bit confused why the species one has ident entries along the lines of s__Escherichia_coli when sketch doesn't generate IDs of this type. Maybe I'm missing something.
Anyway, HTH, ben