Closed meren closed 5 months ago
I ran this to confirm that metagenome-mode is working correctly with this commit:
cd INFANT-GUT-TUTORIAL/additional-files/pangenomics
# Test on a single contigs-db. This should throw an error because it doesn't have the new data
$ anvi-estimate-scg-taxonomy -c external-genomes/Enterococcus_faecalis_6255.db --metagenome-mode --scg-name-for-metagenome-mode Ribosomal_S15 -o asdf
Config Error: The SCG taxonomy database version on your computer (GTDB: v214.1; Anvi'o: v1) is
different than the SCG taxonomy database version to populate your contigs
database (v95). Please re-run the program `anvi-run-scg-taxonomy` on your
contigs-db.
# Test on multiple contigs-dbs
head -n 4 external-genomes.txt > external-genomes-small.txt
# This should throw an error
$ anvi-estimate-scg-taxonomy -M external-genomes-small.txt --metagenome-mode --scg-name-for-metagenome-mode Ribosomal_S15 -O asdf
Config Error: The SCG taxonomy database version on your computer (GTDB: v214.1; Anvi'o: v1) is
different than the SCG taxonomy database version to populate your contigs
database (v95). Please re-run the program `anvi-run-scg-taxonomy` on your
contigs-db.
# update one of the contigs-dbs and see what happens
anvi-run-scg-taxonomy -c external-genomes/Enterococcus_faecalis_6240.db -T 5
# Still catches the error woohoo! :)
$ anvi-estimate-scg-taxonomy -M external-genomes-small.txt --metagenome-mode --scg-name-for-metagenome-mode Ribosomal_S15 -O asdf
Config Error: The SCG taxonomy database version on your computer (GTDB: v214.1; Anvi'o: v1) is
different than the SCG taxonomy database version to populate your contigs
database (v95). Please re-run the program `anvi-run-scg-taxonomy` on your
contigs-db.
# Update them all and see what happens
for genome in $(tail -n +2 external-genomes-small.txt | cut -f 2); do anvi-run-scg-taxonomy -c "$genome" -T 6; done
$ anvi-estimate-scg-taxonomy -M external-genomes-small.txt --metagenome-mode --scg-name-for-metagenome-mode Ribosomal_S15 -O asdf
Num metagenomes ..............................: 3
Taxonomic level of interest ..................: (None specified by the user, so 'all levels')
Output file prefix ...........................: asdf
Output in matrix format ......................: False
Output raw data ..............................: False
SCG coverages will be computed? ..............: False
SCG [chosen by the user] .....................: Ribosomal_S15
* Your metagenome file DOES NOT contain profile databases, but you asked anvi'o to
estimate SCG taxonomy in metagenome mode. So be it. SCG name is set to
Ribosomal_S15.
Long-format output ...........................: asdf-LONG-FORMAT.txt
# and it works!
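The behaviour exercised above, where updating only one of the three contigs-dbs still triggers the error, can be sketched roughly as follows. This is only an illustration and not anvi'o's actual code: `read_contigs_db_paths`, `check_versions`, and `VersionMismatch` are hypothetical names, and the per-database versions are passed in as a plain dict rather than read from real contigs-db files.

```python
import csv
import io


class VersionMismatch(Exception):
    """Hypothetical stand-in for the Config Error shown in the transcript."""


def read_contigs_db_paths(external_genomes_text):
    """Return column 2 of a tab-delimited external-genomes file, skipping the header.

    Mirrors `tail -n +2 external-genomes-small.txt | cut -f 2`.
    """
    rows = csv.reader(io.StringIO(external_genomes_text), delimiter="\t")
    next(rows, None)  # skip the header line
    return [row[1] for row in rows if len(row) > 1]


def check_versions(db_versions, expected_version):
    """Raise if ANY contigs-db was populated with a different database version.

    db_versions maps a contigs-db path to the version it was populated with
    (an assumed input; how anvi'o stores this is not shown in this thread).
    """
    outdated = sorted(
        path for path, version in db_versions.items() if version != expected_version
    )
    if outdated:
        raise VersionMismatch(
            f"{len(outdated)} contigs-db file(s) need `anvi-run-scg-taxonomy`: "
            + ", ".join(outdated)
        )
    return True
```

With three databases where only one has been updated, `check_versions` still raises, which matches the "Still catches the error" step; it returns `True` only after every listed database has been re-run.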
I also tested on the Infant Gut Dataset (the main assembly in metagenome mode), as well as one of my single contigs db test files, and can confirm it works perfectly :)
Thank you for catching that bug, @mschecht, and thank you for testing it further, @ivagljiva.
I am merging it now and we will deal with the fallout in master :p
A friendly user on Discord identified a bug with this PR that I have also confirmed in my installation.
If you run anvi-setup-scg-taxonomy --reset, the program deletes the directory which contains the new SCG search databases:
if os.path.exists(self.ctx.SCGs_taxonomy_data_dir):
    if self.reset:
        shutil.rmtree(self.ctx.SCGs_taxonomy_data_dir)
        self.run.warning('The existing directory for SCG taxonomy data dir has been removed. Just so you know.')
        filesnpaths.gen_output_directory(self.ctx.SCGs_taxonomy_data_dir)
It used to be that we used --reset to download sequences directly from GTDB, but now the search databases ship with anvi'o, so we don't need the --reset functionality at all anymore. In fact, there were only three references to the self.reset variable remaining in the scg.py code: 1) reading the argument, 2) a sanity check against using both --reset and --redo-databases, and 3) the above code for deleting the directory.
I think if we remove the --reset option entirely, this will be resolved. @meren, is there any reason to keep --reset? I couldn't find any other classes being called that might use this parameter, but I may have missed something.
If there is no reason to keep it, I have a commit ready to go to fix this bug :)
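A minimal sketch of what the directory check could look like once --reset is gone, so that nothing is ever deleted. This is not the actual commit: `ensure_data_dir` is a hypothetical name, and `os.makedirs` stands in for `filesnpaths.gen_output_directory`.

```python
import os


def ensure_data_dir(data_dir):
    """Create the SCG taxonomy data directory if it is missing; never delete it.

    Returns True if the directory was created, False if it already existed.
    """
    if os.path.exists(data_dir):
        # The search databases ship with anvi'o, so an existing directory
        # is left untouched instead of being removed and regenerated.
        return False
    os.makedirs(data_dir)
    return True
```

The point of the design is that there is no longer a code path that calls `shutil.rmtree` on the data directory, so the shipped search databases cannot be removed by the setup program.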
See 4eab829195700e3c4fe6964b876151a778b33344 and 06a5769a0478ef8e7003c073daaa36660039fd75 for the fixes.
Furthermore, it seems the parameter --redo-databases is no longer used, so I got rid of it too :) (commit 2c1a6af3bf22d2199ae18e835058af7211fef7b0)
(the additional commits are now merged to master as of 71818e1bc57d3d810604012df9aabe7111b02597)
Not only updating the databases, but also the way we're dealing with them.
More about this in #2211.