merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Fix metagenome mode with collection focus #2242

Status: Closed (ivagljiva closed this 3 months ago)

ivagljiva commented 3 months ago

This PR addresses issue #2238. When users provide a collection name to anvi-estimate-metabolism, we now use only the gene calls that belong to splits in that collection. I also updated the "Mode" line of the output for the case where users provide both --metagenome-mode and a collection. It now reads:

Mode (what we are estimating metabolism for) .: Individual contigs within a collection in a metagenome
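To illustrate the filtering idea, here is a minimal Python sketch. It assumes gene calls are available as a mapping from gene caller ID to split name, and that the collection file is a two-column, TAB-delimited list of split names and bin names (like the file passed to anvi-import-collection below). The function names and split names are hypothetical; this is not anvi'o's actual implementation.

```python
from typing import Dict, Iterable, Set


def read_collection_splits(lines: Iterable[str]) -> Set[str]:
    """Collect split names from collection-file lines.

    Assumption: each non-empty line is 'split_name<TAB>bin_name'.
    """
    splits: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        split_name, _bin_name = line.split("\t", 1)
        splits.add(split_name)
    return splits


def filter_gene_calls_to_collection(
    gene_to_split: Dict[int, str], collection_splits: Set[str]
) -> Dict[int, str]:
    """Hypothetical helper: keep only gene calls whose split is in the collection."""
    return {
        gene_id: split
        for gene_id, split in gene_to_split.items()
        if split in collection_splits
    }


# Toy data with hypothetical split names (not from the Infant Gut dataset).
collection_lines = ["c_1_split_00001\tBin_1", "c_1_split_00002\tBin_1"]
gene_to_split = {0: "c_1_split_00001", 1: "c_1_split_00002", 2: "c_9_split_00001"}

splits = read_collection_splits(collection_lines)
kept = filter_gene_calls_to_collection(gene_to_split, splits)
print(f"{len(splits)} splits in collection; {len(kept)} gene calls kept")
# prints "2 splits in collection; 2 gene calls kept"
```

Gene 2 is dropped because its split is not in the collection. This mirrors how the real run below ends up with 181 gene calls from 30 splits.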

I tested it with the Infant Gut dataset and a collection containing only a few (30) splits:

head -n 30 additional-files/collections/merens.txt > meren_partial_collection.txt
anvi-import-collection -p PROFILE.db -C MEREN meren_partial_collection.txt -c CONTIGS.db
anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C MEREN --add-coverage -O test_coverage --metagenome-mode

When the set of relevant splits is reduced to those in the collection, it shows the following new warning and then works with only the 181 gene calls that belong to those splits:

WARNING
===============================================
Since a collection name was provided, we will only work with gene calls from the
subset of 30 splits in the collection for the purposes of estimating metabolism.

Gene calls from these sources ................: 181 found
* Since the --add-coverage flag was provided, we are now loading the relevant
  coverage information from the provided profile database.

WARNING
===============================================
A subset of splits (30 of 4784, to be precise) are requested to initiate gene-
level coverage stats for. No need to worry, this is just a warning in case you
are as obsessed as wanting to know everything there is to know.

I also tested these cases:

anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C E_faecalis -O test_collection
anvi-estimate-metabolism -c CONTIGS.db -p PROFILE.db -C E_faecalis --add-coverage -O test_collection_w_cov

Finally, I ran anvi-self-test --suite metabolism -T 6 to make sure these changes did not break anything else (everything passed) :)

meren commented 3 months ago

🎉