merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] anvi-split does not split item additional metadata #2196

Closed: mschecht closed this issue 9 months ago

mschecht commented 9 months ago

Short description of the problem

anvi-split does not split item additional metadata

anvi'o version

$ anvi-self-test --version
Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.13

Profile database .............................: 40
Contigs database .............................: 22
Pan database .................................: 17
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

macOS Ventura 13.2.1 (22D68)

Detailed description of the issue

Hey anvi'o team! I am using anvi-split to subset interactive interfaces, but the exported anvi'o artifacts do not contain any of the misc-data-items-txt from the input PROFILE.db. The user then has to subset and re-import the misc-data-items-txt manually after running anvi-split.
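For illustration, the manual workaround described above might look roughly like the sketch below. It is not part of anvi'o: `subset_items` is a hypothetical helper, and `bin_4_items.txt` (one item name per line for the bin of interest) is an assumed file that you would have to produce yourself. The helper keeps the header line of a tab-delimited items file plus every row whose first column appears in the names file.

```shell
# Hypothetical helper (not an anvi'o program): subset a tab-delimited
# misc-data items file to the item names listed in another file.
#   $1: file with one item name per line (assumed to exist already)
#   $2: tab-delimited items file whose first line is a header
subset_items() {
    awk -F'\t' '
        # First file: remember every item name we want to keep.
        FILENAME == ARGV[1] { keep[$1] = 1; next }
        # Second file: always print the header, then only the kept items.
        FNR == 1 || ($1 in keep)
    ' "$1" "$2"
}

# Example usage (file names are assumptions, shown as comments only):
#   subset_items bin_4_items.txt collections.tsv > Bin_4-items.tsv
#   anvi-import-misc-data Bin_4-items.tsv -p Bin_4/Bin_4/PROFILE.db -t items
```

The re-import line reuses the same `anvi-import-misc-data` flags as the reproduction commands in this report, pointed at the split PROFILE.db.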

Files / commands to reproduce the issue

I can reproduce it with the infant gut tutorial

cd INFANT-GUT-TUTORIAL

anvi-script-merge-collections -c CONTIGS.db \
                              -i additional-files/external-binning-results/*.txt \
                              -o collections.tsv

anvi-import-misc-data collections.tsv \
                      -p PROFILE.db \
                      -t items

anvi-interactive -p PROFILE.db \
                 -c CONTIGS.db

Here we see the misc data in the interactive interface:

image

Now I will split off a bin and the misc data will disappear.

anvi-split -p PROFILE.db \
           -c CONTIGS.db  \
           --collection-name CONCOCT \
           --bin-id Bin_4 \
           -o Bin_4

anvi-interactive -p Bin_4/Bin_4/PROFILE.db \
                 -c Bin_4/Bin_4/CONTIGS.db

image

It would be wonderful if the misc data were also subset and carried over into the new mini PROFILE.db :)

Thank you very much for looking into this!

meren commented 9 months ago

Thank you for reporting this, @mschecht. As you will see, the solution was quite straightforward thanks to existing code that had anticipated such needs. The test case you provided now yields the following for Bin 4:

image