merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] renaming profiles with anvi-db-info only does a partial job #2265

Open xvazquezc opened 1 month ago

xvazquezc commented 1 month ago

Short description of the problem

I was trying to rename a PROFILE.db based on #946, i.e.:

anvi-db-info --self-key sample_id --self-value glu_2 glu2/AUXILIARY-DATA.db --just-do-it

and the self table was indeed changed. However, the sample name also appears as layer or sample_id in other tables of the PROFILE.db (e.g., all the mean_coverage_* tables), and those are not modified. Because of this, anvi-merge complains:

Config Error: The incoming layer orders data for std_coverage include layer names that do not match the ones in the database :/ Here they are: 'glu2'

(glu2 is the old sample name).
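Before rewriting anything, it can help to see which tables still hold the old name. The Python sketch below uses only the standard sqlite3 module. It scans every column of every table for values containing a given string and reports how many rows match. The helper name find_sample_name is hypothetical and is not part of anvi'o. Because it does a substring match, it may also report unrelated values such as glu20.

```python
import sqlite3
from contextlib import closing


def find_sample_name(conn, name):
    """Return (table, column, n_rows) for every column whose values contain `name`.

    Hypothetical helper: a plain substring search over all user tables.
    """
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            # CAST lets numeric columns be searched too; instr() > 0 means "contains".
            n_rows = conn.execute(
                f'SELECT COUNT(*) FROM "{table}" '
                f'WHERE instr(CAST("{column}" AS TEXT), ?) > 0',
                (name,),
            ).fetchone()[0]
            if n_rows:
                hits.append((table, column, n_rows))
    return hits


def report(db_path, name):
    """Print the hits for a database file (db_path is whatever file you want to inspect)."""
    with closing(sqlite3.connect(db_path)) as conn:
        for table, column, n_rows in find_sample_name(conn, name):
            print(f"{table}.{column}: {n_rows} row(s)")
```

For the case above, report("glu2/PROFILE.db", "glu2") would be expected to list the mean_coverage_* tables among the hits, since those are the ones that kept the old name.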

No idea whether there is some new command for this that I'm not aware of...

anvi'o version

8-dev up to date

$ anvi-self-test --version
Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.12

Profile database .............................: 40
Contigs database .............................: 23
Pan database .................................: 17
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Ubuntu 22.04, up to date.

meren commented 1 month ago

Hey @xvazquezc, renaming samples once the single or merged profiles are generated is an absolute pain. We don't have a script for that, unfortunately. But there is a way to do it using SQLite via the command line: you can dump the entire content of the db file as flat text, literally search/replace the sample name, and generate a new db from the SQL file. Please let me know if you end up going in this direction and whether it works.

Since contigs-db files don't care about sample names, this change should not affect anything else downstream and should, in theory, work smoothly.
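The dump, search/replace, rebuild approach can be sketched in Python with the standard sqlite3 module. Connection.iterdump() yields the database as SQL text, much like the sqlite3 shell's .dump command, and executescript() replays that text into a fresh database. This is a minimal sketch under a few assumptions, not an anvi'o tool. The helper names are hypothetical, and the destination file must not already exist. Instead of a literal search/replace, it only rewrites whole-token matches, meaning the old name is not flanked by a letter, digit, or underscore. Like any text-level rewrite, it can still hit unrelated values that happen to equal the old name, so keep the original file and check the result before running anvi-merge.

```python
import re
import sqlite3
from contextlib import closing


def rename_sample(src, dst, old_name, new_name):
    """Replay every statement of `src` into `dst`, replacing `old_name` with `new_name`.

    Only whole-token matches are replaced, so 'glu2' inside 'glu20' is left alone.
    """
    token = re.compile(r"(?<![A-Za-z0-9_])" + re.escape(old_name) + r"(?![A-Za-z0-9_])")
    # The whole database as SQL text (BEGIN TRANSACTION; ... COMMIT;).
    dump = "\n".join(src.iterdump())
    # A function as the replacement avoids re's backslash handling in new_name.
    dst.executescript(token.sub(lambda _match: new_name, dump))


def rename_sample_in_file(src_path, dst_path, old_name, new_name):
    """Write a renamed copy of the database at src_path to a new file at dst_path."""
    with closing(sqlite3.connect(src_path)) as src, closing(sqlite3.connect(dst_path)) as dst:
        rename_sample(src, dst, old_name, new_name)
```

For this issue, that would be something like rename_sample_in_file("glu2/PROFILE.db", "glu2/PROFILE-renamed.db", "glu2", "glu_2"), where the output file name is only an example.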