merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-merge profile.db for many config.db #2117

natchaphon602 closed 11 months ago

natchaphon602 commented 11 months ago

I am trying to merge the profile.db of each MAG, but I get an error about config.db.

Code: anvi-merge bam/*/PROFILE.db -o profile_merge -c contigdb/

Config Error: Someone downstream doesn't like your so called database, 'contigdb/'. They say " Config Error: This one time someone was not happy with 'contigdb/'
and 'unable to open database file', they said. ". Awkward :(

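So I have some questions:

Is it possible to merge multiple config.db files so that anvi-merge can merge multiple profile.db files?

How should I proceed to get both merged .db files as input for anvi-interactive?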

ivagljiva commented 11 months ago

Hi @natchaphon602

In the error you showed, contigdb/ appears to be a directory. You need to pass a .db file to the -c flag for this to work.
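A quick pre-flight check along these lines can catch this before anvi-merge runs. This is only a minimal sketch: the helper name check_contigs_db and the path contigdb/CONTIGS.db are hypothetical, so substitute the actual location of your contigs database.

```shell
# Hypothetical helper: succeeds only if the argument is an existing
# regular file whose name ends in .db (and not a directory).
check_contigs_db() {
    path="$1"
    if [ -d "$path" ]; then
        echo "'$path' is a directory; pass the .db file inside it to -c" >&2
        return 1
    fi
    if [ ! -f "$path" ]; then
        echo "'$path' does not exist" >&2
        return 1
    fi
    case "$path" in
        *.db) return 0 ;;
        *) echo "'$path' does not end in .db" >&2; return 1 ;;
    esac
}

# Example (the file name inside contigdb/ is an assumption):
# check_contigs_db contigdb/CONTIGS.db && \
#     anvi-merge bam/*/PROFILE.db -o profile_merge -c contigdb/CONTIGS.db
```

Passing contigdb/ itself to the helper fails with the "is a directory" message, which is the situation the error above describes.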

Now for your questions.

Is it possible to merge multiple config.db files so that anvi-merge can merge multiple profile.db files?

No. Only profile databases can be merged, and only those profile databases whose data corresponds to the same contigs database. For example, if you map multiple samples to one reference genome, you can then merge the profile databases for those samples, and the final merged profile will correspond to the single contigs database containing the reference genome.

How should I proceed to get both merged .db files as input for anvi-interactive?

Re-run the anvi-merge command, but make sure you give it a single contigs database and only the profiles corresponding to that contigs database. If you have multiple references (i.e., multiple contigs databases), you will have to run this command several times, once per contigs database. Then you can run anvi-interactive to visualize each reference with its mapping data separately.
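The once-per-contigs-database loop could look roughly like the sketch below. The directory layout is an assumption made up for illustration (it is not the layout from the original command), and the function only prints the anvi-merge commands so they can be reviewed before anything runs.

```shell
# Hypothetical layout (an assumption, adjust to your own):
#   <root>/<reference>/CONTIGS.db
#   <root>/<reference>/profiles/<sample>/PROFILE.db
# Prints one anvi-merge command per reference instead of running it.
# Paths containing spaces are not quoted in the printed commands.
print_merge_commands() {
    root="$1"
    for contigs in "$root"/*/CONTIGS.db; do
        [ -f "$contigs" ] || continue   # skip references without a contigs database
        ref_dir=$(dirname "$contigs")
        echo "anvi-merge $ref_dir/profiles/*/PROFILE.db -o $ref_dir/merged -c $contigs"
    done
}

# Example: print_merge_commands refs
```

Each merged profile can then be opened on its own, for example with anvi-interactive -p refs/<reference>/merged/PROFILE.db -c refs/<reference>/CONTIGS.db under the same assumed layout.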

I hope this helps.

natchaphon602 commented 11 months ago

From your answer, does that mean I cannot run anvi-interactive to visualize all MAGs at the same time? What should I do to visualize them all at once?

ivagljiva commented 11 months ago

Well, if you want to visualize all of them at the same time, you can. But at this point that would mean that you have to redo your mapping workflow, because the sequences in your contigs database need to match the sequences in your reference FASTA file used for mapping (otherwise we cannot reconcile them with the BAM output in the profile database).

These are the steps you would have to take to visualize all your MAGs with mapping data in anvi-interactive simultaneously:

  1. Concatenate the MAG FASTA files into one FASTA file that contains all of them (i.e., something like cat *.fasta > all_MAGs.fasta).
  2. Run the read recruitment step from each sample to that reference file.
  3. Create a contigs database out of your reference file.
  4. Run anvi-profile on each BAM file to create a separate profile database associated with the contigs database.
  5. Run anvi-merge on all profile databases to merge them into one (still associated with the single contigs database containing all of your MAGs).
  6. Open the interactive interface with the contigs database and the merged profile.
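The steps above can be sketched as a command plan. Several things here are assumptions that do not come from this thread: bowtie2 and samtools stand in for the read recruitment step (any mapper that produces sorted, indexed BAM files would do), MAG FASTA files are assumed to live in mags/, reads are assumed to be paired and named reads/<sample>_R1.fastq.gz and reads/<sample>_R2.fastq.gz, and contig names are assumed to be unique across MAGs and already in a form anvi'o accepts. The function only prints the commands so the plan can be checked before running it.

```shell
# Prints the commands for steps 1-6 for the sample names given as arguments.
# File names, directory names, and the choice of bowtie2/samtools are
# illustrative assumptions, not part of the original answer.
print_mag_workflow() {
    # Step 1: concatenate the MAGs into one reference.
    echo "cat mags/*.fasta > all_MAGs.fasta"

    # Step 2: recruit reads from every sample to that reference.
    echo "bowtie2-build all_MAGs.fasta all_MAGs"
    for sample in "$@"; do
        echo "bowtie2 -x all_MAGs -1 reads/${sample}_R1.fastq.gz -2 reads/${sample}_R2.fastq.gz -S ${sample}.sam"
        echo "samtools sort -o ${sample}.bam ${sample}.sam"
        echo "samtools index ${sample}.bam"
    done

    # Step 3: one contigs database for the concatenated reference.
    echo "anvi-gen-contigs-database -f all_MAGs.fasta -o CONTIGS.db -n all_MAGs"

    # Step 4: one profile database per BAM file.
    for sample in "$@"; do
        echo "anvi-profile -i ${sample}.bam -c CONTIGS.db -o ${sample}_profile -S ${sample}"
    done

    # Step 5: merge all profiles. Step 6: open the interactive interface.
    echo "anvi-merge *_profile/PROFILE.db -o merged -c CONTIGS.db"
    echo "anvi-interactive -p merged/PROFILE.db -c CONTIGS.db"
}

# Example: print_mag_workflow sample_01 sample_02
```

The key point the sketch preserves is that the same all_MAGs.fasta is used both for mapping and for the contigs database, so every profile refers to the same set of sequences.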

If you were to follow https://merenlab.org/2018/07/09/anvio-snakemake-workflows/#metagenomics-workflow to run the mapping using anvi-run-workflow, then snakemake would automatically take care of steps 2-5 for you.