merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[FEATURE REQUEST] Option to automatically migrate dbs when running `anvi-export-contigs` #2085

Closed by jolespin 1 year ago

jolespin commented 1 year ago

The need

Finally got anvi'o installed but now I'm seeing the following message when trying to export fasta:

Config Error: The database at 'CONTIGS.db' is outdated (this database is v9 and your anvi'o
              installation wants to work with v20). You can migrate your database without
              losing any data using the program `anvi-migrate` with either of the flags
              `--migrate-dbs-safely` or `--migrate-dbs-quickly`.

Now I must convert all of the dbs so I can use them.

Think this could be useful for lots of users.

The solution

usage: anvi-export-contigs [-h] -c CONTIGS_DB [--contigs-of-interest FILE]
                           [--splits-mode] -o FILE_PATH [--just-do-it]
                           [--no-wrap]

optional arguments
  -h, --help            show this help message and exit
  -c CONTIGS_DB, --contigs-db CONTIGS_DB
                        Anvi'o contigs database generated by 'anvi-gen-
                        contigs-database' (default: None)
  --contigs-of-interest FILE
                        It is possible to focus on only a set of contigs. If
                        you would like to do that and ignore the rest of the
                        contigs in your contigs database, use this parameter
                        with a flat file every line of which desribes a single
                        contig name. (default: None)
  --splits-mode         Export split sequences instead. (default: False)

  # Suggested options
  -a, --automatically-migrate-db
                        Automatically migrate the database to the version
                        the current anvi'o installation expects.
                        (default: False)
  --update-existing-db  If selected, update and overwrite the existing
                        CONTIGS_DB. (default: False)
  --tmp DIR             If --update-existing-db is not set, directory in
                        which to write the intermediate CONTIGS.db version.
                        (default: $TMPDIR)

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results. (default: None)
  --just-do-it          Don't bother me with questions or warnings, just do
                        it. (default: False)
  --no-wrap             Do not be wrap sequences nicely in the output file.
                        (default: False)

Beneficiaries

I would expect this to benefit anyone using anvi'o, especially those working with data generated by older versions of anvi'o (such as the Delmont 2018 dataset).

jolespin commented 1 year ago

Also, I guess I'm the (un)lucky one...

I tried the `--migrate-dbs-quickly` option:

  --migrate-dbs-quickly
                        If you chose this, anvi'o will migrate your databases
                        in place. It will be much faster (and arguably more
                        fun) than the safe option, but if something goes
                        wrong, you will lose data. During the first five years
                        of anvi'o development not a single user lost data
                        using our migration scripts as far as we know. But
                        there is always a first, and today might be your lucky
                        day. (default: False)

anvi-migrate CONTIGS.db --migrate-dbs-quickly -t 20
Config Error: Anvi'o has very bad news for you :( Your migration failed, and anvi'o has no
              backups to restore your original database. The current database is likely in a
              broken state, and you will unlkely going to be able to use it. So anvi'o renamed
              it by adding a prefix '.broken' to its file name. We are very sorry for this
              error (and anvi'o will certainly not put salt on the wound by reminding you that
              you could have avoided it by using the `--migrate-dbs-safely` flag): "
              Config Error: There is an issue but it is easy to resolve and
              everything is fine! To continue,  please first install the Python module `h5py`
              by running `pip install   h5py==2.8.0` in your anvi'o environment. The reason
              why the standard anvi'o   package does not include this module is both
              complicated and really unimportant.  Re-running the migration after `h5py` is
              installed will make things go smootly.
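
The error above says the migration needs the Python module `h5py` (pinned to 2.8.0). A minimal pre-flight sketch, assuming it is run inside the anvi'o environment before any in-place migration; the helper name `module_status` is hypothetical and is not part of anvi'o:

```python
import importlib.util
from importlib import metadata


def module_status(name, wanted_version=None):
    """Return 'missing', 'version-mismatch', or 'ok' for a top-level module.

    Hypothetical helper for illustration; not an anvi'o function.
    """
    # find_spec returns None when a top-level module cannot be imported.
    if importlib.util.find_spec(name) is None:
        return "missing"
    if wanted_version is not None:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata to compare.
            return "missing"
        if installed != wanted_version:
            return "version-mismatch"
    return "ok"
```

Calling `module_status("h5py", "2.8.0")` before `anvi-migrate` would flag the missing dependency while the database is still untouched.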

meren commented 1 year ago

Running things automatically is not the anvi'o way. We believe it is much better if the user explicitly runs each step so they are in full control. It is somewhat inconvenient for new users, and we are sorry for that. It pays off in the long run.
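The explicit two-step workflow described above can be sketched as follows. This is only an illustration: it uses the commands and flags quoted earlier in this thread (`anvi-migrate` with `--migrate-dbs-safely`, `anvi-export-contigs` with `-c` and `-o`), assumes the migrated database stays at the same path, and the function names `explicit_steps` and `run_steps` are hypothetical.

```python
import shlex
import subprocess


def explicit_steps(contigs_db, fasta_out):
    """Return the two commands in order: migrate the database, then export."""
    migrate = ["anvi-migrate", contigs_db, "--migrate-dbs-safely"]
    export = ["anvi-export-contigs", "-c", contigs_db, "-o", fasta_out]
    return [migrate, export]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, so the export
            # never runs against a database that failed to migrate.
            subprocess.run(cmd, check=True)
```

With `dry_run=True` (the default), `run_steps(explicit_steps("CONTIGS.db", "contigs.fa"))` only prints the commands, so each step can be reviewed before it is run for real.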

But you don't really need to be a new user of anvi'o. You can move on with your life without having to deal with it at all! :)

The Delmont et al. MAGs are already available as FASTA files in a standalone data package. What you need is listed in the document referenced in the data availability section of the manuscript:

https://merenlab.org/data/tara-oceans-mags/

Here is one of the items at the very top of it:

doi:10.6084/m9.figshare.4902923: FASTA files for 957 non-redundant metagenome-assembled genomes.

jolespin commented 1 year ago

Thanks for sending these over! I actually found these FASTA files afterwards and should have updated my post. I didn't realize until then that there were multiple figshare links.

I agree that it's good to be explicit and not over-automate things. I try to practice that philosophy as well and strive for modularity.

I've used anvi'o quite a bit in the past and I'm a big fan of the pangenome functionality.