merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] migration of profile.db from v33 to v35 #1632

Open efogarty11 opened 3 years ago

efogarty11 commented 3 years ago

Short description of the problem

When I try to migrate a profile database from v33 to v35, the migration appears to succeed, but I get a config error when I try to use the profile for anything else.

anvi'o version

anvio main on midway

Detailed description of the issue

I ran this command to migrate the profile:

anvi-migrate 06_MERGED_REPROFILE/PROFILE.db --migrate-dbs-safely

and it looks like it migrated without a problem:

Database Path ................................: 06_MERGED_REPROFILE/PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 33
Target Version ...............................: 35
Migration mode ...............................: Safe
SQLite Version ...............................: 3.31.1
* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.
* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

But when I try to run anvi-interactive (or any other program) with that profile.db, it gives this error:

Config Error: The database at 06_MERGED_REPROFILE/PROFILE.db does seem to have a table
              `mean_coverage_Q2Q3_splits` :/ Here is a list of table names this database
              knows: self, item_orders, layer_orders, views, collections_info, states,
              item_additional_data, layer_additional_data, variable_codons,
              variable_nucleotides, collections_bins_info, collections_of_contigs,
              collections_of_splits, indels
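
A scripted way to see exactly which tables went missing is to compare table listings taken before and after the migration. The sketch below is not part of anvi'o: `normalize_listing` and `lost_tables` are hypothetical helper names, and the two input files are assumed to hold listings saved by hand, for example from `sqlite3 PROFILE.db '.tables'` or copied out of an error message like the one above.

```shell
# Split a table listing (whitespace- or comma-separated, possibly several
# names per line) into one sorted, unique table name per line.
normalize_listing() {
    awk '{ gsub(/,/, " "); for (i = 1; i <= NF; i++) print $i }' "$1" | LC_ALL=C sort -u
}

# Print every table named in the "before" listing ($1) that is absent
# from the "after" listing ($2).
lost_tables() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    normalize_listing "$1" > "$before_sorted"
    normalize_listing "$2" > "$after_sorted"
    # comm -23 keeps only the lines unique to the first file.
    LC_ALL=C comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

With a listing saved to `before.txt` before migrating and to `after.txt` afterwards (both file names are placeholders), `lost_tables before.txt after.txt` prints one missing table per line. Some differences may be legitimate schema changes between versions, so the output is a starting point rather than a verdict.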

The path to the database I tried to migrate is in a comment on Slack. Note that it's quite a large file, so it might not be the best thing for testing.

Files to reproduce

path is on slack :)

meren commented 3 years ago

OK. I did a very quick test with a freshly downloaded Infant Gut dataset to make sure this is not impacting databases generated with v6.2.

Here are my tests, so that @ivagljiva can also take a look:

First download the data and save a copy of the original PROFILE.db:

wget https://ndownloader.figshare.com/files/18046139 -O INFANT-GUT-TUTORIAL.tar.gz
tar -zxvf INFANT-GUT-TUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL
anvi-db-info PROFILE.db

DB Info (no touch)
===============================================
Database Path ................................: PROFILE.db
Description ..................................: _No description is provided_
Type .........................................: profile
Version ......................................: 31

DB Info (no touch also)
===============================================
anvio ........................................: 4-master
sample_id ....................................: Infant Gut Time Series by Sharon et al
samples ......................................: DAY_15A, DAY_15B, DAY_16, DAY_17A, DAY_17B, DAY_18, DAY_19, DAY_22A, DAY_22B, DAY_23, DAY_24
total_reads_mapped ...........................: 9652871, 16377263, 20110402, 4066300, 26898531, 21568567, 22310834, 24337785, 14050898, 11740100, 20700333
merged .......................................: 1
blank ........................................: 0
default_view .................................: mean_coverage
min_contig_length ............................: 1000
SNVs_profiled ................................: 1
SCVs_profiled ................................: 1
num_contigs ..................................: 4189
num_splits ...................................: 4784
total_length .................................: 35766167
min_coverage_for_variability .................: 10
report_variability_full ......................: 0
contigs_db_hash ..............................: d51abf0a
creation_date ................................: 1529504084.32051
available_item_orders ........................: tnf:euclidean:ward,cov:euclidean:ward,tnf-cov:euclidean:ward
default_item_order ...........................: tnf-cov:euclidean:ward
max_contig_length ............................: 9223372036854775807
items_ordered ................................: 1

* Please remember that it is never a good idea to change these values. But in some
cases it may be absolutely necessary to update something here, and a programmer
may ask you to run this program and do it. But even then, you should be
extremely careful.

cp PROFILE.db PROFILE.db.31

This is my version:

 :: anvi'o v7 ::  ~ >>> anvi-interactive -v
Anvi'o .......................................: hope (v7)

Profile database .............................: 35
Contigs database .............................: 20
Pan database .................................: 14
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 1

Quick migration step by step:

for v in 32 33 34 35
do
    anvi-migrate PROFILE.db --migrate-dbs-quickly -t $v
    sqlite3 PROFILE.db '.tables' | grep Q2Q3
done

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 31
Target Version ...............................: 32
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.33.0

* Your profile db is now 32. We just added a bunch of new variables to the `self`
table of your database. All good now.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 32
Target Version ...............................: 33
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.33.0

* Your profile db is now 33. This update renamed two column names in the
`variabile_nucleotides` table of your profile database (`in_partial_gene_call`
has become `in_noncoding_gene_call`, and `in_complete_gene_call` has become
`in_complete_gene_call`

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 33
Target Version ...............................: 34
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.33.0

* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 34
Target Version ...............................: 35
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.33.0

* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

Final db info:

anvi-db-info PROFILE.db

DB Info (no touch)
===============================================
Database Path ................................: PROFILE.db
Description ..................................: _No description is provided_
Type .........................................: profile
Version ......................................: 35

DB Info (no touch also)
===============================================
anvio ........................................: 4-master
sample_id ....................................: Infant Gut Time Series by Sharon et al
samples ......................................: DAY_15A, DAY_15B, DAY_16, DAY_17A, DAY_17B, DAY_18, DAY_19, DAY_22A, DAY_22B, DAY_23, DAY_24
total_reads_mapped ...........................: 9652871, 16377263, 20110402, 4066300, 26898531, 21568567, 22310834, 24337785, 14050898, 11740100, 20700333
merged .......................................: 1
blank ........................................: 0
default_view .................................: mean_coverage
min_contig_length ............................: 1000
SNVs_profiled ................................: 1
SCVs_profiled ................................: 1
num_contigs ..................................: 4189
num_splits ...................................: 4784
total_length .................................: 35766167
min_coverage_for_variability .................: 10
report_variability_full ......................: 0
contigs_db_hash ..............................: d51abf0a
creation_date ................................: 1529504084.32051
available_item_orders ........................: tnf:euclidean:ward,cov:euclidean:ward,tnf-cov:euclidean:ward
default_item_order ...........................: tnf-cov:euclidean:ward
max_contig_length ............................: 9223372036854775807
items_ordered ................................: 1
min_percent_identity .........................: 0
min_indel_fraction ...........................: 0.0
INDELs_profiled ..............................: 0

All good.

Reset the db:

cp PROFILE.db.31 PROFILE.db

Safe migration step by step:

for v in 32 33 34 35
do
    anvi-migrate PROFILE.db --migrate-dbs-safely -t $v
    sqlite3 PROFILE.db '.tables' | grep Q2Q3
done

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 31
Target Version ...............................: 32
Migration mode ...............................: Safe

SQLite Version ...............................: 3.33.0

* Your profile db is now 32. We just added a bunch of new variables to the `self`
table of your database. All good now.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 32
Target Version ...............................: 33
Migration mode ...............................: Safe

SQLite Version ...............................: 3.33.0

* Your profile db is now 33. This update renamed two column names in the
`variabile_nucleotides` table of your profile database (`in_partial_gene_call`
has become `in_noncoding_gene_call`, and `in_complete_gene_call` has become
`in_complete_gene_call`

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 33
Target Version ...............................: 34
Migration mode ...............................: Safe

SQLite Version ...............................: 3.33.0

* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 34
Target Version ...............................: 35
Migration mode ...............................: Safe

SQLite Version ...............................: 3.33.0

* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

In neither case are the mean_coverage_Q2Q3_* tables lost.

So this looks good. I get the same result using the main branch. @ivagljiva, how exactly did you lose mean_coverage_Q2Q3_* with IGD?

meren commented 3 years ago

Note, my sqlite version is the following:

sqlite3 --version
3.33.0 2020-08-14 13:23:32 fca8dc8b578f215a969cd899336378966156154710873e68b3d9ac5881b0alt2

meren commented 3 years ago

@ivagljiva, regarding the following comment on anvi'o slack:

I can confirm that this is due to the migration. I tested this with the infant gut dataset (downloaded it, migrated to v33, checked that it has the mean coverage table - it did - and then ran migration to v35. the table was gone.)

Did you test this with IGD on the same system that @efogarty11 used, where the sqlite version seems to be 3.31.1?

Thank you both,

ivagljiva commented 3 years ago

@meren I tested it on my laptop, where I have the same sqlite version as @efogarty11:

$ sqlite3 --version
3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837bb4d6

However, I am also on the development branch so that could play a role as well:

anvi-interactive -v
Anvi'o .......................................: hope (v7-dev)

Profile database .............................: 35
Contigs database .............................: 20
Pan database .................................: 14
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 1

For the record, to replicate the error I did this:

tar -zxvf INFANT-GUT-TUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL
anvi-migrate -t 33 PROFILE.db --migrate-dbs-safely # go to v33
anvi-migrate -t 34 PROFILE.db --migrate-dbs-safely # go to v34
# check db in SQlite Browser to see Q2Q3 table - it is there
anvi-migrate PROFILE.db --migrate-dbs-safely # go to v35
# check db in SQlite Browser to see Q2Q3 table - it is there
cd ..
rm -r INFANT-GUT-TUTORIAL
tar -zxvf INFANT-GUT-TUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL
anvi-migrate -t 33 PROFILE.db --migrate-dbs-safely # go to v33
# check db in SQlite Browser to see Q2Q3 table - it is there
anvi-migrate PROFILE.db --migrate-dbs-safely # go to v35
# check db in SQlite Browser to see Q2Q3 table - it is gone
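
The same check can be scripted instead of opening the database in a GUI. The sketch below assumes the output of `sqlite3 PROFILE.db '.tables'` is piped in on standard input; `has_table` is a hypothetical helper name, not an anvi'o or SQLite command. Because `.tables` prints several names per line, the helper splits the listing into fields and looks for an exact match, so `mean_coverage_Q2Q3` alone would not match `mean_coverage_Q2Q3_splits`.

```shell
# Succeed (exit status 0) only if the exact table name given as $1 appears
# in a `.tables`-style listing read from standard input.
has_table() {
    awk -v name="$1" '
        { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    '
}
```

For instance, `sqlite3 PROFILE.db '.tables' | has_table mean_coverage_Q2Q3_splits && echo present || echo gone` reports the state of that one table after each migration step.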

But just to triple check, I just repeated what you did above, and spoilers, this time there were no issues:

tar -zxvf INFANT-GUT-TUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL
anvi-db-info PROFILE.db
DB Info (no touch)
===============================================
Database Path ................................: PROFILE.db
Description ..................................: _No description is provided_
Type .........................................: profile
Version ......................................: 31

DB Info (no touch also)
===============================================
anvio ........................................: 4-master
sample_id ....................................: Infant Gut Time Series by Sharon et al
samples ......................................: DAY_15A, DAY_15B, DAY_16, DAY_17A, DAY_17B, DAY_18, DAY_19, DAY_22A, DAY_22B, DAY_23, DAY_24
total_reads_mapped ...........................: 9652871, 16377263, 20110402, 4066300, 26898531, 21568567, 22310834, 24337785, 14050898, 11740100, 20700333
merged .......................................: 1
blank ........................................: 0
default_view .................................: mean_coverage
min_contig_length ............................: 1000
SNVs_profiled ................................: 1
SCVs_profiled ................................: 1
num_contigs ..................................: 4189
num_splits ...................................: 4784
total_length .................................: 35766167
min_coverage_for_variability .................: 10
report_variability_full ......................: 0
contigs_db_hash ..............................: d51abf0a
creation_date ................................: 1529504084.32051
available_item_orders ........................: tnf:euclidean:ward,cov:euclidean:ward,tnf-cov:euclidean:ward
default_item_order ...........................: tnf-cov:euclidean:ward
max_contig_length ............................: 9223372036854775807
items_ordered ................................: 1

* Please remember that it is never a good idea to change these values. But in some
cases it may be absolutely necessary to update something here, and a programmer
may ask you to run this program and do it. But even then, you should be
extremely careful.

cp PROFILE.db PROFILE.db.31

## step by step quick
for v in 32 33 34 35
do
    anvi-migrate PROFILE.db --migrate-dbs-quickly -t $v
    sqlite3 PROFILE.db '.tables' | grep Q2Q3
done

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 31
Target Version ...............................: 32
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.31.1

* Your profile db is now 32. We just added a bunch of new variables to the `self`
table of your database. All good now.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 32
Target Version ...............................: 33
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.31.1

* Your profile db is now 33. This update renamed two column names in the
`variabile_nucleotides` table of your profile database (`in_partial_gene_call`
has become `in_noncoding_gene_call`, and `in_complete_gene_call` has become
`in_complete_gene_call`

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 33
Target Version ...............................: 34
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.31.1

* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 34
Target Version ...............................: 35
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.31.1

* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

## that looks good

## step by step safe
cp PROFILE.db.31 PROFILE.db
for v in 32 33 34 35
do
    anvi-migrate PROFILE.db --migrate-dbs-safely -t $v
    sqlite3 PROFILE.db '.tables' | grep Q2Q3
done

Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 31
Target Version ...............................: 32
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* Your profile db is now 32. We just added a bunch of new variables to the `self`
table of your database. All good now.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 32
Target Version ...............................: 33
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* Your profile db is now 33. This update renamed two column names in the
`variabile_nucleotides` table of your profile database (`in_partial_gene_call`
has become `in_noncoding_gene_call`, and `in_complete_gene_call` has become
`in_complete_gene_call`

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 33
Target Version ...............................: 34
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 34
Target Version ...............................: 35
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

## also good

So yes, the step-by-step migration works.

The part that was removing the table for me earlier was migrating in one shot (from v33 to v35). I tried this again:

# first try from v31 straight to v35
cp PROFILE.db.31 PROFILE.db

anvi-migrate PROFILE.db --migrate-dbs-safely
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 31
Target Version ...............................: 35
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* Your profile db is now 32. We just added a bunch of new variables to the `self`
table of your database. All good now.

* Your profile db is now 33. This update renamed two column names in the
`variabile_nucleotides` table of your profile database (`in_partial_gene_call`
has become `in_noncoding_gene_call`, and `in_complete_gene_call` has become
`in_complete_gene_call`

* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.

* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

sqlite3 PROFILE.db '.tables' | grep Q2Q3
abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

## good

# then try v31 to v33, then v33 straight to v35
cp PROFILE.db.31 PROFILE.db

anvi-migrate PROFILE.db --migrate-dbs-safely -t 33
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 31
Target Version ...............................: 33
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* Your profile db is now 32. We just added a bunch of new variables to the `self`
table of your database. All good now.

* Your profile db is now 33. This update renamed two column names in the
`variabile_nucleotides` table of your profile database (`in_partial_gene_call`
has become `in_noncoding_gene_call`, and `in_complete_gene_call` has become
`in_complete_gene_call`

sqlite3 PROFILE.db '.tables' | grep Q2Q3
abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

anvi-migrate PROFILE.db --migrate-dbs-safely
Database Path ................................: PROFILE.db
Detected Type ................................: profile
Current Version ..............................: 33
Target Version ...............................: 35
Migration mode ...............................: Safe

SQLite Version ...............................: 3.31.1

* Your profile db is now version 34. 8 of its tables were cleaned from a
historical design artifact.

* The profile database is now 35. This upgrade redefined the stored format of
INDELS to provide a more robust working framework. If you are upgrading this
database from `v6`, you don't have anything to worry about. But if you were
using the active branch of anvi'o, then you lost your INDELs now and you would
need to re-profile your BAM files if you want them back :)

sqlite3 PROFILE.db '.tables' | grep Q2Q3
abundance_contigs             mean_coverage_Q2Q3_contigs
abundance_splits              mean_coverage_Q2Q3_splits

## also good. whaaaat?

I am baffled. No idea why I was able to replicate the issue previously but not now :(

meren commented 3 years ago

@ivagljiva, do you think @efogarty11's attempt to migrate may have failed due to an improperly closed database file? It is a very large db, and if the next process started before it was written to disk fully (which is what happens when dbs are not closed properly at the end of the process) that may have had an influence.

But I still don't understand how, if it were to be the case, you could reproduce this error with IGD.
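
One cheap thing to look for when suspecting an unclean shutdown is a leftover SQLite side file next to the database: a rollback journal (`-journal` suffix) or, in WAL mode, `-wal` and `-shm` files. The sketch below only checks for their presence. `leftover_sidecars` is a hypothetical name, and whether the anvi'o migration scripts would ever leave such files behind is an assumption that nothing in this thread confirms.

```shell
# Print the path of every SQLite side file that exists next to the
# database given as $1. No output means none were found.
leftover_sidecars() {
    for suffix in -journal -wal -shm; do
        [ -e "$1$suffix" ] && printf '%s\n' "$1$suffix"
    done
    return 0
}
```

Running `leftover_sidecars 06_MERGED_REPROFILE/PROFILE.db` after a migration has exited and getting no output would be consistent with a clean close, while any printed path would be worth a look before reusing the file. These files also exist normally while a process still has the database open, so the check only means something once the migration is finished.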

I am confuse.

ivagljiva commented 3 years ago

@meren, that was my only guess as to what had happened. We have all the proper calls to disconnect() as far as I can see in the migrate scripts, maybe that process could take a long time on large databases? such that it was not fully closed by the time the next migrate script opened it again? Do you think that is a possibility?

I am also very confused about how I reproduced it with IGD (twice! I reproduced it twice before commenting on the Slack because I wanted to be sure it was really happening), and even more confused that I later could not reproduce it again. 😢 It makes me feel a bit crazy.

I am currently updating my local anvi'o environments, and was planning to try again afterwards to see if anything changes.

meren commented 3 years ago

We have all the proper calls to disconnect() as far as I can see in the migrate scripts, maybe that process could take a long time on large databases? such that it was not fully closed by the time the next migrate script opened it again? Do you think that is a possibility?

This should never happen since those are not asynchronous functions and they 'wait' until the process is complete.

But this,

I am also very confused about how I reproduced it with IGD (twice! I reproduced it twice before commenting on the Slack because I wanted to be sure it was really happening), and even more confused that I later could not reproduce it again.

sounds like a race condition. Your I/O pressure is high or your CPU is busy? You see the problem. And you don't in other cases. This message you shared on Slack also points towards an evil of that sort:

I tried again with the infant gut dataset, this time migrating from v33 to v34 (table was still there), and then migrating from v34 to v35. And the table was there. So for whatever reason, this is only happening if you migrate from v33 to v35 in one go.

When you migrate step by step, you are less likely to see it. When you do it in one go, there is much less time between independent processes.

Luckily we have the IGD data archived from v6.2, and we can do something like this:

Set the stage:

wget https://ndownloader.figshare.com/files/18046139 -O INFANT-GUT-TUTORIAL.tar.gz
tar -zxvf INFANT-GUT-TUTORIAL.tar.gz && cd INFANT-GUT-TUTORIAL
cp PROFILE.db PROFILE.db.v31

Then migrate it 100 times,

for test in {1..100}
do
    cp PROFILE.db.v31 PROFILE.db
    anvi-migrate PROFILE.db --migrate-dbs-safely --quiet

    if [ `sqlite3 PROFILE.db '.tables' | wc -l` == "15" ]
    then
        echo "TEST #$test: PASS"
    else
        echo "TEST #$test: FAIL :("
    fi

    sha1sum PROFILE.db
done

meren commented 3 years ago

We can also do this with relatively larger profile databases, such as this one, which is 814 MB.

Set the stage:

wget https://ndownloader.figshare.com/files/22467302 -O P-A-F.tar.gz
tar -zxvf P-A-F.tar.gz && cd P-A-F
cp PROFILE.db PROFILE.db.v32

Then,

for test in {1..100}
do
    cp PROFILE.db.v32 PROFILE.db
    anvi-migrate PROFILE.db --migrate-dbs-safely --quiet

    if [ `sqlite3 PROFILE.db '.tables' | wc -l` == "15" ]
    then
        echo "TEST #$test: PASS"
    else
        echo "TEST #$test: FAIL :("
    fi

    sha1sum PROFILE.db
done

These should work on any computer system with anvi'o v7.
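
With 100 iterations, the PASS/FAIL lines and checksums are easier to read as a summary. The sketch below assumes the loop output was saved to a log and is fed in on standard input. `summarize_runs` is a hypothetical helper, and it relies only on the `TEST #N: PASS` / `TEST #N: FAIL` lines and the `sha1sum` output format used in the loops above.

```shell
# Read the output of the test loop on standard input and print one line
# with the number of passes, failures, and distinct SHA-1 checksums.
summarize_runs() {
    awk '
        /^TEST #[0-9]+: PASS/ { pass++; next }
        /^TEST #[0-9]+: FAIL/ { fail++; next }
        # sha1sum lines start with a 40-character lowercase hex digest.
        length($1) == 40 && $1 ~ /^[0-9a-f]+$/ { if (!seen[$1]++) distinct++ }
        END { printf "pass=%d fail=%d distinct_checksums=%d\n", pass, fail, distinct }
    '
}
```

If the loop output is captured in a file, say `migrate-test.log` (a placeholder name), then `summarize_runs < migrate-test.log` prints a single line such as `pass=100 fail=0 distinct_checksums=1`. The thread does not say whether repeated migrations are expected to be byte-identical, so more than one distinct checksum is not by itself a failure.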

ivagljiva commented 3 years ago

sounds like a race condition. Your I/O pressure is high or your CPU is busy? You see the problem.

Makes sense. I was running anvi-run-kegg-kofams -T 4 on several contigs dbs last night, which might have been when I replicated the error.

After updating my anvi'o environments, I tried your test loops and got 'PASS' for everything, even when I tried running anvi-run-kegg-kofams -T 4 at the same time to put more stress on my system.

So maybe upgrading my sqlite3 fixed this, or maybe it is just an elusive race condition. Either way, I don't think there is much else we can do about it.