Open meren opened 3 years ago
I like this. Certainly this would be very useful for users (`anvi-db-info`) and for programmers (with the `store_configuration` and `read_configuration` API).
However, in my opinion this is not necessarily storing a database's history, but rather storing its current state. For example, if someone ran KOfams again on this database, it would overwrite these rows with the updated information.
I'm being pedantic only because the mention of history made me think of storing all operations that a database takes part in, maybe in a table called `history`. Any program that takes the db artifact as input or output could add an entry to this `history` table, so a complete log of what the DB has undergone is available. One use case would be letting a user retrieve prior commands they have run. Another, more ambitious use case would be parsing `history` in order to create a reproducible workflow.
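The `history` table idea above could be sketched as follows. This is only an illustration using Python's standard `sqlite3` module: the table layout, the column names, and the `log_command` / `read_history` helpers are assumptions, not part of anvi'o, and the logged command strings are placeholders.

```python
import sqlite3
from datetime import datetime, timezone


def log_command(conn, program, command):
    """Append one row to a hypothetical `history` table (never overwrites)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "timestamp TEXT NOT NULL, "
        "program TEXT NOT NULL, "
        "command TEXT NOT NULL)"
    )
    with conn:  # commit the insert as one transaction
        conn.execute(
            "INSERT INTO history (timestamp, program, command) VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), program, command),
        )


def read_history(conn):
    """Return (program, command) pairs in the order they were logged."""
    return conn.execute(
        "SELECT program, command FROM history ORDER BY id"
    ).fetchall()


# An in-memory database stands in for a contigs database here.
conn = sqlite3.connect(":memory:")
log_command(conn, "anvi-run-kegg-kofams", "anvi-run-kegg-kofams <args of first run>")
log_command(conn, "anvi-run-kegg-kofams", "anvi-run-kegg-kofams <args of second run>")
history = read_history(conn)
```

Unlike rows that describe the current state, a second run adds a new row here instead of replacing the first one, which is what would make replaying the log as a workflow possible.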
> However, in my opinion this is not necessarily storing a database's history, but rather storing its current state
Yes, indeed. I meant the state (but couldn't say it since "state" is so strongly associated with the interactive interface) :)
I wanted to mention this as a potential future design effort. I will use the contigs database as an example, but it is applicable to any anvi'o database.
The problem
The codebase includes many classes that operate on db artifacts and update these artifacts with new data. For instance, when you run `anvi-run-kegg-kofams` you get your KOfams that influence the results of `anvi-estimate-metabolism`. But the class that `anvi-run-kegg-kofams` inherits is configured with many default or user-defined parameters. All of these key details, which are needed to make sense of the results stored in a contigs database that has lost its connection to its creator (like these ones for instance), are forever lost in the log files of whoever ran a program on that anvi'o database.
Currently we keep key information for a given contigs database in its `self` table (the contents of which are printed out anytime someone runs `anvi-db-info` on a database, and which is used by many programs). But the current design of this two-column table does not leave much room for expansion.
The solution
We could solve this problem by adding one more column to the `self` table, and by editing all classes to take advantage of it. For instance, this is an example `self` table from a `v7` contigs db:
I think this would've been a better design:
Practical implications
This design would enable any anvi'o program or external programs that modify things in anvi'o contigs databases to store their configuration this way:
And any other program that may need the configuration of a particular program (such as `anvi-gen-genomes-storage` that doesn't want to create a genomes storage from contigs dbs that contain incompatible data) to retrieve it this way:
Continuing with the example of `anvi-run-kegg-kofams`, when it is done running, it would update the `self` table with the following information:
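The store and retrieve round trip described in this proposal could look roughly like the sketch below. It is not anvi'o's actual API: the three-column layout (`source`, `key`, `value`), the function signatures, and the parameter name and values are all assumptions made for illustration, and an in-memory SQLite database stands in for a contigs database.

```python
import sqlite3


def ensure_self_table(conn):
    """Create a hypothetical three-column `self` table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS self ("
        "source TEXT NOT NULL, "   # assumed extra column: which program wrote the row
        "key TEXT NOT NULL, "
        "value TEXT, "
        "PRIMARY KEY (source, key))"
    )


def store_configuration(conn, source, config):
    """Replace all rows written by `source` with its current configuration."""
    ensure_self_table(conn)
    with conn:  # delete and insert in one transaction
        conn.execute("DELETE FROM self WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO self (source, key, value) VALUES (?, ?, ?)",
            [(source, key, str(value)) for key, value in config.items()],
        )


def read_configuration(conn, source):
    """Return the stored configuration of `source` as a dict (empty if none)."""
    ensure_self_table(conn)
    rows = conn.execute(
        "SELECT key, value FROM self WHERE source = ?", (source,)
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")

# Hypothetical parameter name and values; a rerun overwrites the earlier rows.
store_configuration(conn, "anvi-run-kegg-kofams", {"hypothetical_param": "first run"})
store_configuration(conn, "anvi-run-kegg-kofams", {"hypothetical_param": "second run"})

config = read_configuration(conn, "anvi-run-kegg-kofams")
```

Because a rerun replaces the rows belonging to the same program, the table records the current state of the database and not the sequence of operations that led to it. A downstream program could call `read_configuration` on several databases and compare the resulting dicts before combining them.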