Open graft opened 5 years ago
The method in (2) seems more attractive as it does not require reconstruction. To resolve foreign key issues, each table could be hidden behind a view, something like SELECT DISTINCT $identifier FROM table WHERE version < $version_date ORDER BY version ASC
. Each table thus reduced may be joined as usual.
A few more outstanding issues: 1) Since the record is duplicated in its entirety, the information content greatly expands (especially for, say, matrix columns or other large JSON values). We can hope that there won't be that many versions of each record that this is an issue.
2) Probably the most significant issue is that there is a single table, i.e., a single model. This model changes over time, which means the format of any rows in the table changes over time. Therefore we cannot reconstruct the exact state of the database at a previous date because the attributes on a model might be different. This seems unresolvable using the single-table model. One way to resolve would be to add a new table every time the model is changed - this is a LOT of data duplication and would result in table proliferation. Resolving this over time might also be tough.
In parallel to the migration objective, we wish to extend Magma so that it is non-destructive. That is, data is never removed or overwritten, it is only inserted. This will require extensive changes to the way Magma models are defined, stored in tables, and queried, roughly as follows:
Time travel for models:
model
and attribute
tables from the associated model table, instead using a table_name
column in the model
table and a column_name
column in the attribute
table. Table and column names are random keys.Time travel for records:
Currently there is a single record for each identifier for any given Magma model, corresponding to a single row in the table for that model, enforced by a uniqueness constraint on the identifier. This means any changes to the record overwrite old data, which is lost forever.
We would, rather, like to preserve old changes so that we can review the history of the record and see what updates have been made to it. We might also wish to do searches going back in time, that is, querying the state of the database as it existed on a certain date.
How can we maintain this sort of history in a relational database? Some possibilities:
1) Changes to records are described in a journal. Using the journal you may reconstruct the previous history of a record. This has the advantage of maintaining the existing single-row-single-record model, but since rewinding time involves applying a series of diffs it might be difficult or impossible to perform.
2) Records are not updated (i.e., drop updated_at); new records are inserted with a version (created_at). There may be multiple entries for each identifier. This is very simple and does not require us to reconstruct previous states using a set of diffs, they are already present in the same table. However, we have destroyed foreign key relationships. Whereas before links made by identifier could be made instead using a database primary key, now they must, instead, be made just by identifier. A record with an identifier "foreign key" then points not to a single record, but a set of possible records.