Open mwinokan opened 2 days ago
@tdudgeon agrees that this sounds sensible but adds that database migrations will make the data in Fragalysis compatible between versions. But identifying and documenting the breaking changes in XCA and the loader is key.
@tdudgeon suggests an `is_compatible_with` function, maintained in XCA, to work out whether incremental uploads can proceed with legacy-formatted data, without needing to complete alignment and upload to spot issues.
Determining whether changes are breaking will be non-trivial, but unit testing (#1588) will help.
@ConorFWild suggests having a table in the source code documenting when breaking changes occur. The collator can then compare the version used in preceding uploads against the table to see whether they are compatible. This empirical approach means the table can easily be patched if new breaking changes are identified that didn't appear in test data.
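A minimal sketch of how the table-based check could look, assuming versions are `major.minor` strings. The table entries and the `parse_version` helper are hypothetical placeholders for illustration; only the `is_compatible_with` name comes from the discussion above, and this is not the XCA implementation.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int


def parse_version(text: str) -> Version:
    """Parse a 'major.minor' string such as '1.3' (hypothetical helper)."""
    major, minor = text.strip().split(".")
    return Version(int(major), int(minor))


# Hypothetical table: each key is the first version that contains a breaking
# change. The entries are made up; a real table would be maintained by hand
# and patched whenever a new breaking change is discovered.
BREAKING_CHANGES = {
    Version(1, 2): "example: a metadata field was renamed",
    Version(2, 0): "example: the upload directory layout changed",
}


def is_compatible_with(previous: str, current: str) -> bool:
    """Return True if no recorded breaking change lies in (previous, current]."""
    old, new = parse_version(previous), parse_version(current)
    if old > new:
        # Data written by a newer version than the running code:
        # treated as incompatible in this sketch.
        return False
    return not any(old < v <= new for v in BREAKING_CHANGES)
```

With this layout, recording a newly discovered breaking change is a one-line addition to the table, and the check runs before any alignment work starts.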
XCA needs to create an upload directory hierarchy:
uploads/upload-dfv1/upload_1
uploads/upload-dfv1/upload_2
uploads/upload-dfv1/upload_3
uploads/upload-dfv2/upload_1
uploads/upload-dfv2/upload_2
uploads/upload-dfv3/upload_1
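As a rough illustration of the hierarchy above, the following sketch creates the next `upload_N` directory under the directory for a given data format version. The function name `next_upload_dir` and the integer version argument are assumptions made for this example, not part of XCA.

```python
import re
from pathlib import Path


def next_upload_dir(uploads_root: Path, data_format_version: int) -> Path:
    """Create and return the next upload_N dir under upload-dfv<version>."""
    version_dir = uploads_root / f"upload-dfv{data_format_version}"
    version_dir.mkdir(parents=True, exist_ok=True)

    # Collect the numbers of the upload_N directories that already exist.
    existing = []
    for path in version_dir.iterdir():
        match = re.fullmatch(r"upload_(\d+)", path.name)
        if path.is_dir() and match:
            existing.append(int(match.group(1)))

    # Numbering restarts at 1 for each data format version.
    new_dir = version_dir / f"upload_{max(existing, default=0) + 1}"
    new_dir.mkdir()
    return new_dir
```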
Initial commit implementing basic data model versioning in collator: https://github.com/xchem/xchem-align/commit/c263cc7eab253ac56e89a51519321475325b4e58
There is a baked-in major.minor version number (currently 1.0). Minor version number differences are treated as OK; major version number differences are errors and the collator fails.
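The rule described here can be sketched as follows. This is an illustration only and not the code in the linked commit; the names `DATA_FORMAT_VERSION`, `IncompatibleVersionError` and `check_data_format_version` are assumptions.

```python
DATA_FORMAT_VERSION = "1.0"  # the baked-in version quoted above


class IncompatibleVersionError(Exception):
    """Raised when the major version of earlier data differs (hypothetical)."""


def check_data_format_version(
    previous: str, current: str = DATA_FORMAT_VERSION
) -> None:
    """Accept minor version differences, fail on major version differences."""
    prev_major, _prev_minor = (int(part) for part in previous.split("."))
    cur_major, _cur_minor = (int(part) for part in current.split("."))
    if prev_major != cur_major:
        raise IncompatibleVersionError(
            f"data format {previous} is not compatible with {current}"
        )
    # Same major version: minor differences are treated as OK.
```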
This has been deployed to the XCA staging environment.
Some aspects to consider:
Still to be done is any automagic to migrate the directory structure. The directory structure that is proposed is almost certainly not what users currently have, so not sure if we need to make them update this to be compatible, or try to migrate automatically (which might be complex and unreliable).
I propose to create a new `migrate` tool that does any migrations of the directory structure.
XCA would use the directory named `upload-current`.
If the data model version changes (e.g. from `1.3` to `2.0`), that dir is renamed to `upload-v1.3`, a new `upload-current` dir is created, and the `config.yaml` and `assemblies.yaml` files are copied to the new dir.
This way, when the data model major version changes, the user is told to run the migrate tool (the command to run is shown and can be copied and pasted), so the user makes a conscious decision to do this, avoiding the risk of automatically screwing things up. The user is also told what has happened and that they might need to update `config.yaml` and `assemblies.yaml`, but the collator command they need to run stays the same as before, because it still uses the `upload-current` dir.
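A sketch of what the proposed migration step might do, under the assumption that the old version number is passed in explicitly. The `migrate` function signature is hypothetical; the directory and file names are the ones given above.

```python
import shutil
from pathlib import Path

CURRENT_DIR = "upload-current"
CONFIG_FILES = ("config.yaml", "assemblies.yaml")


def migrate(base_dir: Path, old_version: str) -> Path:
    """Archive upload-current as upload-v<old_version> and start a fresh one.

    Returns the path of the archived directory.
    """
    current = base_dir / CURRENT_DIR
    archived = base_dir / f"upload-v{old_version}"

    if not current.is_dir():
        raise FileNotFoundError(f"{current} does not exist")
    if archived.exists():
        raise FileExistsError(f"{archived} already exists, refusing to overwrite")

    # Rename the old directory, then recreate an empty upload-current.
    current.rename(archived)
    current.mkdir()

    # Copy the user-edited config files across; they may still need
    # manual updates for the new data model.
    for name in CONFIG_FILES:
        source = archived / name
        if source.is_file():
            shutil.copy2(source, current / name)
    return archived
```

Refusing to overwrite an existing archive keeps the step safe to re-run by mistake, which fits the goal of making migration a conscious, explicit action.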
@mwinokan and @phraenquex discussed that the fluid data formats for XCA, loader, and f/e are causing headaches for users and those helping to debug alignments.
The following solution is proposed:
… `upload_1` and subsequent subdirectories will reside. The name of the directory will also indicate the data format.
… `upload_1` is needed, we should encourage users to upload the new tarball to a separate TAS so that snapshots to the old target are not broken. This data format version could be on the Target model.
@phraenquex adds: the user should not have to dig to find out if the format is compatible or not.