Introduction

Background

The current model development cycle in Magma goes like this:
First we plan:
Add new models or alter existing models in the project repo.
Load the new models into Magma; use the "plan" command to compute a migration based on the difference between the current database schema and the current model definition.
Then we modify the plan, i.e., add or remove the atomic operations to be applied to the database:
Check the planned migration for errors (notably, the plan cannot distinguish "rename column" from "drop column" + "add column"). Rewrite it to fix errors and to accommodate other data-loading tasks.
Run the migration.rb file to update the database (a sketch of such a migration appears below).
Subsequently there is cleanup and deploy:
Validate the model, perhaps by test-loading some data and building views in Timur
Push code (models + migration changes) to production
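For concreteness, a planned migration might look something like the following. This is a hypothetical sketch assuming a Sequel-style migration.rb; the table and column names are illustrative, not Magma's actual output.

Sequel.migration do
  change do
    # Safe: this step only adds to the schema.
    create_table(:victims) do
      primary_key :id
      foreign_key :monster_id, :monsters
      String :name
    end
    alter_table(:monsters) do
      # Pitfall: the planner cannot tell a rename from a drop + add, so a
      # renamed column is planned as this destructive pair. Left uncorrected,
      # it would destroy the data in :weight_lbs.
      drop_column :weight_lbs
      add_column :weight, Float
    end
  end
end

The drop/add pair is exactly the kind of error the checking step above is meant to catch; a human must rewrite it as a rename_column before running the migration.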
Planning strategy vs. atomic operations strategy
There are two broad strategies we might use to expose model-editing functionality:
1) Describe the new model, compute the differences required and make the appropriate changes
This "plan" methodology allows us to quickly dump in a large template with perhaps hundreds of attributes. However, the plan is not trustworthy. While using it allows us to sketch in broad strokes, the actual migrations it generates must be validated and corrected before being run. This means the process above is dangerous: that is, it is not guaranteed to never accidentally destroy data. This requirement for validation of migrations also slows down the process of deploying new models.
2) Allow atomic operations to be executed by the user
This allows us to sculpt in detail, adding or removing attributes to reshape our existing models. But it would grow tedious to sketch out new projects this way, adding attributes to a model one at a time.
Examples
Here are some scenarios we might imagine using these operations:
We create a new project. We wish to quickly inflate a data graph that we can put data in; we already know the data types and structure. We assemble a basic set of models using templates (see the sketch after this list).
We have an existing project, which is now processing samples using a new data type. We add in models for this data type to the existing project data graph based on some templates.
Our existing project requires some new attributes to support an idiosyncratic data requirement. We add the new attributes to an existing model.
One of our attributes is improperly harboring PHI. We wish to remove it from the model and destroy the data.
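For the first scenario, assembling a new project's graph from a template might reduce to a batch of add_model actions. This is a hypothetical sketch using the 'labors' example project; the model names are illustrative.

actions = [
  { action_name: 'add_model', model_name: 'labor', parent_model_name: 'project', parent_link_type: 'collection' },
  { action_name: 'add_model', model_name: 'monster', parent_model_name: 'labor', parent_link_type: 'collection' },
  { action_name: 'add_model', model_name: 'victim', parent_model_name: 'monster', parent_link_type: 'collection' }
]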
The API
Broadly speaking we wish both to plan AND to use atomic operations. But since a plan is itself executed as a series of atomic operations on the data graph, atomic operations are ultimately what the /update_model API should provide. Any planning services Magma might offer are worth considering in this context, but they might live elsewhere, e.g. in a /plan API or outside Magma entirely. Here we focus on an API that provides atomic operations to change the data graph.
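Assuming /update_model accepts a JSON POST, a request might look like the following sketch (the host is hypothetical and authentication is omitted; the payload shape follows the controller section below):

require 'net/http'
require 'json'
require 'uri'

uri = URI('https://magma.example.org/update_model')  # hypothetical host
payload = {
  project_name: 'labors',
  actions: [
    { action_name: 'add_model', model_name: 'victim',
      parent_model_name: 'monster', parent_link_type: 'collection' }
  ]
}
response = Net::HTTP.post(uri, JSON.generate(payload),
                          'Content-Type' => 'application/json')
puts response.body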
The operations
The basic operations we might require are:
1) add a model
{ action_name: 'add_model', model_name: 'victim', parent_model_name: 'monster', parent_link_type: 'collection' }
2) remove a model
{ action_name: 'remove_model', model_name: 'victim' }
3) rename a model
{ action_name: 'rename_model', model_name: 'victim', new_model_name: 'casualty' }
4) add an attribute
{ action_name: 'add_attribute', model_name: 'monster', attribute_name: 'species', ...required_attribute_params }
5) remove an attribute
{ action_name: 'remove_attribute', model_name: 'monster', attribute_name: 'species' }
6) rename an attribute
{ action_name: 'rename_attribute', model_name: 'monster', attribute_name: 'species', new_attribute_name: 'species_name' }
7) update an attribute (see the validation sketch after this list)
{ action_name: 'update_attribute', model_name: 'monster', attribute_name: 'species', validation: { type: 'Regexp', value: '/^[A-Z][a-z-]+ [a-z-]+$/' }, format_hint: 'Linnean species name, e.g. "Sus scrofa"' }
8) cast an attribute
{ action_name: 'cast_attribute', model_name: 'monster', attribute_name: 'weight', type: 'float' }
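To make the validation parameter concrete, here is a minimal sketch of how a 'Regexp' validation might be enforced when values are loaded; the helper is hypothetical, not Magma's actual validation code.

validation = { type: 'Regexp', value: '/^[A-Z][a-z-]+ [a-z-]+$/' }

def valid_value?(validation, value)
  case validation[:type]
  when 'Regexp'
    # Strip the surrounding slashes and compile the pattern.
    pattern = validation[:value].delete_prefix('/').delete_suffix('/')
    !!(value =~ Regexp.new(pattern))
  else
    true
  end
end

valid_value?(validation, 'Sus scrofa')  # => true
valid_value?(validation, 'sus scrofa')  # => false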
Each of these actions can be classified as one of the following (some actions, notably #5, have sub-actions that may fall into different classes):
non-migrating - does not change the schema of the database (update_attribute only)
safe-migrating - only adds to the schema of the database (add_model, add_attribute)
destructive-migrating - removes or overwrites data in the database (remove_model, rename_model, remove_attribute, rename_attribute, cast_attribute)
We may add actions in this order, so that we can start fleshing out the migration API without immediately having to deal with issues of data loss.
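A minimal sketch of this classification as a lookup table (the constant and helper names are illustrative):

ACTION_CLASSES = {
  'update_attribute' => :non_migrating,
  'add_model'        => :safe_migrating,
  'add_attribute'    => :safe_migrating,
  'remove_model'     => :destructive_migrating,
  'rename_model'     => :destructive_migrating,
  'remove_attribute' => :destructive_migrating,
  'rename_attribute' => :destructive_migrating,
  'cast_attribute'   => :destructive_migrating
}.freeze

# Destructive-migrating actions are the ones requiring user confirmation.
def requires_confirmation?(action)
  ACTION_CLASSES[action[:action_name]] == :destructive_migrating
end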
The controller
The POST from the user looks like:
{ project_name: 'labors', actions: [ { action_name: 'add_model', ... }, ... ] }
The controller will:
1) Examine each of the actions and validate that its required arguments are well-formed
2) Simulate the serial application of these actions to the data graph. Record the intermediate states.
3) Look for actions that produce incorrect states (e.g., they attempt to reference an attribute after it has been renamed, or they yield an orphaned subgraph, etc.). Collect the errors and report back to the user.
4) If no invalid states are produced, look for destructive-migrating actions. Compute a confirmation hash for each, collect the hashes, and report back to the user. The user must then re-post the request with the confirmation hashes attached: actions: [ ..., { action_name: 'remove_model', confirmation: 'd9d9d9d9d' }, ... ]
5) If all of the destructive actions have been confirmed, apply the actions in order to the data graph. Any actions requiring migrations will perform them. Any data-destroying actions will be sure to archive the destroyed data (described elsewhere).
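A minimal sketch of the confirm-then-apply handshake in step 4, reusing requires_confirmation? from the classification sketch above; the nine-character SHA1 digest is an assumption, not a scheme this document specifies.

require 'digest'
require 'json'

def confirmation_hash(action)
  # Digest the action body, excluding any confirmation already attached.
  body = action.reject { |key, _| key == :confirmation }
  Digest::SHA1.hexdigest(JSON.generate(body))[0, 9]
end

def unconfirmed_destructive(actions)
  actions.select do |action|
    requires_confirmation?(action) && action[:confirmation] != confirmation_hash(action)
  end
end

On the first post the controller would respond with the computed hashes; the client re-posts with confirmation attached, and application proceeds only once unconfirmed_destructive returns an empty list.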