Nice system to track db changes

spine-tools / Spine-Toolbox

Spine Toolbox is an open source Python package to manage data, scenarios and workflows for modelling and simulation. You can have your local workflow, but work as a team through version control and SQL databases.

https://www.tools-for-energy-system-modelling.org/

GNU Lesser General Public License v3.0

72 stars 17 forks source link

Nice system to track db changes #1559

Open manuelma opened 3 years ago

manuelma commented 3 years ago

We can dump spine dbs into a JSON and back, mostly. Which means we could use a JSON file instead of a SQLite file for version control purposes.

The first way of doing this that came to mind was to have a simple workflow: DC1 -> Tool1 -> DS -> Tool2 -> DC2. Both DC1 and DC2 point to the same JSON file (in references). Tool1 imports the JSON into the DS, and Tool2 exports the db into the JSON (they are just python one-liners that call spinedb_api).

But it seems like a lot of items to do a simple thing, so an alternative idea would be to support JSON in the DS item itself, as if it was just another SQL dialect that works for Spine dbs only. So basically, add JSON to the dialect combobox in the Data Store item.

Is this a good idea? It feels a little hacky. Any alternatives?

soininen commented 3 years ago

Frankly, I'm unsure what's the motivation here? Keep a database in text format to make it 'nicer' for git?

manuelma commented 3 years ago

Yes, it sounds a bit crazy and after some thinking, I don't believe we want to go there. But the 'feature request' is to be able to track changes on a database nicely. I'm thinking we should use the commit table for this purpose rather than git+json, because git+json would hint that the preferred 'database format' is json, which is kinda insane.

DillonJ commented 3 years ago

Some ideas and thoughts...

Generally, would we like some sort of GIT integration and support within toolbox? This could be the highest-level functionality where one could push/pull commit changes to their project from within toolbox. E.g. have an "add project from repository" feature. Then you would need some sort of nice solution for the sqlite files. Obviously this would all need thinking through.

Then there would be less complete GIT support we could add, perhaps on a per tool level perhaps?

Finally... perhaps DB editor could support "committing/pusing/pulling" the datastore from/to a repo which I suppose would entail converting to JSON and back.

A short term solution I suppose would be to source control the project and export a JSON from time to time, whenever you want it to be committed and store it within the project folder.

manuelma commented 3 years ago

Those are good points @DillonJ

I'm thinking we could also use the commit table to 'track' changes. Of course is not as powerful as git, but we can see how it works. I'm now working on a minimal interface to view commits to the database from the Spine DB editor.

perhaps DB editor could support "committing/pusing/pulling" the datastore from/to a repo which I suppose would entail converting to JSON and back

This would be like turning sqlite into a server-based SQL engine, via JSON. It feels like a good idea and I think we should explore it.

A short term solution I suppose would be to source control the project and export a JSON from time to time, whenever you want it to be committed and store it within the project folder.

This is good as a temporary solution, but it could become tricky to remember that I need to export to JSON before pushing, and import from JSON after pulling.

jkiviluo commented 3 years ago

I believe this functionality is interesting mainly for data structure templates (not for real data). If that is the case, then maybe the Data Store item should have template capability:

it can be populated from a linked template (separate JSON that is defined in the data store spec and can be in git)
it's structure can be exported to this linked template
there can be a setting to create an empty database and autoload the structure if the data store does not come with the project (but the template does)

This of course aims towards portability of the projects with or without data. Using git or not. @ererkka

manuelma commented 3 years ago

The linked template could be implemented as a Data Store specification. We have flirted with this idea in the past, and it could also solve https://github.com/Spine-project/Spine-Toolbox/issues/1452. It could also help us to get rid of the "Load template" Tool specification in SpineOpt plugin, which is causing the problems described in https://github.com/Spine-project/SpineOpt.jl/issues/403

The specification editor in this case could be the same Spine DB editor with different options. Maybe I will give it a shot, what do you think @DillonJ ?

DillonJ commented 3 years ago

This could be nice - so if we implemented template as a datastore specification, then we could have a SpineOpt specific datastore item that comes with the SpineOpt plugin. I think this would be very nice. I currently feel the load template tool is a little cumbursome, as typically, loading a template is a once-off activity.

However, how would it work when, for example, I add new object classes to my datastore. Is the specification template something that is only relevant when you create a new, emtpy datastore of that type? Would it be used for validation? Would we have a nice way to update it?

jkiviluo commented 3 years ago

The specification editor in this case could be the same Spine DB editor with different options.

I was thinking the template is based on the structure of the database that there is (no need for a separate DB editor for template specification). I don't see the use case for the situation where your template structure should be different than your DB structure.

However, how would it work when, for example, I add new object classes to my datastore. Is the specification template something that is only relevant when you create a new, emtpy datastore of that type?

The DB structure template could be applied for a new empty datastore or to update an existing DB (at least it could add anything that is missing - renaming is much harder).

Would it be used for validation?

Maybe. I don't see this clearly. I believe validation rules should be in the parameter definition (that of course will also be exported to the template).

Would we have a nice way to update it?

The most simple would be to right-click and export current structure to the template.

jkiviluo commented 1 year ago

One option to provide templates is to have is_template or is_archetype column in the entity table of the DB. It would not support version control in the same way as JSON based template, but feels in other ways more elegant. They do not need to preclude each other.