spine-tools / Spine-Toolbox

Spine Toolbox is an open source Python package to manage data, scenarios and workflows for modelling and simulation. You can have your local workflow, but work as a team through version control and SQL databases.
https://www.tools-for-energy-system-modelling.org/
GNU Lesser General Public License v3.0

Transformer between importer and data store #1305

Open jkiviluo opened 3 years ago

jkiviluo commented 3 years ago

Transformers do not support transformations between an Importer and a Spine Data Store. Supporting that would be nice.

soininen commented 3 years ago

Looks like I broke Data transformer with my latest changes. It should be fixed now, at least when connected between a Data store and an Exporter. Making the transformer work between an Importer and a Data store is a completely different story. While waiting for that to be resolved, you can:

  1. Import the data to a temporary database and connect that database to the actual database via a data transformer.
  2. Write a Tool script that applies the needed transformations to the source data before feeding it to the importer.
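
The second workaround could look roughly like the sketch below. It is only an illustration under stated assumptions: the source data is assumed to be CSV, the transformation is assumed to be a simple renaming, and `RENAMES`, `transform_rows` and `transform_csv` are hypothetical names, not part of Spine Toolbox.

```python
import csv
import io

# Hypothetical mapping from names in the source data to the names wanted in the database.
RENAMES = {"old_unit": "unit", "old_node": "node"}


def transform_rows(rows, renames):
    """Yield rows with every cell that matches a key in `renames` replaced."""
    for row in rows:
        yield [renames.get(cell, cell) for cell in row]


def transform_csv(source, target, renames=RENAMES):
    """Copy CSV data from `source` to `target`, renaming cells on the way."""
    writer = csv.writer(target, lineterminator="\n")
    writer.writerows(transform_rows(csv.reader(source), renames))


if __name__ == "__main__":
    # In a real Tool these would be files, e.g. open("raw.csv", newline="").
    raw = io.StringIO("old_unit,capacity\nold_node,demand\n")
    cleaned = io.StringIO()
    transform_csv(raw, cleaned)
    print(cleaned.getvalue())
```

The Tool would write the cleaned file as its output, and the importer would then read that file instead of the raw one.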

jkiviluo commented 3 years ago

In my particular case, I can do the transformation between DB and exporter. I will change the issue name to Importer - Data Store.

soininen commented 3 years ago

Thanks for updating the title and description. The actual feature request here is now much clearer.

I can think of two ways this could be done:

  1. Apply the transformation at import time in import_mappings. How feasible this is, and how much it would complicate the API, needs to be investigated.
  2. Apply the transformations in place after import, i.e. import the data as-is, then transform it within the database. This would be nice in that we could apply these transformations to any existing database at any time. Problems might arise with name clashes at import, though. No idea if this is even feasible.

soininen commented 3 years ago

In-place transformation is actually already doable: just connect two data stores pointing to the same database via a Transformer. Case solved.

jkiviluo commented 3 years ago

I wouldn't put a high priority on this. Quite OK functionality can be achieved by having a transformer between two data stores, which is supported.

jkiviluo commented 3 years ago

And your solution is even nicer. Although how does it play with DAG order?

soininen commented 3 years ago

> (although how does it play with DAG order)?

You have two data stores using the same database. That plays very well with the DAG.

jkiviluo commented 3 years ago

Ok, right. I thought you meant that there would be a small loop from DS to transformer and back to DS.
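
The difference between the two readings can be sketched with Python's standard `graphlib` module (Python 3.9+). The item names and the `execution_order` helper are hypothetical, and this is not how Spine Toolbox builds its DAG; it only shows that two data store items sharing a database still form an acyclic graph, whereas a single data store on both sides of the transformer would not.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(predecessors):
    """Return a valid execution order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


# Two data store items (hypothetical names) that point to the same database.
# Each key maps an item to the items that must run before it.
two_stores = {
    "DS in": {"Importer"},
    "Transformer": {"DS in"},
    "DS out": {"Transformer"},
}

# A single data store that both feeds the transformer and receives its output.
small_loop = {
    "DS": {"Importer", "Transformer"},
    "Transformer": {"DS"},
}

print(execution_order(two_stores))  # a linear order from Importer to "DS out"
print(execution_order(small_loop))  # None, because the graph is cyclic
```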

manuelma commented 3 years ago

How about having the DT (Data Transformer) advertise an in-memory database backwards?

Importer -> DT -> DS

Importer would import data into the in-memory db, DT would apply the 'transformation filter' on that db, and DS would merge that db into its own physical db.

That could work if in-memory dbs were shareable by URL, but they are only shared by 'connection instance'...
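
The limitation can be reproduced with plain SQLite. The sketch below uses the standard `sqlite3` module rather than Spine's own database layer, and the helper name and file name are made up for illustration. It assumes the in-memory database is opened with the plain `:memory:` name.

```python
import os
import sqlite3
import tempfile


def second_connection_sees_table(database):
    """Create a table through one connection and look for it through another."""
    writer = sqlite3.connect(database)
    reader = sqlite3.connect(database)
    try:
        writer.execute("CREATE TABLE imported (name TEXT)")
        writer.commit()
        found = reader.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'imported'"
        ).fetchone()
        return found is not None
    finally:
        writer.close()
        reader.close()


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name; any path on disk behaves the same way.
    file_shared = second_connection_sees_table(os.path.join(folder, "spine.sqlite"))
memory_shared = second_connection_sees_table(":memory:")

print(file_shared)    # True: both connections open the same file
print(memory_shared)  # False: each ":memory:" connection gets a private database
```

With a file path, the location alone is enough for a second project item to reach the data. With `:memory:`, the connection object itself would have to be handed from one item to the next.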

soininen commented 3 years ago

> That could work if in-memory dbs were shareable by URL, but they are only shared by 'connection instance'...

Indeed, which unfortunately makes them unusable in many scenarios.

manuelma commented 3 years ago

I don't know, there might be a way... The solution with two DSs pointing to the same URL is good, but it might be a little too clever, don't you think?

On the other hand, Importer -> DT -> DS seems logical. It's only an implementation detail on our part that prevents it from working, right? (that we only share stuff by URL)

soininen commented 3 years ago

> It's only an implementation detail on our part that prevents it from working, right? (that we only share stuff by URL)

Right. We could make Importer -> DT -> DS work with URLs, for example, if DT passed the DS's URL to Importer together with some clever write-to-temporary-alternative filter. Importer would then write to that alternative. When DT's turn to execute came, it would transform the data in the special alternative in place.
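
A very rough sketch of that idea, using plain `sqlite3` and a made-up single-table schema rather than the real Spine database structure. The `parameter_value` table layout, `TEMP_ALTERNATIVE`, `importer_write` and `transformer_apply` are all assumptions for illustration, and the transformation is assumed to be a parameter rename.

```python
import sqlite3

TEMP_ALTERNATIVE = "__import_tmp__"  # hypothetical name for the special alternative


def importer_write(conn, rows):
    """Stand-in for Importer: write everything into the temporary alternative."""
    conn.executemany(
        "INSERT INTO parameter_value (entity, name, value, alternative) VALUES (?, ?, ?, ?)",
        [(entity, name, value, TEMP_ALTERNATIVE) for entity, name, value in rows],
    )


def transformer_apply(conn, renames, target_alternative="Base"):
    """Stand-in for DT: rename parameters in place, then release the rows."""
    for old, new in renames.items():
        conn.execute(
            "UPDATE parameter_value SET name = ? WHERE name = ? AND alternative = ?",
            (new, old, TEMP_ALTERNATIVE),
        )
    conn.execute(
        "UPDATE parameter_value SET alternative = ? WHERE alternative = ?",
        (target_alternative, TEMP_ALTERNATIVE),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parameter_value (entity TEXT, name TEXT, value REAL, alternative TEXT)")
# Data that already existed in the database before the import.
conn.execute("INSERT INTO parameter_value VALUES ('unit_a', 'cap', 5.0, 'Base')")
importer_write(conn, [("unit_b", "cap", 7.0)])
transformer_apply(conn, {"cap": "capacity"})
rows = conn.execute(
    "SELECT entity, name, alternative FROM parameter_value ORDER BY entity"
).fetchall()
print(rows)
```

Because the rename is restricted to the temporary alternative, the pre-existing `unit_a` row keeps its original parameter name, and only the freshly imported `unit_b` row is transformed before being moved to the target alternative.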