spine-tools / Spine-Database-API

Database interface to Spine generic data model
https://www.tools-for-energy-system-modelling.org/
GNU Lesser General Public License v3.0

Parameter values as Apache Arrow objects #353

Open soininen opened 4 months ago

soininen commented 4 months ago

(This issue is not about the binary blobs we have in the 'value' column of the 'parameter_value' table in the Spine database schema.)

We have been discussing using Apache Arrow as an alternative for the data structures in the parameter_value module. To get things rolling, I thought I could get my hands dirty with Arrow by implementing an equivalent of the parameter_value module that works with Arrow tables instead of TimeSeries, Maps and whatnot. Initially, this will be more of a technology demo or proof of concept. Also, I am not planning to replace parameter_value; rather, I want to provide an alternative interface for parsing parameter values.

manuelma commented 4 months ago

Very good, I look forward to seeing the results! I understand you plan to keep the 'public' API from spinedb_api.parameter_value but just change the internals, right?

soininen commented 4 months ago

> I understand you plan to keep the 'public' API from spinedb_api.parameter_value but just change the internals, right?

I am not planning to change parameter_value at all, but rather to add a new module next to it. I think we should leave parameter_value as-is for backwards compatibility in case we ever make the full switch to Arrow.

The new module (spinedb_api.arrow?) should emulate the interface of parameter_value. I guess the most important functions would be from_database(), which returns an Arrow object, and to_database(), which converts an Arrow object to a binary blob.

manuelma commented 4 months ago

Sounds good! But ParameterValue and its subclasses are also 'public' - do you think it's possible to implement them in arrow?

soininen commented 4 months ago

> But ParameterValue and its subclasses are also 'public' - do you think it's possible to implement them in arrow?

In fact, I am going to drop ParameterValue and just use the Arrow data types. I see no benefit in wrapping working data types in interfaces that do not offer any real improvements and can be considered niche. Client code can then work directly with standard Arrow API without the need to convert to/from ParameterValue.