spine-tools / Spine-Database-API

Database interface to Spine generic data model
https://www.tools-for-energy-system-modelling.org/
GNU Lesser General Public License v3.0
6 stars 5 forks source link

Possibly disable month and year for Duration parameter type #321

Open jkiviluo opened 10 months ago

jkiviluo commented 10 months ago

While Python, Julia and SpineOpt all handle month and year correctly for Duration parameter (in Python using dateutils.relativedelta), there are some drawbacks:

Our options:

Relates to: https://github.com/spine-tools/Spine-Database-API/issues/319

DillonJ commented 10 months ago

I know this is not the intention here but just wanted to make sure that this wouldn't impact time pattern syntax where it should still be ok to use months and years which we are doing now quite extensively

manuelma commented 10 months ago

My preference here would be to add a clear note so users know what they're messing with - but don't disable the support. I am not sure Julia is affected by this since they don't provide any simpler API. The only API does things right so probably it's well optimised.

jkiviluo commented 10 months ago

Please note: this is about resolution parameter. We would still allow to set parameter values for e.g. a particular month. Setting parameters for a particular month doesn't have the same issue, since then one is talking about a particular month that has a particular number of days - instead of duration of abstract month where the number of days is not known.

manuelma commented 10 months ago

I think to be fully clear we should say that this is about the resolution property for the TimeSeriesFixedResolution.

@jkiviluo I should also point out that if the user gives the resolution, they also need to give the start, and that makes everything concrete.

jkiviluo commented 10 months ago

Oh no, I've mixed things up again. I was really thinking about the Duration parameter type. For TimeSeriesFixedResolution things are indeed clear. I've clean up the issue name and description.

jkiviluo commented 10 months ago

Although, if you have a DB with a temporal resolution of one month, most models will not be able to handle it correctly unless it's implemented with TimeSeriesVariableResolution.

I guess it could be a DB level choice too. I'm now working on the generic DB and there I need to worry about these things. For SpineOpt DB 1 month resolution would be fine though.

manuelma commented 10 months ago

I think I misled you to use the word 'resolution' - my point was that we should try and use the terminology in the docs as much as possible (https://spine-database-api.readthedocs.io/en/0.8-dev/parameter_value_format.html)

Just curious @jkiviluo when you say most models you mean which ones in particular?

manuelma commented 10 months ago

@jkiviluo also, is the generic DB specified somewhere? How are we working on this?

soininen commented 10 months ago

At least the Python implementation of TimeSeriesFixedResolution handles monthly and yearly resolution correctly so if you e.g. export the time stamps to .csv you get the dates you would expect.

jkiviluo commented 10 months ago

Just curious @jkiviluo when you say most models you mean which ones in particular?

I didn't check, but most models don't even handle variable time resolution. And even models like Backbone and FlexTool that can have variable time resolution wouldn't function if one tried to say that they should roll forward a 1 month Duration. Or if you tried to set their time Resolution parameter to 1 month, it wouldn't work either - it could be made to work, but it would require quite a bit of coding since their modelling languages don't know how to interpret 1 month (interpreting 4 weeks is trivial in comparison).

jkiviluo commented 10 months ago

At least the Python implementation of TimeSeriesFixedResolution handles monthly and yearly resolution correctly so if you e.g. export the time stamps to .csv you get the dates you would expect.

Right, doing a script to handle the time series on Python side would make things easier for the Resolution parameter (for models that can handle variable time resolution). However, Duration (e.g. roll forward) requires understanding of the date. While it's plausible to manage that also in Python, it can require quite a bit of rewriting on the model side depending on the particular implementation.

jkiviluo commented 10 months ago

Is the generic DB specified somewhere? How are we working on this?

It's currently in WP5, Task 5.2 folder. I plan to make a repository copy (as a JSON) soon. The repo would then be the place to discuss too, but for now it's just directly to me.

jkiviluo commented 10 months ago

To get back to the original issue: is anyone using month or year for the Duration parameter? If so, we shouldn't start changing that right now. If not, we could consider it.

Otherwise this issue is for the future when we have more time to check if there is something to be gained on the speed side through disabling. Naturally we can implement the first option as soon as anyone has time for that - they won't preclude later options and would be helpful for the user.

manuelma commented 10 months ago

Thank you @jkiviluo - that sounds good. I think it's good to have the option even if for some models is 'slow'. It's not like we're rolling models like crazy, I doubt it will ever become a bottleneck (the time to roll a model is typically a small percentage of the 'solving time').

Also, the user knows the start date (at least in SpineOpt, and I assume everywhere), so they can know exactly where the rolling will jump each time.

Last, we're talking about roll_forward (or equivalent) in particular, but duration can be used in general and we don't know exactly what the user has in mind - so this is another reason to keep it.

Anyways, I'm inclined to keep it.

jkiviluo commented 10 months ago

the user knows the start date (at least in SpineOpt, and I assume everywhere)

That's not the case - many models use pseudo-dates or timestamps that are not dates (e.g. t00001).

It would be great if there is no speed issue in the solves (but I'd like to know for sure - one needs to process all the time series every time the model rolls and in some cases rolling can be very frequent - e.g. when trying to resolve what happens intra-day).

The speed issue could also manifest itself in e.g. visualizations. Plotting times series with relativedelta could be considerably slower than plotting with timedelta. And it's not just data that actually has months or years, but it would affect all time series data, since the code cannot know before checking whether it needs to run through the hoops to calculate the lengths of months or years.

nnhjy commented 8 months ago

There seems to be some overlap with this SpineOpt discussion #869