sentinel-energy / friendly_data

Data format to interoperate between models and frameworks
https://sentinel-energy.github.io/friendly_data/
Apache License 2.0

Time series API #8

Closed: suvayu closed this 4 years ago

suvayu commented 4 years ago

Introduce an API to read non-standard time series datasets. The specifics are outlined in the docstring for sark/tseries.py::read_timeseries. The variations that are supported are:

codecov[bot] commented 4 years ago

Codecov Report

Merging #8 into master will decrease coverage by 0.45%. The diff coverage is 97.61%.


@@             Coverage Diff             @@
##            master       #8      +/-   ##
===========================================
- Coverage   100.00%   99.54%   -0.46%     
===========================================
  Files            5        6       +1     
  Lines          176      218      +42     
===========================================
+ Hits           176      217      +41     
- Misses           0        1       +1     

Flag              Coverage Δ
#unittests        99.54% <97.61%> (-0.46%) ↓

Impacted Files    Coverage Δ
sark/dpkg.py      100.00% <ø> (ø)
sark/tseries.py   97.61% <97.61%> (ø)
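The percentages above follow from the line counts: overall coverage is hits / lines (217/218 on the PR branch), and diff coverage is hits over the 42 new lines (41/42). The sketch below reproduces them, assuming Codecov's default of rounding down to two decimals. Its explanation for the 0.45% in the summary versus -0.46% in the table (truncating the exact drop versus subtracting two already-rounded figures) is an inference; the report itself does not state it.

```python
import math
from fractions import Fraction


def pct_round_down(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage as a percentage, truncated (rounded down) to `precision` decimals."""
    scale = 10 ** precision
    exact = Fraction(hits, lines) * 100  # exact rational, no float error before truncation
    return math.floor(exact * scale) / scale


base = pct_round_down(176, 176)              # master: 176 hits out of 176 lines
head = pct_round_down(217, 218)              # PR #8: 217 hits out of 218 lines
diff = pct_round_down(217 - 176, 218 - 176)  # diff coverage: 41 of the 42 new lines hit

# Summary figure: truncate the exact drop. Table figure: subtract the rounded values.
exact_drop = 100 - Fraction(217, 218) * 100
drop_summary = math.floor(exact_drop * 100) / 100
drop_table = round(base - head, 2)

print(base, head, diff, drop_summary, drop_table)  # 100.0 99.54 97.61 0.45 0.46
```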

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f9a233c...61be1c5.

suvayu commented 4 years ago

@brynpickering, could you review this? The goal is to parse non-standard time series datasets. The specifics are outlined in the docstring for sark/tseries.py::read_timeseries. If you can share other examples that you have come across, that would be even better :)

PS: To keep it simple, you can limit your comments to the sark/tseries.py file and the associated docs and tests :-p

suvayu commented 4 years ago

> I assume these cases (table and multicol) are known structures from some SENTINEL partner models; I've rarely come across them...

These are based on actual examples I have from other SENTINEL partners :)
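The read_timeseries docstring is not reproduced in this thread, so the exact definitions of the "table" and "multicol" cases are not shown here. As a hedged illustration only, the sketch below assumes "table" means one row per day with one column per hour, and "multicol" means the date and the time of day sit in separate columns. Both readers (`read_table_layout`, `read_multicol_layout`) are hypothetical helpers written with plain pandas, not the sark API.

```python
import pandas as pd


def read_table_layout(df: pd.DataFrame, date_col: str = "date") -> pd.Series:
    """Hypothetical 'table' layout: one row per day, one column per hour of the day."""
    # Reshape wide -> long: one row per (date, hour) pair.
    long = df.melt(id_vars=date_col, var_name="hour", value_name="value")
    stamps = pd.to_datetime(long[date_col]) + pd.to_timedelta(long["hour"].astype(int), unit="h")
    return pd.Series(long["value"].to_numpy(), index=pd.DatetimeIndex(stamps)).sort_index()


def read_multicol_layout(
    df: pd.DataFrame, date_col: str = "date", time_col: str = "time", value_col: str = "value"
) -> pd.Series:
    """Hypothetical 'multicol' layout: date and time of day live in separate columns."""
    stamps = pd.to_datetime(df[date_col] + " " + df[time_col])
    return pd.Series(df[value_col].to_numpy(), index=pd.DatetimeIndex(stamps)).sort_index()


# Toy inputs in the two assumed layouts.
table = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], 0: [1.0, 3.0], 1: [2.0, 4.0]})
multicol = pd.DataFrame(
    {"date": ["2020-01-01", "2020-01-01"], "time": ["00:00", "01:00"], "value": [5.0, 6.0]}
)
print(read_table_layout(table))
print(read_multicol_layout(multicol))
```

Both readers end in the same shape, a Series on a DatetimeIndex, which is the point of normalising partner formats before writing them out.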

> Another feature I think this would benefit from is explicitly defining the datetime format. [...]

It was on my to-do list (in my head :-p), so I've created an issue: #11
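To illustrate what an explicit datetime format buys, `pandas.to_datetime` already accepts a strptime-style `format` argument. With it, an ambiguous string such as "01/02/2020" is read deterministically instead of being guessed. The wrapper `parse_with_format` and its `fmt` parameter are hypothetical and are not part of sark or the design tracked in #11.

```python
import pandas as pd


def parse_with_format(values, fmt=None):
    """Parse timestamps; an explicit strptime-style `fmt` removes day/month ambiguity."""
    # errors="raise" makes a mismatch between data and format fail loudly instead of guessing.
    return pd.to_datetime(pd.Series(values), format=fmt, errors="raise")


raw = ["01/02/2020 00:00", "01/02/2020 01:00"]
day_first = parse_with_format(raw, fmt="%d/%m/%Y %H:%M")    # read as 1 February 2020
month_first = parse_with_format(raw, fmt="%m/%d/%Y %H:%M")  # read as 2 January 2020
print(day_first)
print(month_first)
```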

danielhuppmann commented 4 years ago

I'm following this repo because I'd like to see how we can collaborate, given the ongoing development of our pyam package and the work in openENTRANCE, a sister project of SENTINEL (see the nomenclature).

I think you are pushing too much implicit functionality, or "trying to figure out intent", into the archive-package. In openENTRANCE, we instead follow the approach that each modelling team is responsible for translating between a common standard (int for yearly time series, datetime for continuous-time data, str for agreed-upon representative timeslices) and whatever its modelling framework produces or needs. That keeps the API and internals manageable.
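As a concrete reading of the convention above, each team's time labels should end up as exactly one of three types. The minimal checker below assumes plain Python values (int years, `datetime` objects, or str timeslice names). `classify_time_labels` is a hypothetical helper for illustration, not part of pyam or this package.

```python
from datetime import datetime


def classify_time_labels(labels):
    """Return 'yearly', 'continuous' or 'timeslice' for a homogeneous list of time labels."""
    kinds = set()
    for label in labels:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(label, int) and not isinstance(label, bool):
            kinds.add("yearly")
        elif isinstance(label, datetime):
            kinds.add("continuous")
        elif isinstance(label, str):
            kinds.add("timeslice")
        else:
            raise TypeError(f"unsupported time label {label!r} ({type(label).__name__})")
    if len(kinds) != 1:
        raise ValueError(f"expected exactly one kind of time label, found {sorted(kinds)}")
    return kinds.pop()


print(classify_time_labels([2020, 2030, 2050]))
print(classify_time_labels([datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 1)]))
print(classify_time_labels(["winter-peak", "summer-offpeak"]))
```

Rejecting mixed inputs rather than coercing them keeps the translation step in the modelling team's hands, which matches the division of responsibility described above.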

suvayu commented 4 years ago

Hi @danielhuppmann, thanks for your comment. My direction here is not apparent (it isn't mentioned anywhere), but the intent of this module is to help our partners conform to whatever schema the project finalises. This API will serve to implement conversion tools (most likely CLIs) that turn partner data into a frictionless datapackage with the agreed-upon schema.

The input data this API is expected to read is based on actual examples I have from our partners. If SENTINEL does not provide an easy way to convert them to the agreed-upon format, I fear no one is going to use it. I understand the maintainability concern, and by no means do I intend to be all-inclusive. Does that address your worry, at least partially?
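The conversion tools themselves are not part of this PR. As a hedged sketch of the target format only, the snippet below writes a time series to CSV plus a minimal `datapackage.json` descriptor following the Frictionless Data Package and Table Schema specifications, using only the standard library. The function name `write_timeseries_package`, the file names, and the two-column schema (`timestamp`, `value`) are assumptions for illustration, not the schema SENTINEL agreed on.

```python
import csv
import json
from pathlib import Path


def write_timeseries_package(rows, out_dir, name="example-timeseries"):
    """Write (datetime, number) pairs as CSV plus a minimal datapackage.json descriptor."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_name = f"{name}.csv"
    with open(out / csv_name, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["timestamp", "value"])
        for stamp, value in rows:
            # Table Schema's default datetime format is ISO 8601 with a trailing Z (UTC);
            # the naive timestamps here are *assumed* to already be in UTC.
            writer.writerow([stamp.strftime("%Y-%m-%dT%H:%M:%SZ"), value])
    descriptor = {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_name,
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [
                        {"name": "timestamp", "type": "datetime"},
                        {"name": "value", "type": "number"},
                    ]
                },
            }
        ],
    }
    descriptor_path = out / "datapackage.json"
    descriptor_path.write_text(json.dumps(descriptor, indent=2))
    return descriptor_path
```

A CLI wrapper would read a partner file with one of the layout-specific readers, then hand the normalised rows to a function like this one, so the descriptor and schema live in a single place.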

danielhuppmann commented 4 years ago

Not sure which approach will work best in practice to get modellers to use the tools; let's compare notes in two years and see how it worked out. 😜

In any case, it would be really helpful to have an overview of all supported table formats and their required arguments. We tried to do this here (but only for yearly data; we didn't get around to writing tutorials for our continuous-time (datetime) features).