Add timeseries to datapackage

FelixMau commented 1 year ago

I think it will be easier to review if the creation of foreign keys and the saving of the DataPackage are moved to a separate branch, to keep PRs small.

henhuy commented 1 year ago

I did some refactoring, but it is not finished yet (switched to draft mode). Here are my ideas/changes:

I refactored the mapping to read TS-related keys either from the mapping or to guess them (if only one timeseries exists).
The mapping has to be improved so that process-dependent mappings are allowed.
Removed the "profile", "region" and "year" mappings in parametrize_dataclass, as "profile" will never be found in the data and "region" and "year" are not in the facades (they must be handled differently).
Switched to the hack-a-thon example, as the minimal example does not resemble real SEDOS data (SEDOS data does not contain carrier, tech or profile, which makes it much harder to parse).
refactor_timeseries is now always run, as the timeindex is needed and regions always have to be exploded.
Unfortunately, refactor_timeseries currently fails for "onshore" in the wind turbine process (I think the timeindex/region processing can be simplified; it looks too complicated).
Hardcoded "carrier" and "tech" for the facades; we can think about smarter entries there later.
Could not find the package for setup_environment? (I thought I had only commented it out, but it seems to be gone.)
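The first point (read the timeseries key from the mapping, otherwise guess it when only one timeseries exists) could look roughly like the sketch below. The function name `get_timeseries_key` and its arguments are hypothetical and not taken from data_adapter_oemof; the sketch only illustrates the "mapping first, then unambiguous guess" rule.

```python
from typing import Dict, List, Optional


def get_timeseries_key(
    mapping: Dict[str, str], field: str, timeseries_columns: List[str]
) -> Optional[str]:
    """Hypothetical helper: resolve which timeseries column belongs to ``field``."""
    # An explicit mapping entry always wins.
    if field in mapping:
        return mapping[field]
    # Without a mapping entry, guess only if exactly one timeseries exists.
    if len(timeseries_columns) == 1:
        return timeseries_columns[0]
    # Zero or several candidates: refuse to guess.
    return None
```

Returning `None` instead of raising leaves the decision about how to report an ambiguous case to the caller.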

I hope you can understand my ideas (I wrote them down very quickly, sorry for that!)

henhuy commented 1 year ago

Also, some recommendations:

Write smaller tests! test_example with build_datapackage is way too heavy to test :) (see test_refactor_timeseries for an example)
Use pre-commit for linting.
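As an illustration of "smaller tests": instead of running build_datapackage end to end, a test can target one tiny transformation. The helper `explode_regions` and the column names below are hypothetical; the test function can be collected by pytest as-is.

```python
import pandas as pd


def explode_regions(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: one row per region instead of a list of regions per row.
    return df.explode("region", ignore_index=True)


def test_explode_regions():
    df = pd.DataFrame({"region": [["BB", "BE"]], "capacity": [10]})
    result = explode_regions(df)
    # Each region gets its own row; the other columns are repeated.
    assert list(result["region"]) == ["BB", "BE"]
    assert list(result["capacity"]) == [10, 10]
```

A failure in such a test points directly at the broken step, which is much harder to see from a failing full-datapackage test.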

henhuy commented 1 year ago

Started to refactor the refactor_timeseries function. It is not ready yet. The following things must be done:

Pivot region into every column
Explode the columns
Append dataframes with different timeindex using axis=0

If the test runs successfully, this should be solved!
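The three steps above could be sketched in pandas roughly as follows. This is not the PR's refactor_timeseries: the input layout (one row per region with `timeindex_start`, `timeindex_resolution` and list-valued data columns) and the function name are assumptions for illustration. Passing several columns to `DataFrame.explode` at once requires pandas 1.3 or newer.

```python
from typing import List

import pandas as pd


def refactor_timeseries_sketch(ts: pd.DataFrame, value_columns: List[str]) -> pd.DataFrame:
    """Hypothetical sketch: list-valued rows per region -> one wide frame per timeindex."""
    frames = []
    # Rows that share a timeindex can be placed side by side.
    for (start, resolution), group in ts.groupby(
        ["timeindex_start", "timeindex_resolution"], sort=False
    ):
        # 1. Pivot region into the columns: "<column>-<region>" holds that row's list.
        wide = pd.DataFrame(
            {
                f"{col}-{region}": [values]
                for region, row in group.set_index("region").iterrows()
                for col, values in row[value_columns].items()
            }
        )
        # 2. Explode all list columns at once: one row per timestep.
        long = wide.explode(list(wide.columns), ignore_index=True)
        long.index = pd.date_range(start=start, periods=len(long), freq=resolution)
        frames.append(long.astype(float))
    # 3. Append the frames with different timeindex below each other (axis=0).
    return pd.concat(frames, axis=0)
```

Usage with toy data (two regions, two different years):

```python
ts = pd.DataFrame(
    {
        "region": ["BB", "BE", "BB", "BE"],
        "timeindex_start": ["2016-01-01", "2016-01-01", "2030-01-01", "2030-01-01"],
        "timeindex_resolution": ["D"] * 4,
        "onshore": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    }
)
result = refactor_timeseries_sketch(ts, ["onshore"])
```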

FelixMau commented 1 year ago

Thank you for your review, and sorry for the untidy code! My pre-commit checks do not run automatically, so sometimes I forget to run them.

I hope I was able to simplify the refactor_timeseries process, but it is still somewhat complicated to catch and map everything. Maybe I have overlooked the right pandas functionality; I agree there must be a better way to map a new index directly and pivot straight from explode, but I could not find it. Using a MultiIndex together with stacking/unstacking would work as well, but I think that approach is harder to understand.
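For comparison, the MultiIndex/unstack route mentioned above could look like the sketch below. It assumes the data has already been exploded into long format with one row per (timeindex, region); the function and column names are hypothetical.

```python
import pandas as pd


def to_wide_with_unstack(long: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical alternative: reshape long data via a MultiIndex and unstack."""
    # unstack raises a ValueError if a (timeindex, region) pair occurs twice.
    wide = long.set_index(["timeindex", "region"]).unstack("region")
    # Columns are now a MultiIndex (value column, region); flatten to "<column>-<region>".
    wide.columns = [f"{col}-{region}" for col, region in wide.columns]
    return wide
```

The reshaping itself is a single line, but the intermediate MultiIndex columns are less obvious to read than an explicit loop, which matches the trade-off described in the comment.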

Regarding setup_environment, please see #19.

Can you please explain further how the mapping should be improved? Do you want the mapper to be able to handle the same facade differently for different processes?
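One possible reading of "process-dependent mappings": the same facade field maps to different data columns depending on the process. A minimal sketch, assuming a default mapping plus optional per-process overrides; the function name, the mapping layout and the field names are hypothetical and not the adapter's actual mapping format.

```python
from typing import Dict, Optional


def resolve_field(
    defaults: Dict[str, str],
    per_process: Dict[str, Dict[str, str]],
    process: str,
    field: str,
) -> Optional[str]:
    """Hypothetical lookup: prefer a process-specific entry, else the facade default."""
    overrides = per_process.get(process, {})
    if field in overrides:
        return overrides[field]
    # Fall back to the shared facade mapping; None if the field is not mapped at all.
    return defaults.get(field)
```

With this layout, processes that need no special handling simply have no entry in `per_process`.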

sedos-project / data_adapter_oemof

Add timeseries to datapackage #20