pik-piam / primap

Coordination of primap development

data structure #3

Closed: mikapfl closed this 3 years ago

mikapfl commented 4 years ago

As we discussed, further work needs a good data structure as its foundation: one that is flexible enough for future use cases while remaining comfortable to use.

For that, I would like to evaluate "candidates" somewhat methodically, but I don't have a concrete plan for how to do that yet.

mikapfl commented 4 years ago

As a first step, I will collect possible candidate data structures here, please chime in with your suggestions.

mikapfl commented 4 years ago

pandas DataFrame in traditional tidy format:

Even given these constraints, there are still some variables in the data representation:

Because this data structure relies on multiple, possibly joinable DataFrames, it needs a container, which is not yet defined.
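To make the tidy format and the container question concrete, here is a minimal sketch. The original list of columns was not preserved, so the column names (`area`, `entity`, `unit`, `year`, `value`) and the plain-dict "container" are hypothetical assumptions for illustration only:

```python
import pandas as pd

# Hypothetical tidy ("long") table: one row per observation, one column per
# variable. Column names are illustrative assumptions, not a fixed schema.
emissions = pd.DataFrame(
    {
        "area": ["DEU", "DEU", "FRA", "FRA"],
        "entity": ["CO2", "CO2", "CO2", "CO2"],
        "unit": ["Gg CO2 / yr"] * 4,
        "year": [2000, 2001, 2000, 2001],
        "value": [900.0, 910.0, 400.0, 405.0],
    }
)

# A second, joinable table (e.g. metadata per area).
areas = pd.DataFrame({"area": ["DEU", "FRA"], "name": ["Germany", "France"]})

# Hypothetical container: a plain dict holding the related DataFrames.
container = {"emissions": emissions, "areas": areas}

# Joining on the shared "area" column combines the two tables.
joined = container["emissions"].merge(container["areas"], on="area", how="left")

# Filtering works on ordinary columns.
deu_2001 = joined[(joined["area"] == "DEU") & (joined["year"] == 2001)]
print(deu_2001["value"].item())  # 910.0
```

A dict is the simplest possible container; it keeps the tables but enforces nothing about which columns are shared or joinable, which is exactly the part that would still need defining.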

mikapfl commented 4 years ago

A variation of the previous one: the same format in principle, but with all fixed variables as levels of a MultiIndex. The semantics of a MultiIndex are sufficiently different that I think it makes sense to treat this as a separate format.
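A minimal sketch of the difference, assuming a small hypothetical tidy table (column names are illustrative, not from the discussion): once the fixed variables become index levels, selection and reshaping go through the index rather than column filters.

```python
import pandas as pd

# Hypothetical tidy table with key columns and one value column.
tidy = pd.DataFrame(
    {
        "area": ["DEU", "DEU", "FRA", "FRA"],
        "entity": ["CO2"] * 4,
        "unit": ["Gg CO2 / yr"] * 4,
        "year": [2000, 2001, 2000, 2001],
        "value": [900.0, 910.0, 400.0, 405.0],
    }
)

# Move every fixed (key) variable into a MultiIndex, so only the observed
# values remain as data.
indexed = tidy.set_index(["area", "entity", "unit", "year"])["value"]

# Selection now goes through index levels instead of column filters.
print(indexed.loc[("DEU", "CO2", "Gg CO2 / yr", 2001)])  # 910.0

# A cross-section along one level drops that level from the result.
year_2000 = indexed.xs(2000, level="year")
print(list(year_2000.index.names))  # ['area', 'entity', 'unit']

# Reshaping to a wide (years as columns) layout is one unstack.
wide = indexed.unstack("year")
```

Arithmetic between two such Series also aligns on the full index automatically, which is one of the semantic differences from the flat tidy table.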

mikapfl commented 4 years ago

xarray DataArray:

A collection of DataArrays with shared dimensions/coords, i.e. where the coords of each individual DataArray are a subset of the shared coords, can form a Dataset.

Because not all DataArrays necessarily share dimensions/coords, a container, which is not yet defined, may be needed.

Again, there are questions around the representation of units.
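A minimal sketch of forming a Dataset from DataArrays whose coords are subsets of the shared coords. The variable names, coordinates, and storing the unit as a string in `attrs["units"]` are all assumptions for illustration; `attrs` is only one of several possible unit representations.

```python
import numpy as np
import xarray as xr

# Hypothetical emissions DataArray with dimensions "area" and "year".
co2 = xr.DataArray(
    np.array([[900.0, 910.0], [400.0, 405.0]]),
    dims=("area", "year"),
    coords={"area": ["DEU", "FRA"], "year": [2000, 2001]},
    name="CO2",
    attrs={"units": "Gg CO2 / yr"},  # unit kept as a plain string attribute
)

# A second DataArray whose coords are a subset of the shared coords.
ch4 = xr.DataArray(
    np.array([[50.0]]),
    dims=("area", "year"),
    coords={"area": ["DEU"], "year": [2000]},
    name="CH4",
    attrs={"units": "Gg CH4 / yr"},
)

# Merging with an explicit outer join builds a Dataset on the union of the
# coords; positions missing from CH4 are filled with NaN.
ds = xr.merge([co2, ch4], join="outer")
print(ds["CH4"].sel(area="DEU", year=2000).item())  # 50.0
```

The NaN fill shows the trade-off: sparse variables are padded to the shared coords. Unit-aware arrays (for example via the pint library) would be an alternative to string attributes.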

mikapfl commented 4 years ago

scmdata TimeSeries:

A collection of TimeSeries can be held in a ScmRun object.

mikapfl commented 4 years ago

scmdata ScmDataFrame:

mikapfl commented 4 years ago

pyam IamDataFrame:

mikapfl commented 4 years ago

postgresql tables:

mikapfl commented 4 years ago

A variation of postgresql tables would be sqlite tables:
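A minimal sketch of a tidy table in SQLite using Python's standard-library `sqlite3` module. The table name, columns, and choice of primary key are hypothetical assumptions:

```python
import sqlite3

# In-memory database; passing a file path instead would persist it.
con = sqlite3.connect(":memory:")

# Hypothetical tidy table: key columns plus one value column. The composite
# primary key rejects a second value for the same observation.
con.execute(
    """
    CREATE TABLE emissions (
        area   TEXT    NOT NULL,
        entity TEXT    NOT NULL,
        unit   TEXT    NOT NULL,
        year   INTEGER NOT NULL,
        value  REAL,
        PRIMARY KEY (area, entity, year)
    )
    """
)
rows = [
    ("DEU", "CO2", "Gg CO2 / yr", 2000, 900.0),
    ("DEU", "CO2", "Gg CO2 / yr", 2001, 910.0),
    ("FRA", "CO2", "Gg CO2 / yr", 2000, 400.0),
]
con.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?, ?)", rows)
con.commit()

# Aggregation is plain SQL with parameter placeholders.
total_2000 = con.execute(
    "SELECT SUM(value) FROM emissions WHERE year = ? AND entity = ?",
    (2000, "CO2"),
).fetchone()[0]
print(total_2000)  # 1300.0
```

The schema and queries are largely standard SQL, so they would mostly carry over to PostgreSQL, although PostgreSQL drivers such as psycopg use a different placeholder style (`%s`).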

mikapfl commented 4 years ago

datatoolbox Datatable:

To manage a collection of Datatables, there is a custom database management system backed by git.

mikapfl commented 4 years ago

frictionless data Data Package:
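A minimal sketch of what a Data Package looks like on disk: a plain CSV plus a `datapackage.json` descriptor whose Table Schema names and types each column. It deliberately uses only the Python standard library rather than the frictionless tooling, follows the Data Package / Table Schema specs as far as I understand them, and all package, resource, and column names are hypothetical:

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical tidy data to publish as one tabular resource.
rows = [
    {"area": "DEU", "entity": "CO2", "unit": "Gg CO2 / yr", "year": 2000, "value": 900.0},
    {"area": "FRA", "entity": "CO2", "unit": "Gg CO2 / yr", "year": 2000, "value": 400.0},
]

package_dir = Path(tempfile.mkdtemp())

# The data itself stays a plain CSV file.
with open(package_dir / "emissions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# The descriptor lists the resources and, via a Table Schema, the name and
# type of each column plus an optional primary key.
descriptor = {
    "name": "example-emissions",
    "resources": [
        {
            "name": "emissions",
            "path": "emissions.csv",
            "schema": {
                "fields": [
                    {"name": "area", "type": "string"},
                    {"name": "entity", "type": "string"},
                    {"name": "unit", "type": "string"},
                    {"name": "year", "type": "integer"},
                    {"name": "value", "type": "number"},
                ],
                "primaryKey": ["area", "entity", "year"],
            },
        }
    ],
}
with open(package_dir / "datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

Because the metadata lives in a separate JSON file, the data stays readable by any CSV tool, while the descriptor carries the structure that a tidy DataFrame would otherwise keep only in memory.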

mikapfl commented 4 years ago

datalad:

mikapfl commented 3 years ago

Results are summarized here: https://github.com/pik-piam/primap/blob/main/data%20structures/datalibraries_evaluation_results.pdf