nlesc-sigs / data-sig

Linked data, data & modeling SIG

eWaterCycle dataset storage systems #16

Closed sverhoeven closed 6 years ago

sverhoeven commented 6 years ago

For the eWaterCycle 2 project we want hydrologists to be able to

Running a model requires many different datasets, such as terrain elevation and temperature/precipitation time series. Some of them are in NetCDF format.

In the project we would like to store datasets in a system that is

In the project we are investigating which generic storage systems could be used and which hydrology-specific solutions already exist.

romulogoncalves commented 6 years ago

Some questions:

Is the input data already FAIR?

The project output is not only data, but also a model. How do we make a model FAIR?

FAIRness for models is challenging because it involves both FAIR software and FAIR data. In the end we may need to treat a model as a FAIR digital object, as specified in the FAIR metrics work. Hence, we should avoid categorizing it as either FAIR software or FAIR data.

arnikz commented 6 years ago

Some additional questions to consider: What other file formats (besides NetCDF) are used in this domain for sharing (meta)data, models, etc.? What are the typical file sizes? Are the data hierarchical or graph-like? For file-based (meta)data management I would recommend iRODS, combined with the Semantic Web/Linked Data approach for (federated) queries over rich, standardized, machine-readable metadata ;)
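As an illustration of what "machine-readable metadata" could look like for one of these datasets, here is a minimal sketch that describes a NetCDF file as JSON-LD using the schema.org `Dataset` vocabulary. The dataset name, description, variables, time range and bounding box are hypothetical placeholders, and schema.org is only one possible vocabulary, not something the project has chosen.

```python
import json


def dataset_metadata(name, description, variables, start, end, bbox):
    """Build a schema.org Dataset description (JSON-LD) for a gridded file.

    bbox is (south, west, north, east) in decimal degrees.
    All values passed in below are hypothetical examples.
    """
    south, west, north, east = bbox
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "encodingFormat": "application/x-netcdf",
        # One PropertyValue per variable stored in the file
        "variableMeasured": [
            {"@type": "PropertyValue", "name": var, "unitText": unit}
            for var, unit in variables
        ],
        # ISO 8601 interval covering the time axis
        "temporalCoverage": f"{start}/{end}",
        # schema.org GeoShape box: lower corner then upper corner, "lat lon" pairs
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        },
    }


meta = dataset_metadata(
    name="Example forcing data (hypothetical)",
    description="Daily precipitation and temperature for a hydrological model run.",
    variables=[("precipitation", "mm/day"), ("temperature", "K")],
    start="2000-01-01",
    end="2000-12-31",
    bbox=(-90, -180, 90, 180),
)
print(json.dumps(meta, indent=2))
```

Because JSON-LD is an RDF serialization, records like this can be loaded into a triple store and queried together with other Linked Data sources.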

sverhoeven commented 6 years ago

Regarding the models, we decided that each model should implement the Basic Model Interface (BMI, https://github.com/csdms/bmi) and that we will ship the models as Docker images.
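To give an idea of what wrapping a model behind BMI involves, here is a toy model that exposes a handful of BMI-style methods (`initialize`, `update`, `get_current_time`, `get_end_time`, `get_value`, `finalize`). It is a sketch only: the real Python specification (`bmipy`) defines many more methods, takes a config file path in `initialize`, and passes values through NumPy arrays, whereas this sketch uses a dict and plain lists. The bucket model, its parameters and its variable names are hypothetical.

```python
class BucketModel:
    """Hypothetical single-store 'bucket' model with BMI-style methods."""

    def initialize(self, config=None):
        # Real BMI takes a config file path; this sketch takes a dict instead.
        config = config or {}
        self._storage = config.get("initial_storage", 10.0)  # mm
        self._rain = config.get("rain", [5.0, 0.0, 2.0])     # mm per time step
        self._k = config.get("k", 0.1)                       # outflow fraction per step
        self._time = 0.0
        self._dt = 1.0
        self._discharge = 0.0

    def update(self):
        # Advance one time step: add rain, then drain a fixed fraction.
        step = int(self._time)
        self._storage += self._rain[step]
        self._discharge = self._k * self._storage
        self._storage -= self._discharge
        self._time += self._dt

    def get_current_time(self):
        return self._time

    def get_end_time(self):
        return float(len(self._rain))

    def get_value(self, name):
        # bmipy's get_value copies into a NumPy 'dest' array; here a list is returned.
        values = {"storage": self._storage, "discharge": self._discharge}
        return [values[name]]

    def finalize(self):
        self._rain = []


model = BucketModel()
model.initialize({"initial_storage": 10.0, "rain": [5.0, 0.0, 2.0], "k": 0.1})
while model.get_current_time() < model.get_end_time():
    model.update()
    print(model.get_current_time(), model.get_value("discharge"))
model.finalize()
```

The benefit of a common interface is that a caller can drive any model with the same loop (initialize, update until the end time, read values, finalize) without knowing the model's internals.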

The sizes differ with the scale of the simulation. For example, a global model generates 200 GB for a 7-day forecast, while other models will be much smaller.
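How strongly size depends on scale can be estimated from the grid dimensions. The sketch below computes the uncompressed size of gridded output as rows × columns × time steps × variables × bytes per value. The 5-arcminute global grid, the 100 × 100 catchment grid, hourly output and single-precision floats are all assumptions chosen for illustration, not the settings of the models discussed here; NetCDF compression could reduce the actual sizes.

```python
def raw_grid_size_bytes(n_lat, n_lon, n_steps, n_vars, bytes_per_value=4):
    """Uncompressed size of gridded output; 4 bytes per value = float32."""
    return n_lat * n_lon * n_steps * n_vars * bytes_per_value


n_steps = 7 * 24  # assumed hourly output over a 7-day forecast

# Assumed global grid at 5 arcminutes: 180*12 rows by 360*12 columns
global_size = raw_grid_size_bytes(180 * 12, 360 * 12, n_steps, n_vars=1)
# Assumed small catchment model: 100 by 100 cells
catchment_size = raw_grid_size_bytes(100, 100, n_steps, n_vars=1)

print(f"global:    {global_size / 1e9:.2f} GB per variable")
print(f"catchment: {catchment_size / 1e6:.2f} MB per variable")
```

Under these assumptions a single output variable is about 6.27 GB for the global grid versus about 6.72 MB for the catchment, a factor of roughly 930 that comes entirely from the number of grid cells.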

When we have other storage requirements we will open a new issue.