Note that the project is still under development!
The "HDF5 Research Data Management Toolbox" (h5RDMtoolbox) is a Python package that supports everybody working with HDF5 in achieving a sustainable data lifecycle that follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It specifically supports the five main steps of planning, collecting, analyzing, sharing and reusing data. Please visit the documentation for detailed information or try the quickstart using Colab.
For everybody, who is...
For everybody, who ...
The toolbox implements five modules, which are shown below. The numbers refer to their main usage in the stages of the data lifecycle above. Except for the wrapper module, which uses the convention module, the modules are independent of each other.
Current implementation highlights in the modules:
The wrapper module builds on the h5py package. It allows including so-called standard names, which are defined in conventions. It also implements interfaces, for example to the package xarray, which allows carrying metadata from HDF5 to the user. Other high-level interfaces such as .rdf allow assigning semantic information to the HDF5 file.
As database backends, hdfDB and mongoDB are implemented. The hdfDB module allows using HDF5 files as a database. The mongoDB module allows using mongoDB as a database by mapping the metadata of HDF5 files to the database.
A quickstart notebook can be tested on Colab by clicking on the corresponding badge.
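The idea of treating an HDF5 file as a database can be illustrated with plain h5py. The following is a minimal sketch, not the toolbox's hdfDB interface: the helper name find_datasets, the file name and the attribute values are assumptions made for illustration only. It walks a file and returns every dataset whose attributes match a simple key/value filter.

```python
import os
import tempfile

import h5py


def find_datasets(filename, **attr_filter):
    """Return the names of all datasets whose attributes match attr_filter.

    Hypothetical helper for illustration; not part of h5RDMtoolbox.
    """
    matches = []

    def visitor(name, obj):
        # Only datasets are considered; groups are skipped.
        if isinstance(obj, h5py.Dataset) and all(
            obj.attrs.get(key) == value for key, value in attr_filter.items()
        ):
            matches.append(name)

    with h5py.File(filename, "r") as h5:
        h5.visititems(visitor)
    return matches


# Build a small example file with two annotated datasets.
tmpdir = tempfile.mkdtemp()
example_file = os.path.join(tmpdir, "example.hdf")
with h5py.File(example_file, "w") as h5:
    u = h5.create_dataset("u", data=[1.0, 2.0, 3.0])
    u.attrs["units"] = "m/s"
    p = h5.create_dataset("p", data=[101325.0])
    p.attrs["units"] = "Pa"

# Query the file by metadata instead of by dataset name.
velocity_like = find_datasets(example_file, units="m/s")
print(velocity_like)
```

The sketch only compares scalar attributes by equality. Richer queries and the mapping of metadata to mongoDB are what the database modules described above are for.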
Comprehensive documentation with many examples is available here or by clicking on the image, which shows the research data lifecycle in the center and the respective toolbox features on the outside:
A paper has been published in the journal inggrid.
Use Python 3.8 or higher (automated tests cover versions up to 3.12). If you are a regular user, you can install the package via pip:
pip install h5RDMtoolbox
Developers may clone the repository and install the package from source. Clone the repository first:
git clone --branch main https://github.com/matthiasprobst/h5RDMtoolbox.git
Then, run
pip install h5RDMtoolbox/
Add --user if you do not have root access.
For a development (editable) installation, run
pip install -e h5RDMtoolbox/
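After installing, it can be useful to confirm that the interpreter meets the stated minimum of Python 3.8 and that the distribution is actually visible to it. The following is a small sketch using only the standard library. The function names are hypothetical, and the distribution name is assumed to be the same as in the pip command above.

```python
import sys
from importlib import metadata  # part of the standard library since Python 3.8


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= minimum


def installed_version(distribution="h5RDMtoolbox"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("h5RDMtoolbox version:", installed_version())
```

If the second line prints None, the package was probably installed into a different environment than the one running the script.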
The core functionality depends on the following packages. Some of them are for general management; others are specific to the features of the package.
General dependencies are:
numpy>=1.20: Scientific computing, handling of arrays
matplotlib>=3.5.2: Plotting
appdirs>=1.4.4: Managing user and application directories
packaging: Version handling
IPython>=8.4.0: Pretty display of data in notebooks
regex>=2020.7.9: Working with regular expressions
Specific to the package are:
h5py=3.7.0: HDF5 file interface
xarray>=2022.3.0: Working with scientific arrays in combination with attributes. Allows carrying metadata from HDF5 to the user
pint>=0.19.2: Allows working with units
pint_xarray>=0.2.1: Working with units for usage with xarray
python-forge==18.6.0: Used to update function signatures when using the standard attributes
pydantic: Used to validate standard attributes
pyyaml>6.0.0: Reading and writing of yaml files, e.g. metadata definitions (conventions). Note that lower versions collide with Python 3.11
requests: Used to download files from the internet or validate URLs, e.g. metadata definitions (conventions)
rdflib: Used to enable working with RDF
ontolutils: Required to work with RDF and derive semantic description of HDF5 file content
To run unit tests or to enable certain features, additional dependencies must be installed.
Install optional dependencies by specifying them in square brackets after the package name, e.g.:
pip install h5RDMtoolbox[mongodb]
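Before relying on an optional feature, one can check whether the packages behind an extra are importable. The sketch below assumes the grouping of extras listed in this section. The EXTRAS mapping and the helper missing_modules are illustrative and not part of the toolbox.

```python
from importlib.util import find_spec

# Import names of the packages behind each extra, following the list in this
# section. Note that the python-gitlab distribution is imported as "gitlab".
EXTRAS = {
    "mongodb": ["pymongo"],
    "csv": ["pandas"],
    "snt": ["xmltodict", "tabulate", "gitlab", "pypandoc"],
}


def missing_modules(extra, extras=EXTRAS):
    """Return the modules of the given extra that cannot be imported."""
    return [name for name in extras[extra] if find_spec(name) is None]


for extra in EXTRAS:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"[{extra}] {status}")
```

The check only tests importability; it does not verify the minimum versions given in the list.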
[mongodb]
pymongo>=4.2.0: Database solution for HDF5 files
[csv]
pandas>=1.4.3: Mainly used for reading csv and pretty printing
[snt]
xmltodict: Reading of xml files
tabulate>=0.8.10: Pretty printing of tables
python-gitlab: Access to gitlab repositories
pypandoc>=2.3: Conversion of markdown files to html
If you intend to use the package in your work, you may cite the paper in the journal inggrid.
Here is the BibTeX entry:
@article{probst2023h5rdmtoolbox,
title={h5RDMtoolbox-A Python Toolbox for FAIR Data Management around HDF5},
author={Probst, Matthias and Pritz, Balazs},
year={2023},
publisher={ing. grid Preprint Repository}
}
Feel free to contribute. Make sure to write docstrings for your methods and classes, and follow PEP 8 (https://peps.python.org/pep-0008/).
Please write tests for your code and put them into the test/ folder. Visit the README file in the test folder for more information.
Please also add a Jupyter notebook in the docs/ folder to document your code. Visit the README file in the docs folder for more information on how to compile the documentation.
Please use the numpy style for the docstrings: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
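As an illustration of these guidelines, the following sketch shows a function with a numpy-style docstring together with a matching test. The function scale is a made-up example and not part of the toolbox, and the test is written with plain assert statements on the assumption that a pytest-like runner is used, which is not stated here.

```python
def scale(values, factor=1.0):
    """Multiply every entry of a sequence by a constant factor.

    Parameters
    ----------
    values : list of float
        Input values.
    factor : float, optional
        Multiplier applied to each value. Default is 1.0.

    Returns
    -------
    list of float
        The scaled values, in the same order as the input.

    Raises
    ------
    TypeError
        If `factor` is not a number.
    """
    if not isinstance(factor, (int, float)):
        raise TypeError("factor must be a number")
    return [value * factor for value in values]


def test_scale():
    """Check the regular case and the empty-input edge case."""
    assert scale([1.0, 2.0], factor=2.0) == [2.0, 4.0]
    assert scale([]) == []
```

A test like this would live in the test/ folder, with the function itself documented in a notebook under docs/.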