matthiasprobst / h5RDMtoolbox

Supporting a FAIR Research Data lifecycle using Python and HDF5.
https://h5rdmtoolbox.readthedocs.io/en/latest/
MIT License
conventions data-management fair-principles hdf5 python research-data-management xarray

HDF5 Research Data Management Toolbox


Note that the project is still under development!

The "HDF5 Research Data Management Toolbox" (h5RDMtoolbox) is a Python package that supports everybody working with HDF5 in achieving a sustainable data lifecycle that follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It specifically supports the five main steps of planning, collecting, analyzing, sharing and reusing data. Please visit the documentation for detailed information or try the quickstart using Colab.

Highlights

Who is the package for?

For everybody who is...

Who is it not for?

For everybody who ...

Package Architecture/structure

The toolbox implements five modules, which are shown below. The numbers refer to the stages of the data lifecycle above in which each module is mainly used. Except for the wrapper module, which uses the convention module, all modules are independent of each other.

H5TBX modules
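The dependency rule stated above (only the wrapper module uses the convention module, all others are independent) can be written down as a small dependency graph and checked. This is an illustrative sketch, not code from the toolbox: only `wrapper` and `convention` are module names taken from this README, while `module_c`, `module_d` and `module_e` are hypothetical placeholders for the three remaining modules shown in the figure.

```python
from typing import Dict, List, Set, Tuple

# Dependency graph encoding the rule described in the README.
# module_c, module_d and module_e are hypothetical placeholder names.
DEPENDENCIES: Dict[str, Set[str]] = {
    "wrapper": {"convention"},
    "convention": set(),
    "module_c": set(),
    "module_d": set(),
    "module_e": set(),
}


def independent_modules(deps: Dict[str, Set[str]]) -> List[str]:
    """Return modules that neither depend on nor are used by any other module."""
    used: Set[str] = set().union(*deps.values())
    return sorted(m for m, d in deps.items() if not d and m not in used)


def unexpected_dependencies(deps: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return every dependency edge other than the allowed wrapper -> convention edge."""
    allowed = {("wrapper", "convention")}
    return sorted((m, d) for m, ds in deps.items() for d in ds if (m, d) not in allowed)


print(independent_modules(DEPENDENCIES))      # ['module_c', 'module_d', 'module_e']
print(unexpected_dependencies(DEPENDENCIES))  # []
```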

Current implementation highlights in the modules:

Quickstart

A quickstart notebook can be tried out by clicking the following badge:

Open Quickstart Notebook

Documentation

Comprehensive documentation with many examples is available here or by clicking on the image, which shows the research data lifecycle in the center and the respective toolbox features on the outside:

A paper describing the toolbox has been published in the journal ing.grid.

Installation

Use Python 3.8 or higher (automated tests cover versions up to 3.12). Regular users can install the package via pip:

pip install h5RDMtoolbox
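The minimum Python version can be checked from within the interpreter before installing. A minimal sketch using only the standard library; the `(3, 8)` bound is taken from the sentence above, and `MIN_PYTHON` and `check_python` are illustrative names, not part of the toolbox:

```python
import sys

# Minimum Python version stated in the installation notes.
MIN_PYTHON = (3, 8)


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not check_python():
    raise SystemExit(
        f"h5RDMtoolbox requires Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher"
    )
```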

Install from source:

Developers may clone the repository and install the package from source. Clone the repository first:

git clone -b main https://github.com/matthiasprobst/h5RDMtoolbox.git

Then, run

pip install h5RDMtoolbox/

Add --user if you do not have root access.

For a development (editable) installation, run

pip install -e h5RDMtoolbox/
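After either installation, the installed version can be checked from Python with the standard-library `importlib.metadata` module (available since Python 3.8). A minimal sketch; it assumes the distribution name `h5RDMtoolbox` used in the `pip install` commands above, and `installed_version` is an illustrative helper, not part of the toolbox:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "h5RDMtoolbox") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


v = installed_version()
print(v if v is not None else "h5RDMtoolbox is not installed in this environment")
```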

Dependencies

The core functionality depends on the following packages. Some are needed for general management, others are specific to the features of the package:

General dependencies are ...

Specific to the package are ...

Optional dependencies

To run unit tests or to enable certain features, additional dependencies must be installed.

Install optional dependencies by specifying them in square brackets after the package name, e.g.:

pip install h5RDMtoolbox[mongodb]

[mongodb]

[csv]

[snt]
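Whether the modules required by an optional feature are present can be checked without importing them, using `importlib.util.find_spec`. This is a sketch only: the README does not list which packages each extra installs, so the `EXTRA_MODULES` mapping below, including the assumption that `[mongodb]` provides `pymongo`, is hypothetical and should be replaced with the actual requirements from the package metadata.

```python
import importlib.util
from typing import Dict, List

# Hypothetical mapping from extras to the top-level modules they provide.
# These module names are assumptions, not taken from h5RDMtoolbox's metadata.
EXTRA_MODULES: Dict[str, List[str]] = {
    "mongodb": ["pymongo"],
}


def missing_modules(modules: List[str]) -> List[str]:
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


for extra, modules in EXTRA_MODULES.items():
    missing = missing_modules(modules)
    if missing:
        print(f"[{extra}] not available, missing: {', '.join(missing)}")
    else:
        print(f"[{extra}] available")
```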

Citing the package

If you use the package in your work, please consider citing the paper published in the journal ing.grid.

Here is the BibTeX entry:

@article{probst2023h5rdmtoolbox,
  title={h5RDMtoolbox-A Python Toolbox for FAIR Data Management around HDF5},
  author={Probst, Matthias and Pritz, Balazs},
  year={2023},
  publisher={ing. grid Preprint Repository}
}

Contribution

Feel free to contribute. Please write docstrings for your methods and classes and follow PEP 8 (https://peps.python.org/pep-0008/).

Please write tests for your code and put them into the test/ folder. Visit the README file in the test-folder for more information.
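A minimal sketch of what such a test file might look like, using the standard-library `unittest` framework. The README does not state which test runner the project uses, and `normalize_name` is a hypothetical function standing in for your own code:

```python
import unittest


def normalize_name(name: str) -> str:
    """Hypothetical function under test: strip surrounding whitespace and lower-case."""
    return name.strip().lower()


class TestNormalizeName(unittest.TestCase):
    """Example test case; each test method checks one behavior."""

    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_name("  Velocity "), "velocity")

    def test_empty_string(self):
        self.assertEqual(normalize_name(""), "")
```

Saved under a name matching `test_*.py` in the test/ folder, such a file would be picked up by `python -m unittest discover -s test`.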

Please also add a Jupyter notebook to the docs/ folder to document your code. See the README file in the docs/ folder for information on how to compile the documentation.

Please use the numpy style for the docstrings: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
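For reference, here is a short function documented in the numpy docstring style. The function itself is a hypothetical example, not part of the toolbox:

```python
from typing import Sequence


def mean_value(values: Sequence[float]) -> float:
    """Compute the arithmetic mean of a sequence of numbers.

    Parameters
    ----------
    values : Sequence[float]
        The numbers to average. Must contain at least one element.

    Returns
    -------
    float
        The arithmetic mean of `values`.

    Raises
    ------
    ValueError
        If `values` is empty.

    Examples
    --------
    >>> mean_value([1.0, 2.0, 3.0])
    2.0
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```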