Heat is a distributed tensor framework for high performance data analytics.
Heat builds on PyTorch and mpi4py to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem.
With Heat you can:
For an example that highlights the benefits of multi-node parallelism and hardware acceleration, and how easily both can be achieved with Heat, see our blog post on truncated SVD of a 200GB data set.
Check out our coverage tables to see which NumPy, SciPy, and scikit-learn functions are already supported.
If you need a functionality that is not yet supported:
Check out our features and the Heat API Reference for a complete list of functionalities.
Go to Quick Start for a quick overview. For more details, see Installation.
You can test your setup by running the heat_test.py script:
mpirun -n 2 python heat_test.py
It should print something like this:
x is distributed: True
Global DNDarray x: DNDarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=ht.int32, device=cpu:0, split=0)
Global DNDarray x:
Local torch tensor on rank 0 : tensor([0, 1, 2, 3, 4], dtype=torch.int32)
Local torch tensor on rank 1 : tensor([5, 6, 7, 8, 9], dtype=torch.int32)
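The contents of heat_test.py are not reproduced here. As a rough sketch only, a script producing output of this kind might look like the following. It assumes Heat is importable as `heat` and uses `ht.arange`, `DNDarray.is_distributed()`, `DNDarray.comm.rank`, and `DNDarray.larray`; the `main` function and its return codes are illustrative choices, not part of Heat.

```python
def main():
    """Minimal setup check (a sketch, not the official heat_test.py)."""
    try:
        # Heat needs a working PyTorch, mpi4py, and MPI installation.
        import heat as ht
    except ImportError as err:
        print(f"Heat could not be imported: {err}")
        return 1

    # Create the integers 0..9 and split them along axis 0
    # across all MPI processes started by mpirun.
    x = ht.arange(10, split=0)
    print("x is distributed:", x.is_distributed())
    print("Global DNDarray x:", x)
    # Each process only holds its own chunk, stored as a torch tensor.
    print("Local torch tensor on rank", x.comm.rank, ":", x.larray)
    return 0


if __name__ == "__main__":
    main()
```

When started with a single process (plain `python heat_test.py`), the array is not distributed, so the first line should report `False` rather than `True`.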
Check out our Jupyter Notebook Tutorials: choose local to try things out on your machine, or hpc if you have access to an HPC system.
The complete documentation of the latest version is always deployed on Read the Docs.
In order to do computations on your GPU(s):
On most HPC systems you will not be able to install or compile MPI or CUDA/ROCm yourself. Instead, you will most likely need to load a pre-installed MPI and/or CUDA/ROCm module from the module system. You may even find PyTorch, h5py, or mpi4py as (part of) such a module. Note that for optimal performance on GPU, you need to use an MPI library that has been compiled with CUDA/ROCm support (e.g., so-called "CUDA-aware MPI").
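After loading the relevant modules, it can help to confirm that PyTorch actually sees a GPU before involving MPI. The sketch below uses only `torch.cuda.is_available()` and `torch.cuda.device_count()`; the helper name `gpu_status` is hypothetical, and the check says nothing about whether your MPI library is CUDA-aware.

```python
def gpu_status():
    """Describe whether PyTorch can see a GPU on this node."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if torch.cuda.is_available():
        # ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API.
        return f"PyTorch sees {torch.cuda.device_count()} GPU(s)"
    return "PyTorch is installed, but no GPU is visible"


if __name__ == "__main__":
    print(gpu_status())
```

On a cluster, run this inside a job on a GPU node; login nodes often have no GPU attached.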
Install the latest version with
pip install heat[hdf5,netcdf]
where the part in brackets is a list of optional dependencies. You can omit it if you do not need HDF5 or NetCDF support.
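To see whether the optional dependencies ended up in your environment, you can check which modules are importable. This sketch uses only the standard library; the mapping from extras to the module names `h5py` and `netCDF4` is an assumption and may differ between Heat versions, and `available_extras` is a hypothetical helper.

```python
import importlib.util

# Assumed mapping from pip extra to the module it provides; adjust it if
# your Heat version declares different optional dependencies.
OPTIONAL_MODULES = {"hdf5": "h5py", "netcdf": "netCDF4"}


def available_extras(modules=OPTIONAL_MODULES):
    """Return {extra_name: bool}, True where the module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    for extra, found in available_extras().items():
        print(f"{extra}: {'available' if found else 'missing'}")
```

The check only locates the modules without importing them, so it stays fast and does not initialize HDF5 or NetCDF.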
The conda build includes all dependencies, including OpenMPI:
conda install -c conda-forge heat
Go ahead and ask questions on GitHub Discussions. If you found a bug or are missing a feature, please file a new issue. You can also get in touch with us on Mattermost (sign up with your GitHub credentials). Once you log in, you can introduce yourself on the Town Square channel.
We welcome contributions from the community. If you want to contribute to Heat, be sure to review the Contribution Guidelines and Resources before getting started!
We use GitHub issues for tracking requests and bugs; please see Discussions for general questions and discussion.
If you’re unsure where to start or how your skills fit in, reach out! You can ask us here on GitHub by leaving a comment on a relevant open issue.
If you are new to contributing to open source, this guide helps explain why, what, and how to get involved.
Heat is distributed under the MIT license; see our LICENSE file.
Please do mention Heat in your publications if it helped your research. You can cite:
@inproceedings{heat2020,
title={{HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics}},
author={
Markus Götz and
Charlotte Debus and
Daniel Coquelin and
Kai Krajsek and
Claudia Comito and
Philipp Knechtges and
Björn Hagemeier and
Michael Tarnawa and
Simon Hanselmann and
Martin Siggel and
Achim Basermann and
Achim Streit
},
booktitle={2020 IEEE International Conference on Big Data (Big Data)},
year={2020},
pages={276-287},
month={December},
publisher={IEEE},
doi={10.1109/BigData50022.2020.9378050}
}
This work is supported by the Helmholtz Association Initiative and Networking Fund under project number ZT-I-0003 and the Helmholtz AI platform grant.
This project has received funding from Google Summer of Code (GSoC) in 2022.
This work is partially carried out under a programme of, and funded by, the European Space Agency. Any view expressed in this repository or related publications can in no way be taken to reflect the official opinion of the European Space Agency.