pwollstadt / IDTxl

The Information Dynamics Toolkit xl (IDTxl) is a comprehensive software package for efficient inference of networks and their node dynamics from multivariate time series data using information theory.
http://pwollstadt.github.io/IDTxl/
GNU General Public License v3.0

MPI support and new Kraskov Estimators #104

Closed daehrlich closed 10 months ago

daehrlich commented 10 months ago

This pull request introduces MPI parallelization capabilities and new Kraskov estimators to IDTxl.

1) MPI Support

Added support for MPI parallelization for serial CPU estimators.

Requires mpi4py>=3.0.0

To use MPI, add an `MPI=True` flag to the estimator settings dictionary (optionally together with an `n_workers` argument) and start the script using

```
mpiexec -n 1 -usize <max workers + 1> python
```

on systems with MPI >= 2.0, or

```
mpiexec -n <max workers + 1> python -m mpi4py.futures
```

on legacy MPI 1 implementations.
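As a rough sketch, enabling the flag could look like the following. The keys `MPI` and `n_workers` are taken from this PR's description; the surrounding settings (estimator name, other keys) are illustrative assumptions, not a prescribed configuration:

```python
# Hedged sketch: an estimator settings dictionary with the new MPI flag.
# 'MPI' and 'n_workers' are the keys introduced by this PR; the rest of
# the dictionary is illustrative only.
settings = {
    'cmi_estimator': 'JidtKraskovCMI',  # any serial CPU estimator
    'MPI': True,       # route estimate() calls to MPI worker ranks
    'n_workers': 4,    # optional: cap the number of MPI workers
}
```

The script containing such settings would then be launched with one of the `mpiexec` invocations above.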

Internally, this adds the class MPIEstimator, a decorator for arbitrary serial (`is_parallel() == False`) estimators that sends chunks of data to individual estimator instances on MPI worker ranks using mpi4py.futures.

Limitations:

- MPIEstimator does not yet work properly with estimators that are already parallel themselves.
- GPU support has not yet been implemented or tested.

Auxiliary changes:

- Strictly call `Estimator.estimate` with kwargs only (as the abstract method implies).
- Replaced `find_estimator` with a `get_estimator` factory method to support automatic MPI decoration.
- Added some unit tests for MPI.
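To illustrate the kwargs-only convention (a generic sketch, not IDTxl's actual `Estimator` class; the parameter names `var1`/`var2` are assumptions), keyword-only parameters make positional calls fail loudly:

```python
# Illustrative sketch of the "kwargs only" calling convention implied by
# the abstract Estimator.estimate method. The '*' makes all parameters
# keyword-only, so positional calls raise a TypeError.
class Estimator:
    def estimate(self, *, var1, var2, conditional=None):
        # A real estimator would compute an information-theoretic
        # quantity here; we just return the input lengths.
        return len(var1), len(var2)

est = Estimator()
est.estimate(var1=[1, 2], var2=[3, 4])   # OK: keyword arguments
# est.estimate([1, 2], [3, 4])           # raises TypeError: positional args
```

This is one way a kwargs-only contract can be enforced at the language level rather than by convention alone.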

2) Python Estimators

Added the class PythonKraskovCMI, a Python implementation of the Kraskov-Stögbauer-Grassberger (KSG) estimator for conditional mutual information between continuous variables.

The estimators are designed as drop-in replacements for the JIDT implementations, with the one caveat that the `theiler_t` argument is not yet supported; if `theiler_t` is provided to the Python estimators, an exception is raised.

For these estimators, the user can select between different backends for the k-nearest-neighbor search. Currently implemented are KDTree (from both scipy and sklearn) and BallTree (sklearn).

In preliminary tests, the Python estimators with a scipy or sklearn KDTree backend outperform the JIDT implementations, especially on small datasets, and produce the same results up to numerical error on multivariate Gaussian test distributions.
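For readers unfamiliar with the technique, the core of a KSG estimator can be sketched in a few lines. This is not the PR's `PythonKraskovCMI` code; it is a minimal, assumed implementation of KSG algorithm 1 for plain (unconditional) mutual information, using the scipy KDTree backend mentioned above:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=4):
    """Minimal sketch of KSG (algorithm 1) mutual information, in nats.

    Not IDTxl's implementation: shown for plain MI (not CMI) to keep it
    short. x, y are arrays of shape (N,) or (N, d).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # Chebyshev distance to the k-th neighbour in the joint space
    # (k + 1 because the query point is its own nearest neighbour).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # Count neighbours strictly inside eps in each marginal space;
    # subtract 1 to exclude the point itself.
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf,
                                     return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf,
                                     return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1)
                                             + digamma(ny + 1))
```

On correlated Gaussians this tracks the analytic value `-0.5 * log(1 - rho**2)`, which is the kind of check the Gaussian validation above refers to.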

Concurrent changes in the master branch have recently been merged into the feature branch, and all relevant tests have been validated to pass. The merge should thus be fast-forward.

mwibral commented 10 months ago

Hi David,

I think the description below is too 'abstract' for the average user - could you also give an example of how to call, and how not to call, the Estimator.estimate function?

Thanks, Michael

On Wed, 2023-12-06 at 01:35 -0800, David A. Ehrlich wrote:

Strictly call Estimator.estimate with kwargs only (as the abstract method implies)