Closed daehrlich closed 10 months ago
Hi David,
I think the description below is too 'abstract' for the average user - could you also give an example of how to call and how not to call the Estimator.estimate function?
Thanks, Michael
On Wed, 2023-12-06 at 01:35 -0800, David A. Ehrlich wrote:
Strictly call Estimator.estimate with kwargs only (as the abstract method implies)
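For concreteness, a do/don't sketch of the kwargs-only convention (the estimator class and argument names here are hypothetical, not the actual IDTxl API):

```python
# Hypothetical serial estimator following the kwargs-only convention.
# The bare '*' in the signature forces callers to use keyword arguments.
class ExampleEstimator:
    def estimate(self, *, var1, var2):
        """Accept keyword arguments only, as the abstract method implies."""
        return float(sum(a * b for a, b in zip(var1, var2)))

est = ExampleEstimator()

# How to call: keyword arguments only.
result = est.estimate(var1=[1.0, 2.0], var2=[3.0, 4.0])  # -> 11.0

# How NOT to call: positional arguments raise a TypeError.
try:
    est.estimate([1.0, 2.0], [3.0, 4.0])
except TypeError:
    print("positional call rejected")
```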
This pull request introduces MPI parallelization capabilities and new Kraskov estimators to IDTxl.
1) MPI Support
Added support for MPI parallelization for serial CPU estimators.
Requires mpi4py>=3.0.0
To use MPI, all one needs to do is add an MPI=True flag to the Estimator settings dictionary (optionally together with an n_workers argument) and start the script using
mpiexec -n 1 -usize <max workers + 1> python
on systems with MPI>=2.0 or
mpiexec -n <max workers + 1> python -m mpi4py.futures
on legacy MPI 1 implementations.
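A sketch of what enabling MPI in the settings dictionary could look like (the estimator name is illustrative; MPI and n_workers are the flags described above):

```python
# Hypothetical settings dictionary enabling the MPI wrapper for a
# serial CPU estimator; only the 'MPI' flag is required, 'n_workers'
# is optional.
settings = {
    'cmi_estimator': 'JidtKraskovCMI',  # any serial CPU estimator
    'MPI': True,                        # enable MPI parallelization
    'n_workers': 4,                     # optional: cap the worker count
}
```

The script would then be launched with, e.g., mpiexec -n 1 -usize 5 python script.py on MPI>=2.0 systems.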
Internally, this adds the class MPIEstimator, a decorator for arbitrary serial (is_parallel()==False) Estimators that sends chunks of data to individual Estimator instances on MPI worker ranks using mpi4py.futures.
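A minimal sketch of the wrapper pattern; here concurrent.futures.ThreadPoolExecutor stands in for mpi4py.futures.MPIPoolExecutor (both implement the same Executor interface), and the estimator class is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

class SerialEstimator:
    """Hypothetical serial estimator (is_parallel() == False)."""
    def is_parallel(self):
        return False

    def estimate(self, *, data):
        return sum(data) / len(data)

class MPIEstimator:
    """Sketch: farm chunks of data out to worker estimator instances.

    On an MPI cluster, mpi4py.futures.MPIPoolExecutor would replace
    ThreadPoolExecutor; the Executor API is the same.
    """
    def __init__(self, estimator, n_workers=2):
        assert not estimator.is_parallel(), "wrap serial estimators only"
        self._estimator = estimator
        self._n_workers = n_workers

    def estimate(self, *, chunks):
        with ThreadPoolExecutor(max_workers=self._n_workers) as pool:
            futures = [pool.submit(self._estimator.estimate, data=c)
                       for c in chunks]
            return [f.result() for f in futures]

wrapped = MPIEstimator(SerialEstimator())
results = wrapped.estimate(chunks=[[1.0, 3.0], [2.0, 4.0]])  # -> [2.0, 3.0]
```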
Limitations:
- MPIEstimator does not yet work properly with Estimators which are already parallel themselves
- GPU support has not yet been implemented / tested
Auxiliary changes:
- Strictly call Estimator.estimate with kwargs only (as the abstract method implies)
- Replaced find_estimator with a get_estimator factory method to support automatic MPI decoration
- Added some unit tests for MPI
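The factory could work along these lines (a sketch; the registry and estimator classes are stand-ins, only the get_estimator/MPIEstimator names come from the changes above):

```python
# Hypothetical get_estimator-style factory: return the requested
# estimator, wrapped in MPIEstimator when settings request MPI.

class DummyEstimator:
    def __init__(self, settings):
        self.settings = settings

class MPIEstimator:
    def __init__(self, inner):
        self.inner = inner

ESTIMATORS = {'dummy': DummyEstimator}  # registry stand-in

def get_estimator(name, settings):
    est = ESTIMATORS[name](settings)
    if settings.get('MPI', False):
        est = MPIEstimator(est)  # automatic MPI decoration
    return est

plain = get_estimator('dummy', {})
wrapped = get_estimator('dummy', {'MPI': True})
```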
2) Python Estimators
Added the class PythonKraskovCMI, a Python implementation of the Kraskov-Stoegbauer-Grassberger estimator for continuous conditional mutual information.
The estimators are designed as plug-in replacements for the JIDT implementations, with the one caveat that the "theiler_t" argument is not yet supported. If "theiler_t" is provided to the estimators in estimators_python, an exception will be raised.
For the estimators, the user can select between different backends for the k-nearest-neighbor search. Currently implemented are KDTree (both from scipy and sklearn) and BallTree (sklearn).
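A sketch of the backend selection and the "theiler_t" rejection described above; a dependency-free brute-force search stands in for the scipy/sklearn KDTree and BallTree backends, and all class and key names here are illustrative:

```python
# Brute-force k-NN stand-in (max-norm distances, as used by KSG).
class BruteForceKNN:
    def __init__(self, points):
        self._points = points

    def query(self, x, k):
        """Return the distances to the k nearest stored points."""
        dists = sorted(max(abs(a - b) for a, b in zip(x, p))
                       for p in self._points)
        return dists[:k]

# In the real estimators these would map to scipy.spatial.cKDTree,
# sklearn.neighbors.KDTree and sklearn.neighbors.BallTree.
KNN_BACKENDS = {
    'kdtree_scipy': BruteForceKNN,
    'kdtree_sklearn': BruteForceKNN,
    'balltree_sklearn': BruteForceKNN,
}

class PythonKraskovCMISketch:
    def __init__(self, settings):
        if 'theiler_t' in settings:
            # Dynamic correlation exclusion is not supported yet.
            raise ValueError('theiler_t is not supported by the '
                             'Python estimators')
        backend = settings.get('knn_backend', 'kdtree_scipy')
        self._knn_cls = KNN_BACKENDS[backend]

est = PythonKraskovCMISketch({'knn_backend': 'kdtree_sklearn'})
tree = BruteForceKNN([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)])
nearest = tree.query((0.9, 0.9), k=1)  # nearest neighbour is (1.0, 1.0)
```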
In preliminary tests, the Python estimators with the scipy or sklearn KDTree backend outperform the JIDT implementations, especially on small datasets. Furthermore, they produce the same results up to numerical error on multivariate Gaussian test distributions.
Concurrent changes in the master branch have recently been merged into the feature branch and all relevant tests have been validated to pass. The merge should thus be fast-forward.