mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.61k stars 1.3k forks source link

Depend on `h5io` and `pymatreader` by default #12638

Open hoechenberger opened 1 month ago

hoechenberger commented 1 month ago

Hello, currently we have the following variant in pyproject.toml:

# Dependencies for MNE-Python functions that use HDF5 I/O
hdf5 = ["h5io", "pymatreader"]

The HDF5 I/O is for example used for SourceMorph objects. Since this is a "standard" MNE class, I feel we should support I/O by default (i.e., when doing pip install mne).

I suspect the reason why h5io is only included in a special variant is because its dependency, h5py, must be compiled and linked, which used to be a problem on many platforms (one of the reason why Anaconda became so popular, as they provided binaries early on). Now with the availability of binary wheels, this problem has basically disappeared. The developers provide wheels for all relevant platforms except for aarch64, but @larsoner is working on this.

I would be in favor of moving the dependencies that are currently under the hdf5 variant into our default dependency stack once binary wheels of h5py for aarch64 become available and h5io can hence be installed from PyPI without requiring any compilation. The hdf5 variant selector can remain in pyproject.toml for backward-compat and just remain empty (we already do this for data).

Thoughts?

cbrnr commented 1 month ago

Good idea, 👍!

drammock commented 1 month ago

+1

larsoner commented 1 month ago

It's not just about wheel availability unfortunately. We had related discussions where we ultimately rejected including the pure Python library nibabel in #11564 for different reasons. (As a side note, to create a SourceMorph in the first place you need to use nibabel, so that's probably not a compelling use case for adding just hdf5.)

There's an argument to be made that hdf5 is more core/important than nibabel (e.g., also needed for EEGLAB and some other IO), but not sure if that outweighs other concerns from #11564.

Above and beyond the concerns blocked nibabel, the compiled nature of hdf5 makes me much more hesitant to add it compared to a pure Python module. Support for compiled libraries always adds complexity, and HDF5 isn't the simplest compiled dependency, for example because it requires its own linked libhdf5 (unlike, say, dipy which just compiles platform specific C code from Cython files). And right now any platform that supports NumPy SciPy and Matplotlib (the remaining core deps are pure Python I think) can run MNE. Not sure the limited potential gain worth adding h5py to that hard requirement list.