Unable to save SpectralConnectivity when it is averaged for frequency bands

weiglszonja commented 2 years ago

Describe the bug

Computing spectral connectivity for frequency bands gives TypeError when trying to save the object.

pip show mne_connectivity
Name: mne-connectivity
Version: 0.3
Summary: A module for connectivity data analysis with MNE.
Home-page: https://github.com/mne-tools/mne-connectivity
Author: None
Author-email: None
License: BSD-3
Location: /Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages

Steps to reproduce

import os
import mne
from mne_connectivity import spectral_connectivity_epochs

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = os.path.join(sample_data_folder, 'MEG', 'sample',
                                    'sample_audvis_raw.fif')
raw = mne.io.read_raw_fif(sample_data_raw_file, verbose=False, preload=False)
events = mne.find_events(raw, stim_channel='STI 014')
event_dict = {'auditory/left': 1}
epochs = mne.Epochs(raw, events, event_id=event_dict, preload=False,
                    proj=False)
epochs = epochs[:3]

conn = spectral_connectivity_epochs(epochs, n_jobs=2, fmin=(4, 8, 13, 30), fmax=(8, 13, 30, 45), faverage=True)
conn.save('foo.nc')

Expected results

save the file

Actual results

/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/numpy/core/_asarray.py:102: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-40b8b5da96e4>", line 1, in <module>
    conn.save('foo.nc')
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/mne_connectivity/base.py", line 831, in save
    invalid_netcdf=True)
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/core/dataarray.py", line 2844, in to_netcdf
    return dataset.to_netcdf(*args, **kwargs)
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/core/dataset.py", line 1912, in to_netcdf
    invalid_netcdf=invalid_netcdf,
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/backends/api.py", line 1073, in to_netcdf
    dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/backends/api.py", line 1119, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/backends/common.py", line 266, in store
    variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/backends/common.py", line 304, in set_variables
    name, v, check, unlimited_dims=unlimited_dims
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py", line 326, in prepare_variable
    nc4_var.attrs[k] = v
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/h5netcdf/attrs.py", line 83, in __setitem__
    self._h5attrs[key] = value
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/h5py/_hl/attrs.py", line 103, in __setitem__
    self.create(name, data=value)
  File "/Users/weian/.conda/envs/meeg-tools/lib/python3.7/site-packages/h5py/_hl/attrs.py", line 180, in create
    htype = h5t.py_create(original_dtype, logical=True)
  File "h5py/h5t.pyx", line 1629, in h5py.h5t.py_create
  File "h5py/h5t.pyx", line 1653, in h5py.h5t.py_create
  File "h5py/h5t.pyx", line 1713, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Additional information

This error is triggered when the freqs_used attribute (conn.attrs['freqs_used']) is not a plain list, but as the example code snippet shows a list of arrays, e.g.

freqs_used = [
array([4.2799168 , 5.70655573, 7.13319466]),
array([ 8.55983359,  9.98647252, 11.41311145, 12.83975039]),
array([14.26638932, 15.69302825, 17.11966718, 18.54630611, 19.97294504, 21.39958398, 22.82622291, 24.25286184, 25.67950077, 27.1061397 , 28.53277863, 29.95941757]),
array([31.3860565 , 32.81269543, 34.23933436, 35.66597329, 37.09261222, 38.51925116, 39.94589009, 41.37252902, 42.79916795, 44.22580688])
]

Any help would be much appreciated! Thank you :)

adam2392 commented 2 years ago

Will take a look towards the end of the week!

adam2392 commented 2 years ago

Yeah I suspect it's defaulting to an array rather then list of lists which isn't supported in HDF.

weiglszonja commented 2 years ago

Thanks @adam2392 for looking into this :) I was thinking of a solution to just flatten the list lists, which could be done here:

https://github.com/mne-tools/mne-connectivity/blob/5f36fa61c29e40411029485cf8abfcd3c5322591/mne_connectivity/spectral/epochs.py#L1106-L1110

What do you think about this?

    if faverage:
        # for each band we return the frequencies that were averaged
        freqs = [np.mean(x) for x in freqs_bands]
        if all(isinstance(freq_band, Iterable) for freq_band in freqs_bands):
            freqs_used = [freq for freq_band in freqs_bands for freq in freq_band]
        else:
            freqs_used = freqs_bands

adam2392 commented 2 years ago

Sure that seems reasonable to me! Wanna try it out and PR with an additional test case? I can help guide where to do it if needed!

I don't think there's a case where Freqs used would be more then a 2 layer nested list.

weiglszonja commented 2 years ago

Sure, thank you! I'll give it a try :)

adam2392 commented 2 years ago

Hi @weiglszonja how're things going w/ this? Need any assistance? No rush! Just wanted to make sure I can answer any questions.

weiglszonja commented 2 years ago

Hey, I was testing that solution I proposed (and then went on vacation), and realised when faverage=True then freqs_bands is always a list of list, so the if/else branching is not necessary. And what I was trying to do here, to flatten the list is list is actually identical to what is originally in freqs, so instead of all that I proposed above, now I would only have this:

    freqs_used = freqs 
    if faverage:
        # for each band we return the frequencies that were averaged
        freqs = [np.mean(x) for x in freqs_bands]

I managed to write a unittest, and test that file writing/reading succeeds, however this change will break other tests, and I'm not sure how to proceed.

I can open a PR so you can look at it.

adam2392 commented 2 years ago

Ah I see, so the underlying problem after inspection is that h5netcdf doesn't support saving list of lists with unequal lengths. Without faverage, freqs_used is equivalent to freqs, because no post-processing is done on the frequency level. With faverage, this raises the issue.

The question then is what should be saved inside freqs_used in the faverage case, which is just a metadata to remember which frequencies were used to compute the spectral connectivity.

I'm inclined to say that we need to store perhaps the endpoints of each frequency band(?) unfortunately losing the information of which frequency points were used in the computation. This would fix your problem, and make things a bit more future-proof.

adam2392 commented 2 years ago

You can modify the epochs.py lines to look like this:

    freqs_used = freqs
    if faverage:
        # for each band we return the frequencies that were averaged
        freqs = [np.mean(x) for x in freqs_bands]

        # make sure freq_bands is a list of equal-length lists
        # XXX: we lose information on which frequency points went into the
        # computation. If h5netcdf supports numpy objects in the future, then
        # we can change the min/max to just make it a list of lists.
        freqs_used = freqs_bands
        freqs_used = [[np.min(band), np.max(band)] for band in freqs_used]

Then there are a few lines in the test test_spectral_connectivity that we can alter. I can help you here on what to modify.

weiglszonja commented 2 years ago

@adam2392 I'm checking this test_spectral_connectivity and I'm not sure how to fix when this test tries to manually average connectivity, but we don't know the frequency points anymore:

https://github.com/mne-tools/mne-connectivity/blob/f819d0fc21f6c93a8dcfb3dd0163227168fffb4b/mne_connectivity/spectral/tests/test_spectral.py#L360-L365

adam2392 commented 2 years ago

Hi @weiglszonja if you open a PR, I can take a look at your explicit code. Doesn't need to work completely yet :p.

Also, I realized the issue becomes more involved since the frequency lists are not all of equal length, please take my code in https://github.com/mne-tools/mne-connectivity/pull/91 and adapt it for your PR.

weiglszonja commented 2 years ago

Ah, sorry @adam2392, I didn't see #91 before opening my PR... I can close it :) In our use-case both are working!

adam2392 commented 2 years ago

@weiglszonja well you, the user, deserve credit for identifying the issue! Okay glad to hear it is working.

Then in that case I will add you via a co-author commit and you to the changelog entry. Thanks!

weiglszonja commented 2 years ago

Thank you for helping with this issue! 👍

mne-tools / mne-connectivity