Store `n_cycles` and `time_bandwidth` params in `*TFR` objects

mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python

https://mne.tools

BSD 3-Clause "New" or "Revised" License

2.72k stars 1.32k forks source link

Store `n_cycles` and `time_bandwidth` params in `*TFR` objects #12851

Open tsbinns opened 1 month ago

tsbinns commented 1 month ago

Describe the new feature or enhancement

Discussed partly with @larsoner.

MNE-Connectivity is being updated to allow connectivity computations using complex coefficients stored in EpochsTFR objects (https://github.com/mne-tools/mne-connectivity/pull/232). For the multitaper method, TFR coefficients have a tapers dimension that need to be aggregated over when computing the CSD. These weights are not stored in the *TFR objects, and computing these weights requires knowledge of the time_bandwidth and n_cycles used to estimate the TFR. However, currently these parameters are not stored in *TFR objects.

If we want to accurately compute these weights in MNE-Connectivity, we would currently require the user to pass these parameters in addition to the EpochsTFR object. It would be more convenient if this information could also be read from the EpochsTFR object.

Describe your proposed implementation

There were two ideas:

Either store this information directly as attributes, e.g., tfr.n_cycles and tfr.time_bandwidth
Or store these params as a dict to minimise clutter in the *TFR namespace, e.g., tfr.estimation_parameters = {'mt_bandwidth': 4, 'n_cycles': 5}

Describe possible alternatives

An alternative would be to just store the taper weights in the *TFR objects, which is similar to how this is now handled for *Spectrum objects. However, the weights are not returned by tfr_array_multitaper, so they would still need to be recomputed.

It's perhaps simpler if we just handle this in MNE-Connectivity using the stored estimation params.

What do people think is the best choice?

drammock commented 1 month ago

An alternative would be to just store the taper weights in the *TFR objects, which is similar to how this is now handled for *Spectrum objects. However, the weights are not returned by tfr_array_multitaper, so they would still need to be recomputed.

Since we already store weights for spectrum objects, I'm inclined to store weights (not estimation params) for TFRs too. That saves having to recompute them when users pass MNE objects (not arrays) to the connectivity functions, and is more internally consistent. (this assumes that we don't anticipate needing n_cycles or mt_bandwidth for other purposes... I suppose to be maximally future-proof we could store the weights and the estimation params)

When users pass NumPy arrays to the conn functions, they can/should expect that they'll need to provide some extra information, so IMO it's fine to ask them to pass in estimation params (n_cycles and mt_bandwidth) and we calc the weights for them at that time... even though it's a bit error-prone to trust that the user passes the same values to both the TFR function and the connectivity function.

Another option would be adding return_weights=False to tfr_array_multitaper, so that folks can use the array-based connectivity API as long as they're willing to re-compute the TFR with return_weights=True first. That feels slightly worse to me than just asking them to pass in the estimation params.

tsbinns commented 1 month ago

Since we already store weights for spectrum objects, I'm inclined to store weights (not estimation params) for TFRs too. That saves having to recompute them when users pass MNE objects (not arrays) to the connectivity functions, and is more internally consistent.

Sure, I see your reasoning, especially to keep it consistent with *Spectrum objects. I'll open a PR for this.

When users pass NumPy arrays to the conn functions, they can/should expect that they'll need to provide some extra information, so IMO it's fine to ask them to pass in estimation params (n_cycles and mt_bandwidth) and we calc the weights for them at that time... even though it's a bit error-prone to trust that the user passes the same values to both the TFR function and the connectivity function.

Just to clarify, for the proposed connectivity function changes, users would still only be able to pass in arrays of epoched timeseries data, not epoched coefficients. If we change to store weights in *TFR objects, then n_cycles and mt_bandwidth params could be completely ignored for *TFR data, but for Epochs and arrays of epoched timeseries data they would still function in the same way (to determine the TFR estimation).

drammock commented 1 month ago

Just to clarify, for the proposed connectivity function changes, users would still only be able to pass in arrays of epoched timeseries data, not epoched coefficients. If we change to store weights in *TFR objects, then n_cycles and mt_bandwidth params could be completely ignored for *TFR data, but for Epochs and arrays of epoched timeseries data they would still function in the same way (to determine the TFR estimation).

Ah OK, so it's for any non-spectral/spectrotemporal data (whether in Epochs object form or array form) where they need to pass the estimation params. That makes sense.

tsbinns commented 1 month ago

Ah OK, so it's for any non-spectral/spectrotemporal data (whether in Epochs object form or array form) where they need to pass the estimation params. That makes sense.

Yeah, exactly.

tsbinns commented 1 month ago

Since we already store weights for spectrum objects, I'm inclined to store weights (not estimation params) for TFRs too. That saves having to recompute them when users pass MNE objects (not arrays) to the connectivity functions, and is more internally consistent.

Had a look into this and have a question about the implementation:

When Raw/Epochs.compute_tfr(method="multitaper") is called, what happens internally is essentially tfr_array_multitaper() -> _compute_tfr() -> _make_dpss(). The taper weights are computed in _make_dpss(), which already has a toggle to return the weights.

However, _compute_tfr() currently does not return weights, and more importantly neither does tfr_array_multitaper(). So if we wanted to return the weights computed in _make_dpss() to the *TFR object, a new param would be needed in tfr_array_multitaper(). This was mentioned earlier in a slightly different context:

Another option would be adding return_weights=False to tfr_array_multitaper

Is a public API change like this acceptable?

If not, then the weights would need to be recomputed in the *TFR object. It's not the most efficient approach, but the compute cost isn't huge.

drammock commented 1 month ago

So if we wanted to return the weights computed in _make_dpss() to the *TFR object, a new param would be needed in tfr_array_multitaper().

I find this acceptable. The tfr_array_*() functions are sort of like "expert mode" now that there are .compute_tfr() methods on objects. I don't see much harm/danger in adding a return_weights param to it. @larsoner WDYT? The alternatives are

recompute weights later (feels icky, but as @tsbinns says, the compute cost isn't huge)
do some kind of larger refactor so that the .compute_tfr() methods don't call the public tfr_array_* APIs, but instead call directly to _compute_tfr().

tsbinns commented 4 weeks ago

Sorry for not opening a PR on this yet, things ended up being a little more complicated. Making the changes so that weights are available to be stored in TFR objects was fine, but when looking into the code I also realised that the TFR plotting methods (plot, plot_topo, and plot_topomap) cannot handle cases where a tapers dimension is present (raises an error related to data shape), and also that to_data_frame does not account for the possibility of a tapers dimension.

Fixing to_data_frame is easy, and for the plotting functions the equivalent issue was addressed for Spectrum objects in #12747 (aggregate over tapers and convert to power before plotting; currently the conversion to power happens, but there is no aggregation).

However, since this is a distinct thing to adding a weights attr, I wasn't sure if people would prefer these in separate PRs, or should I handle all of this at once?