mne-tools / mne-connectivity

Connectivity algorithms that leverage the MNE-Python API.
https://mne.tools/mne-connectivity/dev/index.html
BSD 3-Clause "New" or "Revised" License

[DOC] Example for spectral connectivity methods comparing epochs and time #73

Open Div12345 opened 2 years ago

Div12345 commented 2 years ago

PR Description

Closes #70 . Example showing difference between spectral_connectivity_epochs and spectral_connectivity_time

Merge checklist

Maintainer, please confirm the following before merging:

Div12345 commented 2 years ago

I've added an example comparing spectral_connectivity_epochs and spectral_connectivity_time using simulated data and coherence here. I haven't yet cleared out some of the testing code.

TODO -

adam2392 commented 2 years ago

@Div12345 apologies for the delay. It is perhaps worth tabling this for now, since there seems to be some discrepancy as to whether the implementation I ported over from Frites for time-resolved spectral connectivity is correct or not.

Would you be up for perhaps taking a look at https://github.com/mne-tools/mne-connectivity/issues/70#issuecomment-1012241671 and comparing the implementation there vs. what we have currently? I think @Avoide raised a good point in https://github.com/mne-tools/mne-connectivity/issues/17#issuecomment-1011242482, which makes me suspect that either Frites or my port implemented time-resolved spectral connectivity incorrectly.

Thankfully, @Avoide has a working example, so if it makes sense, we can replace our implementation with his in a PR. We just have to fashion it to fit the existing mne-connectivity API.

Once this is all sorted out, it makes sense to revisit the example started here. Otherwise, I can perhaps get to it in a few weeks.

Div12345 commented 2 years ago

Hey @adam2392 , no issues!

I can check once again. I have also been having some doubts about the indices parameter. I tried using specified channel indices to calculate the connectivity on some real data, but it seems to be giving errors: something about the estimated n_nodes being a certain number while the calculated value is n_indices (the number of connections I wanted it to calculate).

Is this hardcoding of indices='symmetric' here correct?

I'll raise another issue with all the exact errors I get and the debugging I did as well.
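
For reference, this is a minimal sketch of how I understand the indices parameter is meant to be passed (a tuple of seed and target index arrays) — my assumption of the intended usage, not a fix for the error above:

import numpy as np

# hypothetical selection of three connections: (0, 1), (0, 2) and (1, 2)
seeds = np.array([0, 0, 1])
targets = np.array([1, 2, 2])
indices = (seeds, targets)

# e.g. con = spectral_connectivity_epochs(data, method='coh', indices=indices)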

There are also some simple implementations of the PLV and the coherence over on the mne-features library - PLV function

Div12345 commented 2 years ago

@adam2392 While I didn't look specifically into the exact workings, I did some further testing with the methods. These are my tests for the single-epoch, over-time methods:

Coherence tests -

import numpy as np

# n_samples, time, n_epochs and n_channels are assumed to be defined as in the example
rng = np.random.RandomState(0)
x1 = rng.rand(1, 1, n_samples)          # random noise
x2 = np.sin(2 * np.pi * 10 * time)      # 10 Hz sine
x3 = np.sin(2 * np.pi * 55 * time)      # 55 Hz sine, some non-multiple of 10 Hz

data = np.zeros((n_epochs, n_channels, n_samples))
# Epoch 1: channels 1 and 2 (0-indexed) carry the same 10 Hz sine
data[0, 0, :] = x1
data[0, 1, :] = x2
data[0, 2, :] = x2  # x3

# Different values in different epochs (epoch 2: channel 2 carries x3)
data[1, 0, :] = x1
data[1, 1, :] = x2
data[1, 2, :] = x3

[figure]

Epoch 1 coherence: [figures]

Epoch 2 coherence: [figures]

I further did tests for PLV -

This is for

rng = np.random.RandomState(42)
x1 = rng.rand(1, 1, n_samples)
x2 = np.sin(2 * np.pi * 10 * time + 0)
x3 = np.sin(2 * np.pi * 10 * time + 0.05 * np.pi)  # same result without the phase difference as well

[Figures: PLV results for two 10 Hz sines with a 0.05π phase difference]

This is for

rng = np.random.RandomState(42)
x1 = rng.rand(1, 1, n_samples)
x2 = np.sin(2 * np.pi * 10 * time + 0)
x3 = np.sin(2 * np.pi * 100 * time)

[Figures: PLV results for a 10 Hz vs. a 100 Hz sine]

Seems to me like I'm getting the expected results.

adam2392 commented 2 years ago

Is this using @Avoide's implementation?

Div12345 commented 2 years ago

No, this is using what was integrated from Frites, i.e. spectral_connectivity_time.

Div12345 commented 2 years ago

Hey @adam2392, have you had a chance to go through that? Or let me know if you think it's still better to check the code again. @Avoide, could you maybe have a look too? I'll also check once more on the issue you raised.

Avoide commented 2 years ago

Hi @Div12345, I tried your code and also looked at your images. Do I understand correctly that in your 10 Hz vs. 10 Hz with phase difference plot, x1-x1, x2-x2, x2-x3 and x3-x3 show PLV = 1 as expected, but for some reason x1-x2 and x1-x3 are the green line, which overlaps with the yellow one? If that is so, then I find this PLV with randomly generated data extremely high.

I used:

# Simulating data
rng = np.random.RandomState(0)
x1 = rng.rand(1, 1, n_samples)
x2 = np.sin(2 * np.pi * 10 * time)
x3 = np.sin(2 * np.pi * 10 * time + 0.20)

data = np.zeros((n_epochs, n_channels, n_samples))
data[0, 0, :] = x1
data[0, 1, :] = x2
data[0, 2, :] = x3

# # Same value for each channel in all the epochs
# for i in range(n_epochs):
#     data[i] = data[0]

# Different values in different epochs
data[1, 0, :] = x1
data[1, 1, :] = x2
data[1, 2, :] = x3

[Figure 1: PLV over time]

On average, the PLV over time was 0.82 between a 10 Hz sine wave and random noise! This seems a bit too high.

One problem I mentioned in my latest comment in #17 is that randomly generated signals showed abnormally high PLV, which can also be observed using your code:

# Simulating data
rng = np.random.RandomState(0)
x1 = rng.rand(1, 1, n_samples)
x2 = rng.rand(1, 1, n_samples)
x3 = rng.rand(1, 1, n_samples)

[Figure 2: PLV over time for three uncoupled random signals]

Here, on average across the epoch, the PLV matrix is:

[[0.98156652 0.696885   0.77922573]
 [0.696885   0.98156652 0.7089454 ]
 [0.77922573 0.7089454  0.98156652]]

The off-diagonal values are thus around 0.70 to 0.78! For randomly generated data we would expect 0; of course, due to the finite signal length and the wavelets/filters etc., we would expect some phase locking by chance, but not this high, I think.

Using my PLV implementation I got:

[[0.         0.         0.        ]
 [0.25187146 0.         0.        ]
 [0.1383034  0.2419822  0.        ]]

which seemed more reasonable. Bear in mind this was only one simulation; in #17 I repeated it 100 times and got:

PLV using spectral_connectivity_time over 100 runs with randomly generated data:

[[0.98156652 0.64840676 0.65101158]
 [0.64840676 0.98156652 0.64902626]
 [0.65101158 0.64902626 0.98156652]]

PLV using my implementation over 100 runs with randomly generated data:

[[0.         0.         0.        ]
 [0.14877284 0.         0.        ]
 [0.15452367 0.15971872 0.        ]]

This raised some flags concerning spectral_connectivity_time.

adam2392 commented 2 years ago

I'm inclined to believe there are issues here, simply based on the simulation that @Avoide proposes. It seems weird to me that spectral_connectivity_time does not work for such a simple simulation, so I'm concerned that the Frites implementation either has a bug, or there are some caveats to how it should be used. I'm 95% sure my implementation is a correct copy, as I unit test against their implementation.

Some potential ways forward are:

  1. If we can figure out the difference in implementation that would be ideal, but I can see that might take some work.
  2. We rewrite spectral_connectivity_time with @Avoide 's implementation and move forward with that one.

Pinging @EtienneCmb here for some thoughts.

EtienneCmb commented 2 years ago

Thanks @adam2392, I'll try to reproduce @Avoide results tomorrow.

adam2392 commented 2 years ago

Thanks @adam2392, I'll try to reproduce @Avoide results tomorrow.

Hi @EtienneCmb any luck on this?

EtienneCmb commented 2 years ago

Hi @adam2392 and @Avoide,

I was able to reproduce @Avoide's results, with high PLV values even for random data. I also tested the code for different amounts of temporal smoothing (the default uses Morlet wavelets with a fixed 7 cycles), and the PLV baseline values decrease with longer kernels (same for coherence). I guess that the longer the data, the better the estimation.

import numpy as np
import xarray as xr

from frites.conn import conn_spec

import matplotlib.pyplot as plt

sfreq = 64.
n_samples = 5000
times = np.arange(n_samples) / sfreq
freqs = np.linspace(2., 20, 50)

# generate random data
rng = np.random.RandomState(0)
x = xr.DataArray(
    rng.rand(1, 3, n_samples), dims=('trials', 'roi', 'times'),
    coords=([0], ['x1', 'x2', 'x3'], times)
)

# compute plv for different temporal smoothings
plv = {}
for sm_times in [.5, 1., 2., 5., 10., 20., 30.]:
    plv[sm_times] = conn_spec(
        x, roi='roi', times='times', freqs=freqs, metric='plv', sfreq=sfreq,
        n_jobs=1, sm_times=sm_times, verbose=False
    ).squeeze().mean(('times', 'roi'))
plv = xr.Dataset(plv).to_array('sm_times')

# plot the results
plt.figure(figsize=(12, 8))
plv.plot(x='freqs', hue='sm_times')
plt.xlabel('Frequencies'), plt.ylabel('PLV'), plt.ylim(0, 1)
plt.title('PLV for different sm_times', fontweight='bold');

[Figure: PLV vs. frequency for different sm_times]

If you add a pure 10 Hz sine to the random data, longer kernels provide a better ratio between the PLV at 10 Hz and the noise:

# generate random data
rng = np.random.RandomState(0)
x = xr.DataArray(
    rng.rand(1, 3, n_samples), dims=('trials', 'roi', 'times'),
    coords=([0], ['x1', 'x2', 'x3'], times)
)
x.data += .3 * np.sin(2 * np.pi * 10 * times).reshape(1, 1, -1)

[Figure: PLV vs. frequency for different sm_times, with a 10 Hz sine added]

There are also differences between Morlet wavelets and multitapers. With the latter, I got better results:

# compute plv for different temporal smoothings
plv = {}
for sm_times in [.5, 1., 2., 5., 10., 20., 30.]:
    plv[sm_times] = conn_spec(
        x, roi='roi', times='times', freqs=freqs, metric='plv', sfreq=sfreq,
        n_jobs=1, sm_times=sm_times, verbose=False, mode='multitaper',
        n_cycles=freqs / 2, mt_bandwidth=10
    ).squeeze().mean(('times', 'roi'))
plv = xr.Dataset(plv).to_array('sm_times')

[Figure: PLV vs. frequency for different sm_times, multitaper mode]

I don't know if this is an implementation issue or an inherent limitation of measuring COH / PLV over time. @ViniciusLima94, any opinion on this?

ViniciusLima94 commented 2 years ago

Hello @EtienneCmb , @adam2392 and @Avoide .

Concerning the bias of the metric, you can refer to Lachaux et al. (2002):

[Figure: excerpt from Lachaux et al. (2002)]

In summary, it depends on the number of cycles and on the smoothing window (see the figure above). Note that it is possible to make the bias independent of frequency by defining $n_{cycles} = \alpha \cdot freqs$.

I have a few examples in the following notebook: https://github.com/ViniciusLima94/GrayData-Analysis/blob/master/notebooks/3.0.9.%20Lachaux%20wavelet%20coherence.ipynb

Notably, below I show the bias computed analytically together with the coherence computed for white-noise signals, as a function of the temporal smoothing kernel ($w_t$, in seconds):

[Figure: analytical bias and white-noise coherence as a function of $w_t$]

Also, for $n_{cycles} = \mathrm{constant}$ you can see that the bias is frequency-dependent:

[Figure: bias as a function of frequency for constant $n_{cycles}$]

While for $n_{cycles} = \alpha \cdot freqs$ it is not:

[Figure: bias as a function of frequency for $n_{cycles} = \alpha \cdot freqs$]
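
As a rough standalone sketch of this point (my own illustration using mne.time_frequency.tfr_array_morlet on white noise, not the Frites code), the single-trial coherence floor can be compared for a constant number of cycles versus $n_{cycles} = \alpha \cdot freqs$:

import numpy as np
from mne.time_frequency import tfr_array_morlet

sfreq, n_samples = 256., 4096
freqs = np.arange(4., 41., 2.)
rng = np.random.RandomState(0)
noise = rng.randn(1, 2, n_samples)  # one epoch, two channels of white noise

def noise_coh_floor(n_cycles):
    # time-averaged single-trial coherence between the two noise channels
    w = tfr_array_morlet(noise, sfreq=sfreq, freqs=freqs,
                         n_cycles=n_cycles, output='complex')
    s_xy = w[0, 0] * np.conj(w[0, 1])          # cross-spectrum, (n_freqs, n_times)
    p_x = (np.abs(w[0, 0]) ** 2).mean(axis=-1)
    p_y = (np.abs(w[0, 1]) ** 2).mean(axis=-1)
    return np.abs(s_xy.mean(axis=-1)) / np.sqrt(p_x * p_y)

coh_const = noise_coh_floor(n_cycles=7.)         # constant cycles: bias varies with frequency
coh_prop = noise_coh_floor(n_cycles=freqs / 2.)  # n_cycles = alpha * freqs: roughly flat bias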

Let me know if I can further help.

EtienneCmb commented 2 years ago

Also, @Avoide correct me if I'm wrong, but your implementation is a single-trial, over-time PLV implementation. The one in Frites is also a single-trial implementation, computed over time, but it uses kernel smoothing to retrieve the "temporal" dynamics of the PLV (modulo kernel length). I guess it would be like using your implementation on sliding windows. This should be taken into consideration when comparing the results of both methods, imo.

Avoide commented 2 years ago

Also, @Avoide correct me if I'm wrong, but your implementation is a single-trial, over-time PLV implementation. The one in Frites is also a single-trial implementation, computed over time, but it uses kernel smoothing to retrieve the "temporal" dynamics of the PLV (modulo kernel length). I guess it would be like using your implementation on sliding windows. This should be taken into consideration when comparing the results of both methods, imo.

Yes, that is correct. The implementation I posted is based on the epochs, so it corresponds to using non-overlapping windows.
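
In rough terms, the distinction looks like this (a sketch of the idea only, assuming narrow-band signals and a Hilbert phase, not either actual implementation):

import numpy as np
from scipy.signal import hilbert

def plv_whole_window(x, y):
    # one PLV value for the whole epoch / non-overlapping window
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

def plv_kernel_smoothed(x, y, kernel_len):
    # a time course of PLV obtained by smoothing the unit phasors with a
    # sliding (boxcar) kernel, similar in spirit to the Frites approach
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    kernel = np.ones(kernel_len) / kernel_len
    return np.abs(np.convolve(np.exp(1j * dphi), kernel, mode='same'))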

adam2392 commented 2 years ago

Sorry I've been swamped, so lost track of this.

Is there a summary then of the bug that needs to be fixed? Is @Avoide 's implementation correct, or the Frites version, or both?

ruuskas commented 1 year ago

This is now addressed by #104, if the contributors of this discussion would like to have a look and comment.

ruuskas commented 1 year ago

@EtienneCmb I believe the reason why the PLV values appear too large in the example you posted above is an error in the Frites implementation of PLV. I might be mistaken, so please correct me if there's a misconception in my thinking instead.

This is your PLV implementation in Frites:

def pairwise_plv(w_x, w_y):
    # note: w, kernel and foi_idx come from the enclosing scope in Frites
    # computes the plv
    s_xy = w[:, w_y, :, :] * np.conj(w[:, w_x, :, :])
    # complex exponential of phase differences
    exp_dphi = s_xy / np.abs(s_xy)
    # smooth e^(-i*\delta\phi)
    exp_dphi = _smooth_spectra(exp_dphi, kernel)
    # computes plv
    out = np.abs(exp_dphi)
    # mean inside frequency sliding window (if needed)
    if isinstance(foi_idx, np.ndarray):
        return _foi_average(out, foi_idx)
    else:
        return out

In the above example you provided, the output array is averaged over the time axis to obtain a single scalar PLV value for each frequency point.

In mathematical notation, this corresponds to: $$PLV_{x,y}=\frac{1}{T}\sum_{t=1}^T|\tilde{S}_{xy}|$$

where $\tilde{S}_{xy}$ is the normalized and smoothed cross spectrum of signals $x$ and $y$.

However, the definition of time-resolved PLV is: $$PLV_{x,y}=\frac{1}{T}\left|\sum_{t=1}^T\tilde{S}_{xy}\right|$$

Taking the absolute value before the average leads to too large values, since in general $|a+b| \le |a|+|b|$.
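
As a quick standalone check of this point (my own toy example, without the smoothing step), consider unit phasors with uniformly random phases:

import numpy as np

rng = np.random.RandomState(0)
dphi = rng.uniform(-np.pi, np.pi, size=10000)  # random phase differences
phasors = np.exp(1j * dphi)                    # normalized cross-spectrum terms

print(np.mean(np.abs(phasors)))  # 1.0   -> absolute value taken before averaging
print(np.abs(np.mean(phasors)))  # ~0.01 -> averaging first, as in the PLV definition

With the smoothing kernel in place, the first quantity ends up somewhere between these two extremes, which is consistent with the inflated values reported above.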

Furthermore, let's look at this part of the code (edited the comments for readability):

exp_dphi = s_xy / np.abs(s_xy)  # complex exponential of phase differences
exp_dphi = _smooth_spectra(exp_dphi, kernel)  # smooth e^(-i*\delta\phi)
out = np.abs(exp_dphi) # computes plv

Now again, I might be mistaken, but to me it seems that we have exp_dphi = s_xy / np.abs(s_xy), so exp_dphi is now an array full of normalized complex numbers. If we were to skip the smoothing step, we would have out = np.abs(exp_dphi), so out is now an array that contains the absolute values of normalized complex numbers.

In other words, without the smoothing step, out is now an array full of ones. I don't know if this is by design, but in any case something of note.

To verify this, I quickly commented out the smoothing, and indeed: [Figure: PLV is 1 everywhere for uncoupled signals when smoothing is disabled]

Computing the average first and then taking the absolute value leads to much more reasonable PLV scores for uncoupled signals. Code:

def pairwise_plv(w_x, w_y):
    # computes the plv
    s_xy = w[:, w_y, :, :] * np.conj(w[:, w_x, :, :])
    # complex exponential of phase differences
    exp_dphi = s_xy / np.abs(s_xy)
    # smooth e^(-i*\delta\phi)
    exp_dphi = _smooth_spectra(exp_dphi, kernel)
    # computes plv: average over time first, then take the absolute value
    out = np.abs(np.mean(exp_dphi, axis=-1, keepdims=True))
    # mean inside frequency sliding window (if needed)
    if isinstance(foi_idx, np.ndarray):
        return _foi_average(out, foi_idx)
    else:
        return out

Random data: [figure]. 10 Hz coupling: [figure].

I believe the reason that we don't see a PLV value of exactly one for 10 Hz is that Frites averages the spectra produced by mne.time_frequency.tfr_array_multitaper over tapers. Instead of averaging the spectra, it seems that the connectivity scores should be computed per taper first and then averaged. This is discussed in #84 and solved in #104.
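
As a toy illustration of the two orders of operations (my own sketch with made-up complex taper spectra, not the Frites or #104 code):

import numpy as np

rng = np.random.RandomState(0)
n_tapers, n_times = 5, 1000
# hypothetical complex taper spectra at a single frequency for two coupled channels
w_x = rng.randn(n_tapers, n_times) + 1j * rng.randn(n_tapers, n_times)
w_y = w_x * np.exp(1j * 0.3) + 0.5 * (rng.randn(n_tapers, n_times)
                                      + 1j * rng.randn(n_tapers, n_times))

def plv(sx, sy):
    s_xy = sx * np.conj(sy)
    return np.abs(np.mean(s_xy / np.abs(s_xy)))

plv_from_avg_spectra = plv(w_x.mean(axis=0), w_y.mean(axis=0))              # average spectra over tapers first
plv_avg_of_scores = np.mean([plv(w_x[k], w_y[k]) for k in range(n_tapers)]) # average PLV scores over tapers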

I hope this helps. Let me know if I'm misunderstanding something. Also a note: The latest MNE-Python version 1.2.0 breaks Frites due to the removal of _get_args.

EtienneCmb commented 1 year ago

Hi @ruuskas,

Thank you for the clear explanations; I'll also fix the issue in Frites. @ViniciusLima94, if you have time, could you take a look at it as well?

The latest MNE-Python version 1.2.0 breaks Frites due to the removal of _get_args.

Yes, thank you, I fixed it recently.

ViniciusLima94 commented 1 year ago

Hello @ruuskas,

Indeed, when estimating PLV at the single-trial level, removing the smoothing leads to a PLV of 1 for all frequency and time points.

The smoothing in this case should be equivalent to using small integrating windows. This procedure introduces a bias in the metric (for this I refer to Lachaux et al. 2002; I also posted a reply with an explanation of this above), leading to high values. Taking the average over time also eliminates this bias; however, we then do not get a time-resolved metric.

What I usually do is create surrogate measurements to subtract from the original one (for instance, taking the 95th percentile of the surrogate distribution).

For instance, below I show the trial-averaged time-frequency map of PLV for the corrected and uncorrected values:

[Figure: trial-averaged time-frequency maps of corrected and uncorrected PLV]

ruuskas commented 1 year ago

Hi @ViniciusLima94.

Thanks for the explanation. What is the interpretation of the trial averaged corrected PLV you show in the left figure? Would this type of estimation be used for resting-state or stimulus data? My understanding would be that looking at the time-resolved PLV produced by averaging over trials would only be sensible if the trials are generated in a meaningful way (for example, stimulus onsets).

Could you clarify the process of going from the uncorrected to the corrected PLV? I understood that you would create surrogates of the original time-course, then compute the PLVs, and only accept those values that are higher than the 95th percentile of the surrogate distribution. Is that correct?

ViniciusLima94 commented 1 year ago

Hello @ruuskas,

In the example above I showed the trial average because, since it is an example model, the trials all have the same structure. Usually we are interested in single-trial estimation, so we don't do this averaging.

You described it right; that is usually how I do it. Depending on how you do it, you can generate chance levels for each (time, frequency) or (time, frequency, trial) point.

ruuskas commented 1 year ago

@ViniciusLima94 I understand. What sort of data do you normally use? I'm just curious because when using real data and computing all-to-all connectivity, the computational time is high enough for a single estimation of connectivity. In that scenario, computing e.g. 1000 surrogates would not be practical.

ViniciusLima94 commented 1 year ago

@ruuskas you're right, in my case it is not practical at all to compute many surrogates. I make a simplification and assume that the thresholds are the same across trials, so what I do is take the 95th percentile of the distribution across trials, which yields one threshold per (time, frequency) point, like in the cartoon below:

[Figure: cartoon of the per-(time, frequency) threshold taken across trials]
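
In numpy terms, the simplification amounts to something like this (a rough sketch with a made-up connectivity array; the subtraction is just one possible way to apply the threshold):

import numpy as np

rng = np.random.RandomState(0)
conn = rng.rand(100, 30, 500)  # hypothetical single-trial values: (trials, freqs, times)

# 95th percentile across trials -> one threshold per (frequency, time) point
thresh = np.percentile(conn, 95, axis=0)

# corrected values: keep only what exceeds the trial-wise chance level
conn_corrected = np.clip(conn - thresh, 0., None)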

My data were recorded during a working-memory task; I can show you a few single-trial coherence examples (already corrected):

[Figures: single-trial corrected coherence examples]

EtienneCmb commented 1 year ago

@ruuskas do you know the impact of computing connectivity on average PSD over tapers vs. averaging the connectivity metric?

ruuskas commented 1 year ago

@ruuskas do you know the impact of computing connectivity on average PSD over tapers vs. averaging the connectivity metric?

@EtienneCmb it's not a huge effect, unfortunately I fear I don't have an explicit comparison saved.

ruuskas commented 1 year ago

@ViniciusLima94 I see, so it seems that we're trying to tackle different problems at the conceptual level. My understanding has been that the function spectral_connectivity_time is trying to compute connectivity over time, similar to the way spectral_connectivity_epochs computes connectivity over trials. So this is quite different from the spectrotemporal connectivity in your case.
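
As a toy sketch of that distinction (my own, for a single connection, given unit phasors of the phase differences):

import numpy as np

rng = np.random.RandomState(0)
dphi = rng.uniform(-np.pi, np.pi, size=(50, 1000))  # hypothetical (n_epochs, n_times) phase differences
phasors = np.exp(1j * dphi)

plv_over_epochs = np.abs(phasors.mean(axis=0))  # one value per time point (across trials)
plv_over_time = np.abs(phasors.mean(axis=1))    # one value per epoch (across time)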

Did I get it right that in the example you posted no surrogate data was generated, but rather the trials themselves were used as "surrogates"?

adam2392 commented 1 year ago

Hi all, thanks to @ruuskas's contributions in https://github.com/mne-tools/mne-connectivity/pull/104, I think the underlying bugs/inconsistencies that were discovered in this and the related issues have been resolved.

@Div12345 I'm not sure if you're still interested, since I realize a significant amount of time has passed, but I think it is now possible to proceed with an example demonstrating the differences in usage of the two spectral connectivity methods.