raphaelvallat / yasa

YASA (Yet Another Spindle Algorithm): a Python package to analyze polysomnographic sleep recordings.
https://raphaelvallat.com/yasa/
BSD 3-Clause "New" or "Revised" License

Found spindles dramatically reduced after downsampling #54

Closed skjerns closed 2 years ago

skjerns commented 2 years ago

Thanks for creating such a cool tool!

I have a comprehension question: intuitively, downsampling from 256 Hz to 128 Hz should not significantly reduce the number of detected spindles. In my case, however, it drops from 69 detected spindles to 38 (here limited to N3).

Is this to be expected? Which results should be regarded as trustworthy?

(Screenshots: detection results at 256 Hz and at 128 Hz.)

Additionally, we have the problem that the spindle counts do not add up when analyzing N2+N3 together versus analyzing N2 and N3 separately. Is this due to different thresholds being computed for different segments? Or due to spindles being found on the edge between two windows?

Example:

spindles_NREM = yasa.spindles_detect(raw, hypno=hypno,  include=(2, 3)) 
spindles_N2 = yasa.spindles_detect(raw, hypno=hypno,  include=2) 
spindles_N3 = yasa.spindles_detect(raw, hypno=hypno,  include=3) 

spindles_NREM = 880 events
spindles_N2 = 816 events
spindles_N3 = 38 events

missing = 26 events not accounted for by the separate analyses
raphaelvallat commented 2 years ago

Hi @skjerns,

Thanks for your feedback :)

Downsampling:

This is extremely worrying and not expected at all. How are you downsampling the data? Are you passing your data as an MNE Raw object or a np.array? Could you perhaps try to sequentially disable the thresholds (thresh = {'rel_pow': None, 'corr': 0.65, 'rms': 1.5}) and run the algorithm with the original and the downsampled data? I want to identify whether one threshold in particular leads to this issue, i.e. by not properly taking the sampling frequency into account.

Masking

Is this due to different thresholds being computed for different segments?

Yes, the RMS threshold is calculated on the entire masked signal (e.g. N2+N3 vs N2 only vs N3 only) and thus it is expected that the supra-threshold indices (= potential spindles) would not be the same. Setting the rms threshold to None should however give you the same number of spindles because the two other thresholds are not relative to the masked data.

https://github.com/raphaelvallat/yasa/blob/2383628d94001d8288fe834aadfb13c14de6e901/yasa/detection.py#L689-L699
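A toy numpy sketch of why the counts are not additive (synthetic data; the moving-RMS threshold below is a simplified stand-in for illustration, not YASA's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
sf = 100

# Toy masked signal: the "N3" third has twice the amplitude of the "N2" part.
signal = rng.normal(size=3000)
signal[2000:] *= 2
stage = np.repeat([2, 2, 3], 1000)  # toy per-sample hypnogram

def rms_threshold(x, sf=100, win=0.3, mult=1.5):
    # Simplified stand-in for a moving-RMS threshold: smooth the squared
    # signal over a `win`-second window, take the root, then mean + mult * std.
    n = int(win * sf)
    mrms = np.sqrt(np.convolve(x**2, np.ones(n) / n, mode='same'))
    return mrms.mean() + mult * mrms.std()

# Each mask yields its own cutoff, so the supra-threshold samples (candidate
# spindles) differ between the combined run and the two separate runs.
th_nrem = rms_threshold(signal[np.isin(stage, (2, 3))])
th_n2 = rms_threshold(signal[stage == 2])
th_n3 = rms_threshold(signal[stage == 3])
print(round(th_n2, 2), round(th_nrem, 2), round(th_n3, 2))  # three different cutoffs
```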

Thanks, Raphael

skjerns commented 2 years ago

Thanks for the quick reply! That's very helpful and good to know :)

Here are the results for the calculation with different thresholds. I'm using a raw object with a bipolar channel Pz-M2.

 sfreq  relp  corr  rms n_events
 '256,  0.2, 0.65,  1.5': 68,
 '256,  0.2, 0.65, None': 72,
 '256,  0.2, None,  1.5': 513,
 '256,  0.2, None, None': 69,
 '256, None, 0.65,  1.5': 519,
 '256, None, 0.65, None': 144,
 '256, None, None,  1.5': 781,
 '128,  0.2, 0.65,  1.5': 37,
 '128,  0.2, 0.65, None': 37,
 '128,  0.2, None,  1.5': 66,
 '128,  0.2, None, None': 68,
 '128, None, 0.65,  1.5': 123,
 '128, None, 0.65, None': 142,
 '128, None, None,  1.5': 757

I'm using the following code:

import itertools

import yasa
from tqdm import tqdm

params = itertools.product([256, 128], [0.2, None], [0.65, None], [1.5, None])
res = {}

for sfreq, rel_pow, corr, rms in tqdm(list(params)):
    raw_res = raw.copy().pick('Pz-M2')
    raw_res.resample(sfreq)
    hypno = yasa.hypno_upsample_to_data(hypnogram, sf_hypno=1/30, data=raw_res)
    try:
        spindles = yasa.spindles_detect(raw_res,
                                        hypno=hypno,
                                        include=3,
                                        thresh={'rel_pow': rel_pow, 'corr': corr, 'rms': rms},
                                        verbose='DEBUG')
        res[f'{sfreq}, {str(rel_pow):>4}, {str(corr):>4}, {str(rms):>4}'] = len(spindles.summary())
    except Exception:
        continue

(FYI: screenshots of the participant's hypnogram and spectrogram.)

raphaelvallat commented 2 years ago

Thanks for running the checks so quickly. I need to do a deep dive on this, but it is quite worrying. One observation from your test is that when using only a single threshold (regardless of which one), the number of spindles is roughly the same with 256 or 128 Hz. Therefore, it is only when combining the thresholds that the sampling rate starts to make a difference 🤔 I have a gut feeling that it may be caused by these lines, but I have to check:

https://github.com/raphaelvallat/yasa/blob/2383628d94001d8288fe834aadfb13c14de6e901/yasa/detection.py#L720-L728

One last thing, do you get the same discrepancy when comparing 100 Hz vs 200 Hz (instead of 128 vs 256)?

Thanks, Raphael

skjerns commented 2 years ago

Might be going in the right direction... The discrepancy is still there when resampling to 250, 200, or 100 Hz:

 sfreq  relp  corr  rms n_events
{'250,  0.2, 0.65,  1.5': 67,
 '250,  0.2, 0.65, None': 71,
 '250,  0.2, None,  1.5': 509,
 '250,  0.2, None, None': 70,
 '250, None, 0.65,  1.5': 516,
 '250, None, 0.65, None': 144,
 '250, None, None,  1.5': 782,
 '200,  0.2, 0.65,  1.5': 67,
 '200,  0.2, 0.65, None': 71,
 '200,  0.2, None,  1.5': 507,
 '200,  0.2, None, None': 69,
 '200, None, 0.65,  1.5': 513,
 '200, None, 0.65, None': 141,
 '200, None, None,  1.5': 782,
 '100,  0.2, 0.65,  1.5': 36,
 '100,  0.2, 0.65, None': 36,
 '100,  0.2, None,  1.5': 67,
 '100,  0.2, None, None': 69,
 '100, None, 0.65,  1.5': 115,
 '100, None, 0.65, None': 137,
 '100, None, None,  1.5': 759}
skjerns commented 2 years ago

However, I found something interesting when using different sampling frequencies. There seem to be two attractor states at ~67 and ~510.

Let me know if it would be helpful to receive the original data, I can probably arrange to share it with you privately

 sfreq rel_p  corr   rms  n_events
  '70,  0.2, None,  1.5':  63,
  '75,  0.2, None,  1.5':  64,
  '80,  0.2, None,  1.5':  63,
  '85,  0.2, None,  1.5':  65,
  '90,  0.2, None,  1.5':  502,
  '95,  0.2, None,  1.5':  521,
 '100,  0.2, None,  1.5' : 67,
 '105,  0.2, None,  1.5' : 66,
 '110,  0.2, None,  1.5' : 510,
 '115,  0.2, None,  1.5' : 517,
 '120,  0.2, None,  1.5' : 66,
 '125,  0.2, None,  1.5' : 67,
 '130,  0.2, None,  1.5' : 67,
 '135,  0.2, None,  1.5' : 67,
 '140,  0.2, None,  1.5' : 67,
 '145,  0.2, None,  1.5' : 67,
 '150,  0.2, None,  1.5' : 67,
 '155,  0.2, None,  1.5' : 67,
 '160,  0.2, None,  1.5' : 67,
 '165,  0.2, None,  1.5' : 67,
 '170,  0.2, None,  1.5' : 67,
 '175,  0.2, None,  1.5' : 67,
 '180,  0.2, None,  1.5' : 67,
 '185,  0.2, None,  1.5' : 67,
 '190,  0.2, None,  1.5' : 67,
 '195,  0.2, None,  1.5' : 67,
 '200,  0.2, None,  1.5' : 507,
 '205,  0.2, None,  1.5' : 515,
 '210,  0.2, None,  1.5' : 511,
 '215,  0.2, None,  1.5' : 517,
 '220,  0.2, None,  1.5' : 67,
 '225,  0.2, None,  1.5' : 67,
 '230,  0.2, None,  1.5' : 67,
 '235,  0.2, None,  1.5' : 67,
 '240,  0.2, None,  1.5' : 67,
 '245,  0.2, None,  1.5' : 67,
 '250,  0.2, None,  1.5' : 509,
 '255,  0.2, None,  1.5' : 516,
 '256,  0.2, None,  1.5' : 513}
raphaelvallat commented 2 years ago

Found the bug! I am so glad you caught this... The issue was indeed with the convolution. Using:

w = int(0.1 * sf)
idx_sum = np.convolve(idx_sum, np.ones(w), mode='same') / w

instead of

w = int(0.1 * sf)
idx_sum = np.convolve(idx_sum, np.ones(w) / w, mode='same')

solves the bug and gives similar results regardless of the sampling rate. You can see an example in the notebook in https://github.com/raphaelvallat/yasa/pull/55

I need to do more tests on this and I'll try to release a new version ASAP. My understanding is that the two attractor states you describe reflect how many thresholds were crossed: ~510 = at least one of the two thresholds above its cutoff, ~67 = both thresholds above their cutoffs.

raphaelvallat commented 2 years ago

I have run more tests, and indeed the error is caused by a floating-point error when using np.ones(w) / w instead of dividing by w after the convolution... such a sneaky bug! See below:

bad = np.convolve(idx_sum, np.ones(w) / w, mode='same')
good = np.convolve(idx_sum, np.ones(w), mode='same') / w
np.allclose(bad, good)  # Returns True

# Now check the number of samples above soft threshold
np.where(bad > 2)[0].size  # 19533
np.where(good > 2)[0].size  # 12528

# Another way to fix is just to add a very small number to 2
np.where(bad > 2.00000001)[0].size  # 12528

I'll work on a new release ASAP. This was the only function that used np.convolve, so the bug only concerns the yasa.spindles_detect function.
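The floating-point mechanism can also be reproduced in isolation with a synthetic idx_sum (hypothetical values, not real data):

```python
import numpy as np

sf = 100
w = int(0.1 * sf)  # 100 ms smoothing window, as in the detection code

# Hypothetical idx_sum: number of thresholds crossed at each sample (0-3).
# Here exactly two thresholds are crossed everywhere, so the smoothed value
# should equal 2 on every full window and never be strictly above it.
idx_sum = np.full(200, 2.0)

bad = np.convolve(idx_sum, np.ones(w) / w, mode='same')   # divide inside the kernel
good = np.convolve(idx_sum, np.ones(w), mode='same') / w  # divide after the convolution

print(np.allclose(bad, good))      # True: the arrays differ only by ~1e-16
print(np.where(bad > 2)[0].size)   # may be nonzero because of rounding inside the kernel
print(np.where(good > 2)[0].size)  # 0: each full-window mean is exactly 2.0
```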

raphaelvallat commented 2 years ago

New release of YASA: https://github.com/raphaelvallat/yasa/releases/tag/v0.6.1

skjerns commented 2 years ago

Perfect, thanks! Seems to be fixed now indeed :)

On a related note: would you expect the results to change significantly when applying different high-pass (HP) filters? Should the spindles not be detected in the spindle frequency range only?

I've seen that setting the HP cutoff to higher values yields far more spindle events. How should I interpret this? I do understand that in a natural setting you'll never see anything higher than a 0.5 Hz high-pass (especially in N3, as in this example); I'm just trying to understand the factors that affect the detected spindles.

Found in N3

 HP    LP    n_events
{'0.0, 100': 39,
 '0.1, 100': 39,
 '0.2, 100': 39,
 '0.3, 100': 39,
 '0.4, 100': 40,
 '0.5, 100': 39,
 '0.6, 100': 39,
 '0.7, 100': 38,
 '0.8, 100': 38,
 '0.9, 100': 39,
 '1.0, 100': 40,
 '1.1, 100': 43,
 '1.2, 100': 45,
 '1.3, 100': 50,
 '1.4, 100': 54,
 '1.5, 100': 61,
 '1.6, 100': 64,
 '1.7, 100': 67,
 '1.8, 100': 68,
 '1.9, 100': 74,
 '2.0, 100': 80,
 '2.1, 100': 95,
 '2.2, 100': 107,
 '2.3, 100': 116,
 '2.4, 100': 131,
 '2.5, 100': 148,
 '2.6, 100': 158,
 '2.7, 100': 172,
 '2.8, 100': 185,
 '2.9, 100': 203}

Found in N2

  HP   LP   n_events
{'0.0, 100': 813,
 '0.1, 100': 813,
 '0.2, 100': 814,
 '0.3, 100': 813,
 '0.4, 100': 813,
 '0.5, 100': 814,
 '0.6, 100': 812,
 '0.7, 100': 816,
 '0.8, 100': 815,
 '0.9, 100': 816,
 '1.0, 100': 824,
 '1.1, 100': 832,
 '1.2, 100': 840,
 '1.3, 100': 846,
 '1.4, 100': 853,
 '1.5, 100': 863,
 '1.6, 100': 871,
 '1.7, 100': 886,
 '1.8, 100': 892,
 '1.9, 100': 900,
 '2.0, 100': 916,
 '2.1, 100': 932,
 '2.2, 100': 950,
 '2.3, 100': 964,
 '2.4, 100': 973,
 '2.5, 100': 986,
 '2.6, 100': 1006,
 '2.7, 100': 1018,
 '2.8, 100': 1033,
 '2.9, 100': 1038}
raphaelvallat commented 2 years ago

Hi @skjerns,

First, the data is filtered between the frequencies defined in freq_broad (default = 1-30 Hz):

https://github.com/raphaelvallat/yasa/blob/71c0a8245c61b328a82ab1197c34b673333120a3/yasa/detection.py#L637-L638

Then, the sigma power relative to freq_broad is calculated and used as a detection threshold; therefore, high-pass filtering may change (increase) the relative power value here.

https://github.com/raphaelvallat/yasa/blob/71c0a8245c61b328a82ab1197c34b673333120a3/yasa/detection.py#L675-L682

Next, the moving correlation threshold also uses the broadband signal. Here again, I would expect that a highpass filter > 1 Hz would increase the correlation between the sigma-filtered and the broadband signal.

https://github.com/raphaelvallat/yasa/blob/71c0a8245c61b328a82ab1197c34b673333120a3/yasa/detection.py#L693-L694

By contrast, the moving RMS threshold does not use the broadband signal, so the high-pass filter should have no impact there.

In summary, two of the three thresholds rely on the broadband signal. I think that removing the low frequencies from the broadband signal may increase the relative sigma power and the moving correlation, and therefore result in a more liberal detection. I expect the high-pass filter to have little or no impact on the moving RMS threshold.
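A rough self-contained illustration of the relative-power effect (synthetic signal and a simplified relative sigma computation, not YASA's code):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

sf = 256
t = np.arange(0, 60, 1 / sf)
rng = np.random.default_rng(1)

# Toy N3-like signal: a strong 0.8 Hz slow oscillation, a weak 13 Hz sigma
# component, and broadband noise (synthetic, for illustration only).
x = (3 * np.sin(2 * np.pi * 0.8 * t) + 0.5 * np.sin(2 * np.pi * 13 * t)
     + rng.normal(0, 0.5, t.size))

def rel_sigma(sig):
    # Sigma power (12-15 Hz) relative to the 1-30 Hz broadband power.
    f, p = welch(sig, sf, nperseg=4 * sf)
    return p[(f >= 12) & (f <= 15)].sum() / p[(f >= 1) & (f <= 30)].sum()

# A 2 Hz high-pass removes the slow oscillation (and the 1-2 Hz noise) from
# the broadband denominator, so the relative sigma power goes up.
sos = butter(4, 2, btype='highpass', fs=sf, output='sos')
x_hp = sosfiltfilt(sos, x)
print(rel_sigma(x) < rel_sigma(x_hp))  # True
```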

Hope this makes sense!

Thanks, Raphael