lina-usc / pylossless

🧠 EEG Processing pipeline that annotates continuous data
https://pylossless.readthedocs.io/en/latest/
MIT License
20 stars 8 forks source link

BUG: Pipeline no longer flags bad epochs at all #45

Closed scott-huberty closed 1 year ago

scott-huberty commented 1 year ago

For some reason the dashboard is no longer converting mne.annotations into layout.shapes

EDIT: I shouldn't assume that the issue is with converting annotations to shapes bc I haven't looked into the issue.

3 possible scenarios.

  1. Pipeline isn't identifying bad time periods where it used to.
  2. Pipeline isn't storing bad time periods as mne.annotations like it used to.
  3. Dashboard isn't converting mne.annotations to layout.shapes like it used to.

I will update when I look into it.

This is a good example of the type of qc unit test we should do, when we figure out how to do it.

scott-huberty commented 1 year ago

Compare this:

image

scott-huberty commented 1 year ago

to this.

Screen Shot 2023-03-08 at 12 14 36 PM

christian-oreilly commented 1 year ago

Maybe instead of putting time fixing this, we could take this occasion to integrate the new mode of annotations you have been working on?

scott-huberty commented 1 year ago

Maybe instead of putting time fixing this, we could take this occasion to integrate the new mode of annotations you have been working on?

I would love to!!! though FWIW I think the issue right now is in pipeline.py.

scott-huberty commented 1 year ago

From what I can tell so far, the pipeline hasn't labeled any flagged_epochs on any files. There were previously flagged_epochs in sub-s02, and I would definitely expect there be some in sub-s01 (just plot the file to judge that for yourself)

scott-huberty commented 1 year ago

image

Further testing this, using this simulated file, I would expect the period between 2-3 seconds to be flagged.. but pipline.flag_ch_sd does not pass any flagged_epoch_inds to flagged_epochs.add_flag_cat. Further, marks_array_2_flags (when called within flag_ch_sd) does not return anything beyond the accepted threshold.

scott-huberty commented 1 year ago

here is some reproducible code:

# IMPORT DATA

from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

import mne
from mne import find_events, Epochs, compute_covariance, make_ad_hoc_cov
from mne.datasets import sample
from mne.simulation import (simulate_sparse_stc, simulate_raw,
                            add_noise, add_ecg, add_eog)

import mne_bids
import pylossless as ll
# LOAD DATA
data_path = sample.data_path()
meg_path = data_path / 'MEG' / 'sample'
raw_fname = meg_path / 'sample_audvis_raw.fif'
fwd_fname = meg_path / 'sample_audvis-meg-eeg-oct-6-fwd.fif' 

# make BIDS object
bpath = mne_bids.get_bids_path_from_fname(raw_fname, check=False)
bpath.suffix = 'sample_audvis_raw'

# Load real data as the template
raw = mne_bids.read_raw_bids(bpath)
#raw = mne.io.read_raw_fif(raw_fname)
raw.set_eeg_reference(projection=True)
# GENERATE DIPOLE TIME SERIES

n_dipoles = 4  # number of dipoles to create
epoch_duration = 2.  # duration of each epoch/event
n = 0  # harmonic number
rng = np.random.RandomState(0)  # random state (make reproducible)

def data_fun(times):
    """Generate time-staggered sinusoids at harmonics of 10Hz"""
    global n
    n_samp = len(times)
    window = np.zeros(n_samp)
    start, stop = [int(ii * float(n_samp) / (2 * n_dipoles))
                   for ii in (2 * n, 2 * n + 1)]
    window[start:stop] = 1.
    n += 1
    data = 25e-9 * np.sin(2. * np.pi * 10. * n * times)
    data *= window
    return data

times = raw.times[:int(raw.info['sfreq'] * epoch_duration)]
fwd = mne.read_forward_solution(fwd_fname)
src = fwd['src']
stc = simulate_sparse_stc(src, n_dipoles=n_dipoles, times=times,
                          data_fun=data_fun, random_state=rng)
# SIMULATE RAW DATA
raw_sim = simulate_raw(raw.info, [stc] * 10, forward=fwd, verbose=True)
raw_sim.pick_types(eeg=True)

# MAKE A VERY NOISY TIME PERIOD
raw_selection1 = raw_sim.copy().crop(tmin=0, tmax=2, include_tmax=False)
raw_selection2 = raw_sim.copy().crop(tmin=2, tmax=3, include_tmax=False)
raw_selection3 = raw_sim.copy().crop(tmin=3, tmax=19.994505956825666)

cov_noisy_period = make_ad_hoc_cov(raw_selection2.info, std=dict(eeg=.000002))
add_noise(raw_selection2, cov_noisy_period, iir_filter=[0.2, -0.2, 0.04], random_state=rng)

raw_selection1.append([raw_selection2, raw_selection3])
raw_selection1.set_annotations(None)
raw_sim = raw_selection1

# MAKE SOME VERY NOISY CHANNELS
cov = make_ad_hoc_cov(raw_sim.info)
add_noise(raw_sim, cov, iir_filter=[0.2, -0.2, 0.04], random_state=rng)

make_these_noisy = ['EEG 001','EEG 005', 'EEG 009']
cov_noisy = make_ad_hoc_cov(raw_sim.copy().pick(make_these_noisy).info, std=dict(eeg=.000002))
add_noise(raw_sim, cov_noisy, iir_filter=[0.2, -0.2, 0.04], random_state=rng)
# LOAD DEFAULT CONFIG
config = ll.config.Config()
config.load_default()
config.save("sample_audvis_config.yaml")
# GENERATE PIPELINE
pipeline = ll.LosslessPipeline('sample_audvis_config.yaml')
pipeline.raw = raw_sim

# RUN FLAG OUTLIER CHS
pipeline.flag_outlier_chs()

# RUN FLAG_CH_SD
pipeline.flag_ch_sd()

# CHECK FLAGGED
print(pipeline.flagged_chs) 
print(pipeline.flagged_epochs)
scott-huberty commented 1 year ago

I think there is a bug in marks_array2flags:

https://github.com/lina-usc/pylossless/blob/4c3ae7108da82a1985f4548394d472dbbfb7db37/pylossless/pipeline.py#L510-L541

With the simulated raw example I posted a few messages above, outlier_mask is True at epoch 2 (where I created the noise) for all channels... Which makes sense....

but then at line 519 we are averaging the boolean values in outlier_mask across the epochs dimension...

that just gives us an array the size of n_channels. We are then comparing that array against the rowthresh (which in our current case is .2).

That really doesn't make sense to me. If we want to know which epochs have more than 20% of channels that are "unlike themselves" (as James would say), why is the value we are testing, the average across epochs?

christian-oreilly commented 1 year ago

@scott-huberty Did you checked your expectation against the representation in the corresponding steps in https://pylossless.readthedocs.io/en/latest/implementation.html? Note that pipline.flag_ch_sd performs both 3 and 4. The dimensions of the returned array should is very clear from these png.

Note that

but then at line 519 we are averaging the boolean values in outlier_mask across the epochs dimension...

is somewhat of a misleading statement. The dimension operater on at line 519 is variable; it depends on both the dimensions present in inarray and the value of flag_dim passed to get_operate_dim(...). The value of flag_dim is different depending on whether you call are at step 3 or 4 (both called by flag_ch_sd).

It's not misleading given the example I was giving (flagging epochs). But I admit i had "post looked at marks array/flags brain and probably sounded crazy.

It's true that function is used for both epochs and channels (that's a separate issue!).

If we want to flag bad epochs then we should not average across epochs (that's consistent with the docs).

I stand by my code, but I am also prepared to eat crow if I'm wrong 🤠

Let's review in our dev meeting tomorrow!

Andesha commented 1 year ago

I think I saw elsewhere that this has since been fixed?

scott-huberty commented 1 year ago

It is fixed in #53 (EDIT: updated PR reference).

@christian-oreilly Unless you want to review and need more time, I am going to merge the 3 open PR's.