sappelhoff / pyprep

PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
https://pyprep.readthedocs.io/en/latest/
MIT License

Variability in interpolated channels between RANSAC runs? #41

Closed · a-hurst closed this 3 years ago

a-hurst commented 4 years ago

I've been using PyPREP to set up an EEG preprocessing pipeline for a project I'm working on (~40 minute sessions, 1000Hz, 32 channels), and for most of it I've been using a fixed random seed. However, I got curious about the variability in RANSAC between runs with random seeds, so I went through my full dataset a few times with a script, writing out the names and numbers of initial bad, interpolated, and remaining bad channels per session to a .csv. On some files, I saw differences as big as 4 channels between two consecutive runs. Overall, there was at least one difference in interpolated channels on a majority of the 85 files.

Given that RANSAC is based on thresholds and random sample consensus, I'm not necessarily surprised by this, but I was wondering if there were any parameters I could adjust to make RANSAC more stable between runs (presumably at the cost of speed/RAM)?

sappelhoff commented 4 years ago

Hi @a-hurst, that sounds like too much variation to me; are you sure it's not just due to that particular dataset?

If it's also the case for other datasets, do you find the same variability in the original PREP (matlab/EEGLAB)?

Finally, there is another RANSAC implementation in https://github.com/autoreject/autoreject --> perhaps you could run all steps of PYPREP except RANSAC, then run the autoreject RANSAC and append the results to pyprep, and then finish running pyprep, pretending the autoreject ransac was the pyprep ransac ... :slightly_smiling_face: sounds like overkill, but I am trying to find a potential bug.
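
For reference, a rough sketch of what that substitution might look like (assumptions on my side: `raw` has already been through pyprep's detrending/CleanLine steps, and the fixed-length epoching is only there because autoreject's `Ransac` works on Epochs; this is not pyprep's API):

```python
import mne
from autoreject import Ransac  # autoreject's RANSAC, not pyprep's

# Cut the continuous data into fixed-length epochs so Ransac can consume it
epochs = mne.make_fixed_length_epochs(raw, duration=5.0, preload=True)

ransac = Ransac(n_jobs=1)
ransac.fit(epochs)
print("autoreject bad channels:", ransac.bad_chs_)
# These names could then be appended to pyprep's list of noisy channels
# before finishing the rest of the pipeline.
```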

a-hurst commented 4 years ago

It very well could be my dataset being wonky, but then again I guess there wouldn’t be much need for RANSAC in a world without wonky datasets. Is there an open EEG dataset you trust to be relatively normal that you’d like me to run the same tests on? My script would be easily adapted to any BIDS dataset.

I have no serious MATLAB experience so I’m probably not the best person to check the results against that, but I can certainly try subbing in autoreject’s implementation in my script! I’m calling all the PREP functions separately anyway instead of using the Pipeline object, so it wouldn’t be much effort.

yjmantilla commented 4 years ago

I can confirm having seen that issue in a local dataset. I do know that MATLAB's RANSAC has a fixed seed, but I remember trying to get rid of that seed and it still gave more consistent results than the pyprep implementation. Nevertheless, I did those tests a long time ago and don't quite remember all the details, so take that with a pinch of salt.

I'm really curious to know whether autoreject's implementation shows the same variability, because if it does, what really intrigues me is what MATLAB's RANSAC does to be so consistent.

This was also a bit discussed in #21 (not in the main big post but on the following comments).

a-hurst commented 4 years ago

@sappelhoff Okay, I found an open EEG dataset I thought would be a good test candidate, and hooooo boy there is quite a bit of variation in rejected channels between runs. I'm not sure if some of this is the lower sample rate (256 Hz) or higher channel count (64ch) vs my own dataset (32ch @ 1000 Hz), but PyPREP really has issues with it.

I've only run it twice so far, and only on the first session for the first 10 ids (running it locally so CPU/RAM consumption is an issue), but I've attached the output files from the two runs as well as a CSV detailing the differences between the two runs. Note that this latter file is counting and reporting all channels that were different between the two runs, not the differences in numbers of channels flagged per run: ransac_compare.zip

As a quick summary of the data:

The data doesn't even look particularly bad: apart from a few seconds of major noise around 250 seconds in, the data for sub-007 look extremely clean and the only channel flagged as bad by RANSAC on either run was A30. After robust referencing, however, that somehow balloons to 34 (!!!) interpolated channels by the end of one of my runs.

Here's my code for running the pipeline and logging the results:

```python
### Import required libraries #################################################

import os
import csv
import shutil
import binascii

import mne
import numpy as np

from mne_bids import BIDSPath, read_raw_bids
from pyprep.prep_pipeline import PrepPipeline
from pyprep.find_noisy_channels import NoisyChannels
from pyprep.removeTrend import removeTrend
from pyprep.reference import Reference

### Set general parameters ####################################################

mne.set_log_level("WARNING")
seed = None

session = '01'
task = 'meditation'
datatype = 'eeg'
bids_root = 'ds001787'
info_file = 'prep_info.csv'

ch_non_eeg = [
    'EXG1', 'EXG2', 'EXG3', 'EXG4', 'EXG5', 'EXG6', 'EXG7', 'EXG8',
    'GSR1', 'GSR2',
    'Erg1', 'Erg2',
    'Resp', 'Plet', 'Temp'
]
ch_type_overrides = {}
for ch in ch_non_eeg:
    ch_type_overrides[ch] = 'misc'

def preprocess_eeg(id_num, random_seed=None):

    # Set important variables
    bids_path = BIDSPath(id_num, task=task, datatype=datatype, root=bids_root,
                         session=session)
    if not random_seed:
        random_seed = int(binascii.b2a_hex(os.urandom(4)), 16)
    id_info = {"id": id_num, "random_seed": random_seed}

    ### Load and prepare EEG data #############################################

    header = "### Processing sub-{0} (seed: {1}) ###".format(id_num, random_seed)
    print("\n" + "#" * len(header))
    print(header)
    print("#" * len(header) + "\n")

    # Load EEG data
    raw = read_raw_bids(bids_path, verbose=True)
    raw.set_channel_types(ch_type_overrides)

    # Add a montage to the data
    montage_kind = "biosemi128"
    montage = mne.channels.make_standard_montage(montage_kind)
    mne.datasets.eegbci.standardize(raw)
    raw.set_montage(montage)

    # Extract some info
    eeg_index = mne.pick_types(raw.info, eeg=True, eog=False, meg=False)
    ch_names = raw.info["ch_names"]
    ch_names_eeg = list(np.asarray(ch_names)[eeg_index])
    sample_rate = raw.info["sfreq"]

    # Make a copy of the data
    raw_copy = raw.copy()
    raw_copy.load_data()

    # Prepare copy of raw data for PREP
    raw_copy.pick_types(eeg=True)

    ### Run components of PREP manually #######################################

    print("\n\n=== Performing CleanLine... ===")

    # Try to remove line noise using CleanLine approach
    linenoise = np.arange(60, sample_rate / 2, 60)
    EEG_raw = raw_copy.get_data() * 1e6
    EEG_new = removeTrend(EEG_raw, sample_rate=sample_rate)
    EEG_clean = mne.filter.notch_filter(
        EEG_new,
        Fs=sample_rate,
        freqs=linenoise,
        filter_length="10s",
        method="spectrum_fit",
        mt_bandwidth=2,
        p_value=0.01,
    )
    EEG_final = EEG_raw - EEG_new + EEG_clean
    raw_copy._data = EEG_final * 1e-6
    del linenoise, EEG_raw, EEG_new, EEG_clean, EEG_final

    # Perform robust re-referencing
    prep_params = {
        "ref_chs": ch_names_eeg,
        "reref_chs": ch_names_eeg
    }
    reference = Reference(
        raw_copy,
        prep_params,
        ransac=True,
        random_state=seed
    )
    print("\n\n=== Performing Robust Re-referencing... ===\n")
    reference.perform_reference()

    # Get info
    initial_bad = reference.noisy_channels_original["bad_all"]
    id_info['initial_bad'] = " ".join(initial_bad)
    id_info['num_initial_bad'] = len(initial_bad)

    interpolated = reference.interpolated_channels
    id_info['interpolated'] = " ".join(interpolated)
    id_info['num_interpolated'] = len(interpolated)

    remaining_bad = reference.still_noisy_channels
    id_info['remaining_bad'] = " ".join(remaining_bad)
    id_info['num_remaining_bad'] = len(remaining_bad)

    # Print re-referencing info
    print("\nRe-Referencing Info:")
    print(" - Bad channels original: {0}".format(initial_bad))
    print(" - Bad channels after re-referencing: {0}".format(interpolated))
    print(" - Bad channels after interpolation: {0}".format(remaining_bad))

    print("\n\n### sub-{0} complete! ###\n\n".format(id_num))

    return id_info

### Run loop for all IDs ######################################################

ids = []
for sub in os.listdir(bids_root):
    if "sub-" in sub:
        ids.append(sub.split("-")[1])

processed_ids = []
if not os.path.isfile(info_file):
    open(info_file, 'w').close()
elif os.path.getsize(info_file) > 0:
    with open(info_file, 'r') as f:
        info_csv = csv.DictReader(f)
        processed_ids = [row['id'] for row in info_csv]                      

first_id = len(processed_ids) == 0
with open(info_file, 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    print("")
    for sub in ids[:10]:
        if sub in processed_ids:
            print(" - sub-{0} already processed, skipping...\n".format(sub))
            continue
        info = preprocess_eeg(sub, random_seed=seed)
        if first_id:
            writer.writerow(list(info.keys()))
            first_id = False
        writer.writerow(list(info.values()))
```

And here's my code for generating the comparison file. It's not very flexible in that it only supports comparing two runs, but it's at least something:

```python
hdr = [
    'id', 'ndiff_init_bad', 'ndiff_interpolated', 'ndiff_still_bad',
    'diff_init_bad', 'diff_interpolated', 'diff_still_bad'
]
with open('prep_info.csv', 'r') as f1:
    with open('prep_info1.csv', 'r') as f2:
        with open('prep_diffs.csv', 'w') as out:
            info_csv1 = csv.DictReader(f1)
            info_csv2 = csv.DictReader(f2)
            out_csv = csv.writer(out)
            out_csv.writerow(hdr)
            info1 = [row for row in info_csv1]
            info2 = [row for row in info_csv2]
            for i in range(len(info1)):
                id_num = info1[i]['id']
                init_bad1 = set(info1[i]['initial_bad'].split(' '))
                init_bad2 = set(info2[i]['initial_bad'].split(' '))
                init_diffs = list(init_bad1 - init_bad2) + list(init_bad2 - init_bad1)
                interp1 = set(info1[i]['interpolated'].split(' '))
                interp2 = set(info2[i]['interpolated'].split(' '))
                interp_diffs = list(interp1 - interp2) + list(interp2 - interp1)
                remain1 = set(info1[i]['remaining_bad'].split(' '))
                remain2 = set(info2[i]['remaining_bad'].split(' '))
                remain_diffs = list(remain1 - remain2) + list(remain2 - remain1)
                out_csv.writerow([
                    id_num, len(init_diffs), len(interp_diffs), len(remain_diffs),
                    " ".join(init_diffs), " ".join(interp_diffs), " ".join(remain_diffs)
                ])
```

sappelhoff commented 4 years ago

Hey @a-hurst, thanks a lot for your tests. I have taken a brief look at the code and was wondering why you are using the PREP components individually instead of PrepPipeline.fit()?

In any case, this warrants a serious investigation of the PyPrep codebase ... unfortunately I don't have the time for that at the moment, and I don't expect to have time for it myself in the near future either :-(

I'd very much welcome contributions though and I would be happy to either:

  1. transfer this repo to a separate GitHub organization, where it can be maintained by several people
  2. give you (and @yjmantilla) maintainer rights for this repo (hosted on my personal GH account)

In the meantime, I hope that the current warning in the README suffices:

ALPHA SOFTWARE. This package is currently in its early stages of iteration. It may change both its internals or its user-facing API in the near future. Any feedback and ideas on how to improve either of these is more than welcome! Use this software at your own risk.

a-hurst commented 4 years ago

@sappelhoff I think the first thing to do would be to run a couple people from this dataset through the MATLAB PREP pipeline two or three times and see if we get similar variability. Given that we already have pretty good comparison tests between PyPREP and MATLAB PREP, and that I'm pretty confident in the quality of your code given your careful consideration of my recent PR in mne-bids, I have a hunch MATLAB PREP will have similar problems.

I have basically zero MATLAB/Octave experience, though, so would you or @yjmantilla be willing to let sub-007, ses-01 run twice in the background with MATLAB PREP and see what the consistency between runs is like? If it'd be reasonably straightforward for a beginner I can give it a shot too, I just don't want to spend a few hours struggling through what should be a 5-minute process. Once we know whether it's a Python implementation issue or an original implementation issue we can sort how to investigate: I don't have a ton of free time either, but I can try to set some aside for this since I'm sure this software will come in handy more than once during my PhD.

As for the weirdness of my script: I'm using the individual functions instead of the normal API because I adapted this from a larger and more involved pipeline script, where I save PSD and channel plots after each step of the pipeline so I can do a visual sanity check of the cleaning process for each file.

sappelhoff commented 4 years ago

I'm pretty confident in the quality of your code given your careful consideration of my recent PR in mne-bids

that's very nice of you to say, but to provide a little historical perspective: I initially wrote the code for pyprep's "find noisy channels" (up to version 0.2.3 of this package), and everything that's on top here (which is a lot!) was written as part of a student project supervised by @adam2392 (please correct me if I am wrong Adam!), and I am not 100% confident with all of the aspects. Meanwhile, @yjmantilla and @christian-oreilly have patched a few things -- but pyprep is still not in a "good shape" I think.

Regarding your proposal to test matlab PREP and pyprep in comparison --> that's a good idea. I hope @yjmantilla can find some time.

I don't have a ton of free time either, but I can try to set some aside for this since I'm sure this software will come in handy more than once during my PhD.

yep

As for the weirdness of my script: I'm using the individual functions instead of the normal API because I adapted this from a larger and more involved pipeline script, where I save PSD and channel plots after each step of the pipeline so I can do a visual sanity check of the cleaning process for each file.

Ah, nice - of course that'd be a nice feature to have in pyprep as well :) ... but first things first.

a-hurst commented 4 years ago

Regarding your proposal to test matlab PREP and pyprep in comparison --> that's a good idea. I hope @yjmantilla can find some time.

That would be great! I downloaded Octave myself just to get a sense of the difficulty and can't even figure out how to import the .bdf files. Coming from R and Python, MATLAB's package management system and weird GUI/CLI hybrid leave a lot to be desired!

a-hurst commented 4 years ago

Also figured I should tag @christian-oreilly since they did some previous tests for PyPREP comparing cleanline between PyPrep and MATLAB PREP: I'm sure you're as busy as the rest of us, but if @yjmantilla doesn't have time would you be willing to run a subject or two from the above dataset through MATLAB PREP to see whether you get similar variability between runs?

yjmantilla commented 4 years ago

Ok, Im running some tests for [sub-007, ses-01]

My output will be CSVs with the following info, but across runs:

All channel indexes start at 1.

Hopefully this will be informative enough to pinpoint where the problem in pyprep is.

Now, pyprep currently doesn't output the bad channels for each of the criteria; I think it would be nice to provide that functionality so the values can be compared directly. See #20

I would be willing to test more subjects but for now this test will be clarifying enough.

P.S: I hate matlab too.

Example code (I think I initialized everything just like @a-hurst, but if anyone spots anything, please do tell me):

```matlab
%% PREP TEST

%% Libraries
clc, clear all ,close all;
dir = pwd;

eeglab_path = 'Y:\code\gica_paper\resources\eeglab2020_0';
cfg.path.resources = 'Y:\code\gica_paper\resources\';
rmpath(genpath(eeglab_path));
%eeglab_path = 'Y:/code/preprocessing-flow/resources/eeglab2019_1_prep';
prep_path = 'Y:\code\gica_paper\resources\PrepPipeline';
rmpath(genpath(prep_path));
cd(eeglab_path)
eeglab
%addpath(genpath(eeglab_path));
addpath(genpath(prep_path));
cd(dir)
close all
%% Preprocessing with PREP 

% Routes 
% channels location
file_url = 'Y:\datasets\New folder\sub-007_ses-01_eeg_sub-007_ses-01_task-meditation_eeg.bdf';
EEG = pop_biosig(file_url);
EEG = eeg_checkset(EEG);

chans = {'Fp1' 'AF7' 'AF3' 'F1' 'F3' 'F5' 'F7' 'FT7' 'FC5' 'FC3' 'FC1' 'C1' 'C3' 'C5' 'T7' 'TP7' 'CP5' 'CP3' 'CP1' 'P1' 'P3' 'P5' 'P7' 'P9' 'PO7' 'PO3' 'O1' 'Iz' 'Oz' 'POz' 'Pz' 'CPz' 'Fpz' 'Fp2' 'AF8' 'AF4' 'AFz' 'Fz' 'F2' 'F4' 'F6' 'F8' 'FT8' 'FC6' 'FC4' 'FC2' 'FCz' 'Cz' 'C2' 'C4' 'C6' 'T8' 'TP8' 'CP6' 'CP4' 'CP2' 'P2' 'P4' 'P6' 'P8' 'P10' 'PO8' 'PO4' 'O2' 'EXG1' 'EXG2' 'EXG3' 'EXG4' 'EXG5' 'EXG6' 'EXG7' 'EXG8' 'GSR1' 'GSR2' 'Erg1' 'Erg2' 'Resp' 'Plet' 'Temp'};

for i= 1:EEG.nbchan
    EEG.chanlocs(i).labels = chans{i};
end

%EEG = eeg_checkset(EEG);

noChan = {'EXG1' 'EXG2' 'EXG3' 'EXG4' 'EXG5' 'EXG6' 'EXG7' 'EXG8' 'GSR1' 'GSR2' 'Erg1' 'Erg2' 'Resp' 'Plet' 'Temp'};

EEG = pop_select(EEG,'nochannel',noChan);%{'NAS' 'LVEOG' 'RVEOG' 'LHEOG' 'RHEOG' 'NFPZ' 'VEOG' 'HEOG'});   
EEG = eeg_checkset(EEG);

EEG = pop_chanedit(EEG, 'lookup',fullfile(cfg.path.resources,'standard-10-5-cap385.elp'));%maybe try this
EEG = eeg_checkset(EEG);

nb_channels = EEG.nbchan;

% PREP approach
params = struct();
params.referenceChannels = 1:nb_channels;
params.evaluationChannels = 1:nb_channels;
params.rereferencedChannels = 1:nb_channels;
params.detrendChannels = 1:nb_channels;
params.lineNoiseChannels = 1:nb_channels;
params.lineFrequencies = 60:60:EEG.srate/2;
params.detrendType = 'high pass';
params.detrendCutoff = 1;
params.referenceType = 'robust';
params.keepFiltered = false;

bad_types = {'badChannelsFromDropOuts' 'all' 'badChannelsFromNaNs' 'badChannelsFromNoData' 'badChannelsFromLowSNR' 'badChannelsFromHFNoise' 'badChannelsFromCorrelation' 'badChannelsFromDeviation' 'badChannelsFromRansac'};
stats_types = {'noisyStatisticsOriginal' 'noisyStatisticsBeforeInterpolation' 'noisyStatistics'};
for i=1:10
    [EEG2, computationTimes] = prepPipeline(EEG, params);
    EEG2.setname = [EEG.setname '_' num2str(i) '_PREP'];
    %pop_saveset( EEG, 'filepath',fullfile(cfg.path.output,groups{g}), 'filename',EEG.setname);
    data = EEG2.etc;

    save(strcat(EEG2.setname,'_etc'),'data');

    for j = 1:length(stats_types)
        for k = 1:length(bad_types)
            dummy = eval(['EEG2.etc.noiseDetection.reference.' stats_types{j} '.noisyChannels.' bad_types{k}]);
            dlmwrite([EEG.setname '_MATLAB_' stats_types{j} '_' bad_types{k} '.csv'],dummy,'-append')
        end
    end
    dlmwrite([EEG.setname '_MATLAB_interpolated.csv'],EEG2.etc.noiseDetection.interpolatedChannelNumbers,'-append')
    dlmwrite([EEG.setname '_MATLAB_stillNoisy.csv'],EEG2.etc.noiseDetection.stillNoisyChannelNumbers,'-append')

end
```

a-hurst commented 4 years ago

@yjmantilla Thanks so much, looking forward to the results of this! The data from sub-007 should tell us a lot, but in case we want to try any other files the two most variable IDs were sub-003 and sub-005 (I mainly suggested 007 for their massive and surprising jump in channels marked as bad, which happened on both my runs).

EDIT: Regarding differences between your pipeline and mine, the one thing that came to mind was that @christian-oreilly found that the MATLAB cleanline didn’t seem to do anything at all for some of the files they tested. Can you check that it’s working properly for this file in case that’s a confound?

yjmantilla commented 4 years ago

Ok so here are the results:

https://drive.google.com/drive/folders/10ZRnaUBs6VF8n1obVBp4TeXInK6_OZ1n?usp=sharing

In the folder you will find the following files:

image

Basically I just changed the following line in findNoisyChannels.m (matlab prep):

```matlab
%stream = RandStream('mt19937ar', 'Seed', 435656);
stream = RandStream('mt19937ar', 'Seed', 'shuffle');
```

This no-seed test was done because, as far as I know, in pyprep we are not giving a particular seed to the random number generator (except in the pytest tests).

As to the other patterns in the file names:

- summary --> interpolated and still-noisy channels
- noisyStatisticsOriginal --> bad channels identified before robust referencing
- noisyStatisticsBeforeInterpolation --> bad channels just before interpolation (after robust referencing; determines the interpolated channels)
- noisyStatistics --> bad channels after interpolation (determines the still-noisy channels)

Some early conclusions:

PS:

The code to generate these CSV files from the output of the MATLAB code I posted above is this one, in case any soul out there tries to replicate:

```python
import os
import pandas as pd

directory = r'Y:\code\gica_paper\prep_validation\sub-007_ses-01_noSeed'
#directory = r'Y:\code\gica_paper\prep_validation\sub-007_ses-01_prep'

sub = os.path.basename(os.path.normpath(directory))

def filter_files(filelist,formats = ['.edf','.set','.cnt','.bdf','.mat','.vhdr']):
    filtered = []
    for x in filelist:
        valid_file = []
        for format in formats:
            valid_file.append(format in x)
        if any(valid_file):
            filtered.append(x)
    return filtered

files = [os.path.join(r,file) for r,d,f in os.walk(directory) for file in f]
files = filter_files(files,['.csv'])
# print(files)

noisyStatisticsBeforeInterpolation = []
noisyStatistics = []
noisyStatisticsOriginal = []
summary = []
for f in files:
    if 'noisyStatisticsBeforeInterpolation' in f:
        noisyStatisticsBeforeInterpolation.append(f)
    elif 'noisyStatisticsOriginal' in f:
        noisyStatisticsOriginal.append(f)
    elif 'noisyStatistics' in f:
        noisyStatistics.append(f)
    elif 'interpolated' in f:
        summary.append(f)
    elif 'stillNoisy' in f:
        summary.append(f)
    else:
        pass

def findChar(c,string):
    idx = [x for x, v in enumerate(string) if v == c]
    return idx

# for x in [noisyStatisticsBeforeInterpolation , noisyStatistics , noisyStatisticsOriginal , interpolated , stillNoisy]:
#     print(x)

def get_field(file):
    idxs = findChar('_',file)
    idxs.sort()
    return file[idxs[-1]+1:].replace('.csv','')

import copy
import numpy as np

for current in ['noisyStatisticsBeforeInterpolation','noisyStatistics','noisyStatisticsOriginal','summary']:
    fields = []
    data = []

    for file in eval(current):
        f = open(file, 'r') 
        Lines = f.readlines()
        Lines = [x.replace('\n','').replace(',',';') for x in Lines]
        #print(Lines)
        field = get_field(file)
        fields.append(field)
        data.append(copy.deepcopy(Lines))

    data = np.array(data)
    print('iterations',data.shape[1])

    for i in range(data.shape[1]):
        print(data[:,i])

    print(data.T)
    df = pd.DataFrame(data=data.T,columns=fields)
    print(df)
    df.to_csv(sub+'_'+current+'.csv',sep=',')
    df.to_excel(sub+'_'+current+'.xlsx')
```

yjmantilla commented 4 years ago

@a-hurst just saw your zip results. That many channels interpolated is kind of weird. It reminds me of how pyprep was acting before we fixed a detrending bug.

I guess I should run this subject with pyprep too to check if I get the same degree of interpolated channels but that will be for another day.

BTW, another confounding variable is the montage. I used the channel names in task-meditation_channels.tsv. I see you seem to use the original names in the .bdf file, and that MNE managed to find the locations in its montage files. I was not so lucky in MATLAB, so I had to find some more standard names (FPz, and such); my luck came back since the tsv had that notation.

I guess there must be some montage differences between the default montages of MNE and EEGLAB, but I doubt that's the source of our problems.

a-hurst commented 4 years ago

@yjmantilla Fantastic, will look at this later today! For whatever reason, MNE-BIDS doesn't use the channel names in the _channels.tsv file so I had to go with the biosemi montage. All the MNE montages are available as individual text files in the source code, would it be possible to import one of them into EEGLAB?

yjmantilla commented 4 years ago

@yjmantilla Fantastic, will look at this later today! For whatever reason, MNE-BIDS doesn't use the channel names in the _channels.tsv file so I had to go with the biosemi montage. All the MNE montages are available as individual text files in the source code, would it be possible to import one of them into EEGLAB?

Surely it is possible, but it needs a conversion step. I read biosemi128.txt into EEGLAB (with some manipulation) but wasn't able to derive something trivial enough.

image

Another alternative is to read the .elp file in MNE, since it supports that format. EDIT: Seems the .elp file used is not a standard .elp file after all. I wonder why this stuff is like this :/

a-hurst commented 4 years ago

@yjmantilla So I started trying to debug this issue and noticed I was still getting some variation between runs, even with a fixed random seed!

Turns out the issue is with the chunk size for RANSAC: if variation in detected free RAM between runs causes the number of chunks to differ, that causes the number of calls to NoisyChannels.get_ransac_pred() per iteration to change, which means a different number of calls to the RNG function that selects channels for reconstruction each call, throwing everything off.

I don't think it's the root of the main issue here but it's still definitely a problem. I'm guessing the fix is to allow (require?) users to specify a chunk size instead of deciding that dynamically? I'm sure MATLAB prep has had to address the same problem, so they might have a different solution, but that's all I can think of at the moment.
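
To illustrate the mechanism, here is a toy numpy sketch (not pyprep's actual code): each chunk consumes its own block of draws from the shared generator, so the random subsets used to reconstruct a given channel depend on how many chunks the channels were split into.

```python
import numpy as np

n_chans, n_samples, n_pred = 8, 3, 4

def subsets_used_for(seed, n_chunks, channel):
    """Return the random channel subsets drawn while predicting `channel`."""
    rng = np.random.RandomState(seed)
    for chunk in np.array_split(np.arange(n_chans), n_chunks):
        # one draw per RANSAC sample for this chunk, as in the current code path
        draws = [list(rng.choice(n_chans, size=n_pred, replace=False))
                 for _ in range(n_samples)]
        if channel in chunk:
            return draws

# Same seed, but a different chunk count (e.g. because less RAM was free):
print(subsets_used_for(42, n_chunks=2, channel=6))
print(subsets_used_for(42, n_chunks=4, channel=6))  # different subsets
```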

EDIT: Also, extremely odd that the spatial montage data differs so much between EEGLAB and MNE.

yjmantilla commented 4 years ago

I see, I will check that out. The chunking was implemented (I'm to blame) because RANSAC consumed too much RAM.

Do notice though that matlab sets the same seed, no matter the run. If one removes that seed there is indeed variability between runs in matlab too. So maybe we should set a general seed too, although that is too arbitrary for my taste...

What really scares me is why this particular EEG gives so many interpolated channels; in MATLAB there are not that many. I have checked another EEG in MATLAB vs pyprep and the difference was only one channel.

The spatial difference between MATLAB and MNE (regarding biosemi128.txt) is because I have not been able to find the correct conversion yet.

I will try to run the EEG with the original RANSAC version of @sappelhoff. I'm not sure my RAM will be able to manage it, but let's see.

yjmantilla commented 4 years ago

Ok, I'm looking at the chunk issue more closely. Indeed it is a problem. My proposed fix is to move the following code out of get_ransac_pred():

```python
        rng = check_random_state(self.random_state)

        # Pick a subset of clean channels for reconstruction
        reconstr_idx = rng.choice(
            np.arange(chn_pos_good.shape[0]), size=n_pred_chns, replace=False
        )
```

and into find_bad_by_ransac(), just after n_pred_chns is set, like this:

```python
        # Check if we have enough remaining channels
        # after exclusion of bad channels
        n_pred_chns = int(np.ceil(fraction_good * n_chans_good))

        # ---------- add this -----------------------------
        # Pick a subset of clean channels for reconstruction
        rng = check_random_state(self.random_state)

        reconstr_idx = rng.choice(
            np.arange(chn_pos_good.shape[0]), size=n_pred_chns, replace=False
        )
```

And then passing the selected channels for the prediction into the functions that need them. That way, every chunk will use the same subset of channels, and it should give no variability across runs for a given seed.

What do you think, @a-hurst? Now, this was indeed a bad bug, but I don't think fixing it will remove the variability.

As I said:

Here is the variability of matlab when no seed is used (when there is a seed there is no variability):

image

yjmantilla commented 4 years ago

Just checked the proposed fix against a local EEG, and it gave no variability for a given seed, as the following image shows:

image

Guess that would be a PR.

EDIT: Checking the MATLAB code, it seems each sample needs a specific (and different) subset of channels. But this can be implemented too: basically, fabricate as many fixed subsets as there are samples at the start of RANSAC.

a-hurst commented 4 years ago

Agreed that the seedless interpolation variability and average interpolation counts are definitely something else, I was just too lazy to open up a new issue for the new bug I found.

As per your fix, I think you've got the right idea with moving the random channel selection out of the loop and pre-generating: if we generate a list of n_samples worth of random channel selections at the top of the find_bad_by_ransac and reuse the same channel selections per sample across chunks in get_ransac_pred, that should result in consistent RANSAC results across iterations and chunk sizes while preserving the randomness and variability needed.
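
A rough sketch of that pre-generation idea (the function and parameter names below are illustrative, not pyprep's actual signatures):

```python
import numpy as np
from mne.utils import check_random_state

def make_reconstruction_subsets(random_state, n_samples, n_chans_good, n_pred_chns):
    """Draw one channel subset per RANSAC sample, up front."""
    rng = check_random_state(random_state)
    return [rng.choice(np.arange(n_chans_good), size=n_pred_chns, replace=False)
            for _ in range(n_samples)]

# find_bad_by_ransac() would call this once; get_ransac_pred() would then take
# a sample index and reuse subsets[sample_idx] for every chunk, so the chunk
# size no longer affects which channels are used for reconstruction.
subsets = make_reconstruction_subsets(435656, n_samples=50,
                                      n_chans_good=60, n_pred_chns=15)
```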

Also, I just noticed something that's probably just my lack of understanding, but I wanted to double-check:

https://github.com/sappelhoff/pyprep/blob/afb5167457f7d48a437b4da568e80bf1be1da0ad/pyprep/find_noisy_channels.py#L598-L601

From my rough understanding, this bit is supposed to randomly choose some channels that haven't already been flagged as 'bad', the data from which are then used to predict what the signal(s) for the channels in the current chunk should look like, prior to the correlation testing between the interpolated vs actual signals. However, the code here, specifically the arange bit, looks like it's just creating a list of consecutive numbers from 0 to "number of currently-good channels" and sampling randomly from those, meaning that the algorithm isn't actually excluding bad channels while making its predictions (and potentially excluding some good ones). Is that accurate, or am I misreading the code here? If that's the case, it might explain some of the interpolation weirdness we're seeing.

sappelhoff commented 4 years ago

Nice, it seems like one issue has already been found ... goes without saying that a PR is very welcome!


Is that accurate, or am I misreading the code here

I think you may be misreading the code:

https://github.com/sappelhoff/pyprep/blob/afb5167457f7d48a437b4da568e80bf1be1da0ad/pyprep/find_noisy_channels.py#L598-L611

The deciding point is that good_chn_labs contains the channel labels of the good channels ... and we sample randomly from those labels using the generated indices.

Then to get the actual channels out of all channels, we use this list comprehension:

```python
reconstr_picks = [
    list(self.ch_names_new).index(chn_lab) for chn_lab in reconstr_labels
]
```
sappelhoff commented 3 years ago

One aspect of this issue is now fixed thanks to @a-hurst's #43. Thanks also to @yjmantilla for joining in the investigation!

a-hurst commented 3 years ago

Okay, so based on @yjmantilla's issues with the BioSemi montage in EEGLAB, I decided to look up the original paper the open dataset was from and found that they actually did use a standard 10-20 system for their electrodes; presumably their EEG recording software just wasn't configured properly and used the A1-A32 + B1-B32 channel labels in the BDF instead. I modified my testing script to use the 10-20 channel names specified in the file's channels.tsv and re-ran it to check the variability between runs with a random seed. With the 10-20 montage and renamed channels there's still variability, but the results are a lot better:

Data summary, on the same 10 participants as in my initial comparison post:

To save you some scrolling, this is what the same values were last time:

This is definitely a lot less concerning, but I think still worth looking into. Notably, sub-007 still has the same dramatic jump in bad channels as it did last time (albeit slightly smaller, but still > 20 on both runs), which is weird and should be investigated further.

I'm going to do some more digging on my end here, but I guess the next logical step would be a 1-to-1 comparison between MATLAB and PyPREP like we'd discussed to narrow down whether it's a general PREP problem to address or just an implementation problem.

Like last time, I've attached the raw .csv data from my runs, along with a .csv showing the differences between the runs: prep_variability_redux.zip. I've also included my revised testing script below:

The updated test script:

```python
### Import required libraries #################################################

import os
import csv
import shutil
import random
import binascii

import mne
import numpy as np

from mne_bids import BIDSPath, read_raw_bids
from pyprep.prep_pipeline import PrepPipeline
from pyprep.find_noisy_channels import NoisyChannels
from pyprep.removeTrend import removeTrend
from pyprep.reference import Reference

### Set general parameters ####################################################

mne.set_log_level("WARNING")
seed = None
runs_per_id = 1

session = '01'
task = 'meditation'
datatype = 'eeg'
bids_root = 'ds001787'
chan_file = os.path.join(bids_root, 'task-meditation_channels.tsv')
info_file = 'prep_info.csv'

ch_labels = []
ch_type_overrides = {}
with open(chan_file, 'r') as f:
    ch_info = csv.DictReader(f, delimiter='\t')
    for row in ch_info:
        ch_labels.append(row['name'])
        if row['type'] != "EEG":
            ch_type_overrides[row['name']] = 'misc'

def preprocess_eeg(id_num, random_seed=None, run=None):

    # Set important variables
    bids_path = BIDSPath(id_num, task=task, datatype=datatype, root=bids_root,
                         session=session)
    if not random_seed:
        random_seed = int(binascii.b2a_hex(os.urandom(4)), 16)
    random.seed(random_seed)
    id_info = {'id': id_num, 'run': run, 'random_seed': random_seed}

    ### Load and prepare EEG data #############################################

    header = "### Processing sub-{0} (seed: {1}) ###".format(id_num, random_seed)
    print("\n" + "#" * len(header))
    print(header)
    print("#" * len(header) + "\n")

    # Load EEG data
    raw = read_raw_bids(bids_path, verbose=True)

    # Fix channel names before montage (only needed for this dataset)
    ch_name_map = {}
    for i in range(len(ch_labels)):
        old_name = raw.info['ch_names'][i]
        ch_name_map[old_name] = ch_labels[i]
    mne.rename_channels(raw.info, ch_name_map)
    raw.set_channel_types(ch_type_overrides)

    # Add a montage to the data
    montage_kind = "standard_1020"
    montage = mne.channels.make_standard_montage(montage_kind)
    mne.datasets.eegbci.standardize(raw)
    raw.set_montage(montage)

    # Extract some info
    eeg_index = mne.pick_types(raw.info, eeg=True, eog=False, meg=False)
    ch_names = raw.info["ch_names"]
    ch_names_eeg = list(np.asarray(ch_names)[eeg_index])
    sample_rate = raw.info["sfreq"]

    # Make a copy of the data
    raw_copy = raw.copy()
    raw_copy.load_data()

    # Prepare copy of raw data for PREP
    raw_copy.pick_types(eeg=True)

    ### Run components of PREP manually #######################################

    print("\n\n=== Performing CleanLine... ===")

    # Try to remove line noise using CleanLine approach
    linenoise = np.arange(60, sample_rate / 2, 60)
    EEG_raw = raw_copy.get_data() * 1e6
    EEG_new = removeTrend(EEG_raw, sample_rate=sample_rate)
    EEG_clean = mne.filter.notch_filter(
        EEG_new,
        Fs=sample_rate,
        freqs=linenoise,
        filter_length="10s",
        method="spectrum_fit",
        mt_bandwidth=2,
        p_value=0.01,
    )
    EEG_final = EEG_raw - EEG_new + EEG_clean
    raw_copy._data = EEG_final * 1e-6
    del linenoise, EEG_raw, EEG_new, EEG_clean, EEG_final

    # Perform robust re-referencing
    prep_params = {
        "ref_chs": ch_names_eeg,
        "reref_chs": ch_names_eeg
    }
    reference = Reference(
        raw_copy,
        prep_params,
        ransac=True,
        random_state=random_seed
    )
    print("\n\n=== Performing Robust Re-referencing... ===\n")
    reference.perform_reference()

    # Get info
    initial_bad = reference.noisy_channels_original["bad_all"]
    initial_bad.sort()
    id_info['initial_bad'] = " ".join(initial_bad)
    id_info['num_initial_bad'] = len(initial_bad)

    interpolated = reference.interpolated_channels
    interpolated.sort()
    id_info['interpolated'] = " ".join(interpolated)
    id_info['num_interpolated'] = len(interpolated)

    remaining_bad = reference.still_noisy_channels
    remaining_bad.sort()
    id_info['remaining_bad'] = " ".join(remaining_bad)
    id_info['num_remaining_bad'] = len(remaining_bad)

    # Print re-referencing info
    print("\nRe-Referencing Info:")
    print(" - Bad channels original: {0}".format(initial_bad))
    print(" - Bad channels after re-referencing: {0}".format(interpolated))
    print(" - Bad channels after interpolation: {0}".format(remaining_bad))

    print("\n\n### sub-{0} complete! ###\n\n".format(id_num))

    return id_info

### Run loop for all IDs ######################################################

ids = []
for sub in os.listdir(bids_root):
    if "sub-" in sub:
        ids.append(sub.split("-")[1])

processed_ids = []
if not os.path.isfile(info_file):
    open(info_file, 'w').close()
elif os.path.getsize(info_file) > 0:
    with open(info_file, 'r') as f:
        info_csv = csv.DictReader(f)
        processed_ids = [row['id'] for row in info_csv]

first_id = len(processed_ids) == 0
with open(info_file, 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    print("")
    for sub in ids[:10]:
        if sub in processed_ids:
            print(" - sub-{0} already processed, skipping...\n".format(sub))
            continue
        for run in range(runs_per_id):
            info = preprocess_eeg(sub, random_seed=seed, run=run+1)
            if first_id:
                writer.writerow(list(info.keys()))
                first_id = False
            writer.writerow(list(info.values()))
```
a-hurst commented 3 years ago

Did some more digging tonight by modifying find_bad_by_ransac to save the correlations for each channel & 5-second window of the 007 file to a .csv so I could take a closer look. Trying to figure out where that huge jump in noise is coming from after interpolating just one channel.

Basically, the correlations for the first iteration of RANSAC are extremely high, with the lowest correlation for any given channel during any given window being 0.9972. However, POz is somehow flagged as "bad by correlation" and gets interpolated, leading to the following patterns of RANSAC correlations across the 5s chunks of the file during the second iteration (lines are mildly smoothed using loess for clarity):

Screen Shot 2020-12-08 at 12 28 53 AM

@sappelhoff @yjmantilla any ideas why the predicted vs actual correlations would go from being so consistently high across the board to being really really bad for 10-12 of the channels in a single iteration due to a single interpolation like this? I had originally figured that on the first pass some channels were near-threshold and that interpolating POz nudged a varying number of them over the edge, but apparently that's not what's happening at all. It's also weird to me how the correlations for several channels change dramatically over time, but maybe that's me not understanding something about the process.

Also, since those high correlations on the first pass are from comparing interpolated values to actual values, wouldn't we expect the interpolated POz to be highly similar to the original one? Or does the raw.interpolate_bads() method we're using to do the interpolation in reference.py use a different method of interpolation than NoisyChannels does for calculating correlations?

yjmantilla commented 3 years ago

Man, I cannot reflect much on this right now because I'm in finals and this requires a good analysis.

So regarding

any ideas why the predicted vs actual correlations would go from being so consistently high across the board to being really really bad for 10-12 of the channels in a single iteration due to a single interpolation like this?

At this time nope :(

Also, since those high correlations on the first pass are from comparing interpolated values to actual values, wouldn't we expect the interpolated POz to be highly similar to the original one?

I agree, it should be highly similar. I think you are isolating the heart of the issue. I'm excited about this.

Or does the raw.interpolate_bads() method we're using to do the interpolation in reference.py use a different method of interpolation than NoisyChannels does for calculating correlations?

I think it actually may be different. I'm not entirely sure, but it seems that the interpolation @sappelhoff did is inspired by the native MNE functions (in particular _make_interpolation_matrix); do notice, though, that raw.interpolate_bads() takes into account an origin parameter that gets inferred from the info of the Raw object.

Could this be the heart of the issue? That is, the mismatch between how we interpolate channels in the robust reference algorithm vs how they are interpolated inside RANSAC? It could be that the difference gets larger for some files depending on that origin inferred from the Raw object, and then escalates as the algorithm progresses. This could also explain why these high numbers of bad channels at the end of pyprep don't occur for some files.

Now, that reminds me that autoreject's RANSAC actually uses interpolate_bads() of the MNE Epochs object directly, instead of the isolated _make_interpolation_matrix function as we do in pyprep. That could explain why, when I replaced our RANSAC with that RANSAC, we got fewer bad channels in #36.

I think you discovered something that could be the heart of the issue!

yjmantilla commented 3 years ago

So I guess we could test that by replacing the current interpolation with the interpolation method of raw.

Basically ,

We could also imitate the procedure to get the origin parameter instead of implementing raw.interpolate_bads() directly but the advantage of the former is that it would in essence be exactly what we do outside in the robust_reference algorithm.
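
Something like the following could serve as a quick test of that idea (a minimal sketch under assumptions, not pyprep's current code; `raw_copy` is the preprocessed Raw object from the script above and "POz" is just an example channel):

```python
# Reconstruct a channel with the public MNE API, so that RANSAC's prediction
# uses the same interpolation (including the fitted sphere origin) as
# Reference.perform_reference() does via raw.interpolate_bads().
raw_tmp = raw_copy.copy()
raw_tmp.info["bads"] = ["POz"]            # channel(s) to reconstruct
raw_tmp.interpolate_bads(origin="auto")   # origin inferred from raw.info
predicted = raw_tmp.get_data(picks=["POz"])
```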

I may give it a try when I get out of an exam I have on Thursday, or at worst after the 18 of Dec when Im finally on vacation.

sappelhoff commented 3 years ago

Mmmh I think back then I started using _make_interpolation_matrix to speed up some part of the overall pipeline. It may indeed be a part of the issue. Private functions like that can be unstable in terms of API and behavior - we should probably switch the code to use the public functions as Jose suggests as well.

I may give it a try when I get out of an exam I have on Thursday, or at worst after the 18 of Dec when Im finally on vacation.

:rofl: that feeling when you have real-world problems to solve but instead have to perform on some exam questions that no one is gonna look at again.

a-hurst commented 3 years ago

So apparently I've been barking up the wrong tree: the reason the channels were so highly correlated by RANSAC on the first pass (but not the second) is that this data isn't referenced at all before the first RANSAC pass: because it's from a BioSemi system, all the channels are referenced to the voltages of the non-recorded CMS/DRL electrodes, meaning that you're meant to re-reference the data after recording to remove various voltage fluctuations and other noise.

Since I had no idea that CMS/DRL referencing was a thing (pretty new to EEG) and passed the data to PyPREP intact, that meant a) channels would be highly correlated before average referencing and b) bad channels would be prone to jumping up dramatically after the average referencing that happens after the first NoisyChannels run in the pipeline. If I average-reference 'sub-007' prior to the pipeline, I get a lot of channels flagged as bad on the initial pass.

This doesn't really explain variability between runs at all, but it does explain the concerning jumps in bad channels I've been seeing in this dataset. Ah well, the more you know!
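
For anyone else reading along, "average-referencing prior to the pipeline" is a one-liner in MNE (a sketch of that step, not part of the scripts above):

```python
# Give the CMS/DRL-referenced BioSemi data an explicit average reference
# before PREP, so the channels aren't all trivially correlated on the
# first RANSAC pass.
raw_copy.load_data()
raw_copy.set_eeg_reference("average", projection=False)
```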

yjmantilla commented 3 years ago

But it is still a mystery why we get so many bad channels at the end of pyprep compared to MATLAB PREP (at least for this subject). In a run I did we had 20, while MATLAB shows 10 at most.

I passed the data in intact with the MATLAB code too (as far as I know; I don't think EEGLAB did anything behind the curtains). We should at least be getting about the same number of bad channels as we get when using autoreject's RANSAC.

But I guess this is more in line with issue #49 .

sappelhoff commented 3 years ago

Oh yes ... I knew about that but somehow didn't think of it because I usually work with BrainVision data which does not use CMS/DRL. :-/

Still, the interpolation thing could/should be another culprit of unreliable behavior.

a-hurst commented 3 years ago

@sappelhoff @yjmantilla After doing a bunch of test runs yesterday, I decided to do some plotting to better visualize what's going wrong. For a comparison, I took the 10 runs of sub-007 that @yjmantilla did with MATLAB PREP last fall and compared the bad channels with the 8 runs I'd done of sub-007 with PyPREP. The y-axis is "proportion of runs where the channel was flagged as bad and interpolated", blue bars are PyPREP and red bars are Matlab PREP:

Screen Shot 2021-03-18 at 2 26 28 PM

Basically, there are a few channels where Matlab PREP and PyPREP are consistently in agreement (C6, FC6, Fp2, P6, T8), a few that Matlab PREP flags as bad more often than PyPREP (C5, FC5, O1, P8), and a whole bunch of them that PyPREP flags more often than Matlab PREP (including 19 that were never flagged once by Matlab PREP!)

Here's the same plot for channels flagged as still noisy after interpolation:

Screen Shot 2021-03-18 at 2 40 35 PM

Interestingly, it looks like Matlab PREP is no better in terms of between-run variability here.

For a different visualization, here's "proportion of runs the channel was interpolated" represented by point size, by x/y coordinates:

Screen Shot 2021-03-18 at 3 21 48 PM

It looks like both PREP implementations are flagging lots of bad channels in the lowest two rows of electrodes, but PyPREP is flagging a lot more of them, in addition to flagging ones closer to the top of the head. Any ideas on what's going on here? My initial thought is that the interpolation algorithms differ between MNE and MATLAB in some important way, even though they're supposedly based on the same equations from the same paper.

My second thought, unrelated to the issue at hand, is that RANSAC probably has an inherently higher false-positive rate for the lowest row of electrodes since the prediction quality is going to be worse for electrodes that only have other electrodes on one side. I wonder if there's any clever way of correcting for that?

yjmantilla commented 3 years ago

@a-hurst Oh no! It's the mysterious subject 007 again!

I can't think much about this right now (basically flooded with work and uni), but I think our best hope is to really do some RANSAC tests that explore not only one file but a population.

In #53 there was some discussion about deprecating our implementation of RANSAC and just using autoreject's version, since Jasmainak already did RANSAC validations against MATLAB for his thesis.

Regarding interpolation algorithms: I think Jasmainak once mentioned having corrected some aspect of mne's interpolation to have a better fit to matlab results (https://github.com/sappelhoff/pyprep/issues/36#issuecomment-738544167).

So in a sense our best hope to get as close to matlab's ransac is Autoreject.

Now, I was wondering from what perspective we are analyzing this variability, since that was corrected by introducing a seed; that is, is the seedless RANSAC worth researching?

Regarding that last thought you had, I would guess that the boundary electrodes do present problems with interpolation.

If we are going to analyze this then my approach would be:

In any case, if we take a lot of EEG files and we do indeed see that there is a tendency to mark bad electrodes at the boundary (and this can be isolated as a RANSAC issue and not some other confounding variable), then it would be worthwhile to see how we can correct that.

Now, how could we correct that? I'm not really math-versed, so the following may be nonsense (actually, I'm almost sure it is):

I think you could take a look at Biharmonic Spline Interpolation, which apparently is able to extrapolate and could help regarding boundary electrodes (this last statement is probably wrong in many ways). References:

Now, that interpolation method is already in MATLAB; I don't know if PREP uses it. I don't remember how they did their interpolation.

(P.S.: Jasmainak also mentioned he had some ideas to improve RANSAC but never got around to implementing them.)

a-hurst commented 3 years ago

I think Jasmainak once mentioned having corrected some aspect of mne's interpolation to have a better fit to matlab results (#36 (comment)).

So in a sense our best hope to get as close to matlab's ransac is Autoreject.

Well, the nice thing is that with my current testing dataset from OpenNeuro I can start it before bed and it'll have completed at least 40 runs by morning, so I can easily test the Autoreject interpolation code (subbing in its interpolation code for PyPREP's) and compare it against the PyPREP and MatPREP results with the same plots as above. If I have time today I'll run that overnight and post the results!

I'm also thinking of finding another open dataset and doing the same tests against that as well: given the incorrect labelling on the channels in the BDFs, I'm worried that the authors might have gotten the BioSemi label -> 10-20 label conversion wrong in some places which is throwing off the interpretation.

Regarding that last thought you had, I would guess that the boundary electrodes do present problems with interpolation.

Thanks for your thoughts on this! I also have other projects/work to attend to, but if I can get to the bottom of the Python vs MATLAB differences in terms of RANSAC results I can start looking into that side of things after. If only I was a professor or research institute guy and could just assign a student to look into this as a summer project. So many ideas, so little time...

a-hurst commented 3 years ago

So I ran the AutoReject vs PyPREP interpolation comparison @yjmantilla suggested, substituting AR's internal _make_interpolation_matrix function in for the internal MNE function PyPREP currently uses. Note that I used different random seeds (which I probably shouldn't have done for the sake of comparison), but the results are interesting regardless. As before, this is 4 runs per id for each group on the first 10 ids of the open dataset, using the same pre-CleanLine'd .fif files as before to remove that extra variability between random seeds for RANSAC.

Basically, at least in terms of the interpolation matrix math, there seems to be little difference between PyPREP's and AutoReject's math (and yes, I did make sure that it was actually using the AR function for the second run). First plot is group means and standard errors, second plot is just a raw boxplot of the within-id mean differences:

Screen Shot 2021-03-20 at 12 33 56 PM Screen Shot 2021-03-20 at 12 34 38 PM

Note that for both the PyPREP and AutoReject RANSACs, I'm correcting the origin of the montage like MNE's interpolate_bads does so that the center of the head sphere is (0, 0, 0). Through similar testing, I've confirmed this reduces the number of initial-bad and interpolated channels by an average of ~1.5 on this data.
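
For reference, roughly what that origin correction looks like (a sketch under assumptions: `raw` is the loaded Raw object, `pos_good`/`pos_bad` are hypothetical arrays holding the 3D positions of the predictor and target channels, and `_make_interpolation_matrix` is the private function pyprep used at the time):

```python
import numpy as np
import mne
from mne.channels.interpolation import _make_interpolation_matrix

# Fit a sphere to the head and shift channel positions so its center sits at
# (0, 0, 0) before computing the interpolation matrix, roughly mirroring what
# raw.interpolate_bads(origin="auto") does internally.
_, origin_head, _ = mne.bem.fit_sphere_to_headshape(raw.info, units="m")
interp_matrix = _make_interpolation_matrix(pos_good - origin_head,
                                           pos_bad - origin_head)
```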

sappelhoff commented 3 years ago

@a-hurst one of your recent PRs probably closed this issue, right? Do you by chance remember which one was it? So we can link and then close this issue.

a-hurst commented 3 years ago

@sappelhoff The big one here is #67. Once MatPREP compatibility is all finished I think it would still be worthwhile to run some tests on between-run variability, but the core of this problem is solved with the array reshaping bugfix.

sappelhoff commented 3 years ago

Ok, I'll close this then, because the main issue is solved. Thanks