sccn / clean_rawdata

Cleaning Raw EEG data
GNU General Public License v3.0
42 stars 17 forks source link

Different results with repeated runs of clean_rawdata #37

Open d-scanzi opened 2 years ago

d-scanzi commented 2 years ago

Hi there,

I just followed up on this thread (https://sccn.ucsd.edu/pipermail/eeglablist/2020/015456.html) I found on the eeglablist about inconsistent results when using clean_artifatcs for bad channel detection and removal. I encounter the same problem. After running quite a few tests with a couple of different datasets and different parameters I think I observed the following. Results are inconsistent when:

  1. The recording is short (less than 10 minutes recorded at 1000Hz before downsampling at 250Hz). So, in general, when the data has less than 150.000 samples. The less the number of samples, the more inconsistent the results are.
  2. The cut-off for the low-pass filter is 0.1 (as used in ERP research - I did not test with lower than this).

In my testing, these two conditions needed to be met in order to find inconsistent results. Furthermore, they were related, as the lower the number of samples, the higher the cut-off necessary to create inconsistent results. I think the problem arises from the use of rand() in clean_channels called in the background. Setting a seed create stable results, but it is difficult to tell which channels are actually bad.

I do understand this is a problem that not many people might run into given these conditions but it could be helpful to have a warning appearing (or simply being displayed on the console) in case someone is trying to process a short dataset

Proposed solution

Add a warning when calling clean_artifacts or pop_clean_rawdata (as it uses the former) if the length of EEG.data is shorter than N (I would say 150.000, that is 10minutes sampled/downsampled at 250Hz). A simple implementation would be:

if length(EEG.data) < 150000
    warning('clean_artifacts: The dataset is rather short. It is possible to encounter inconsistent results for the detection of bad channels. Try a few times to verify this. ')
end

This for both the function (maybe in pop_clean_rawdata) the message could be displayed in a window (as it is likely that people are using the GUI in that case).

Thank you,

Daniele

arnodelorme commented 2 years ago

We will look into it. As of recent revision, the results should be identical.

d-scanzi commented 2 years ago

Hi @arnodelorme, following the eeglablist thread, here is a detailed account of what I tried and observed.

First observation

First, I tried to run the following pipeline (load data, add channel location, downsample to 250Hz, high-pass filter with Basic fir-filter new at 0.1Hz, low-pass filter with the same filter at 30Hz, run clean_rawdata). I added the code below. Note that I am using an EGI system, thus I am loading the corresponding channel locations, declaring Cz as the reference, removing the fiducial channels (step 2).

If I re-run the entire pipeline, then I observe inconsistent results in the channels detected as noisy. As described above, I think this is due to the length of the data. I tried with a longer dataset, by trimming it down at different lengths. This is how I came up with the threshold of 150000 sample points. Sadly, I am not the owner of the long dataset, so I cannot share it as this was not covered in the ethics when the experiment was run. However, I will share the shorter dataset. I will record a longer dataset in the following days, so I will test that and share it if necessary.

%% 1. Start EEGLAB and load unprocessed dataset (Common to both iterations)
[ALLEEG EEG CURRENTSET ALLCOM] = eeglab;
EEG = pop_loadset('filename',injectedFile,'filepath',injectedDir);

%% 2. Add channel location 
EEG           = pop_chanedit(EEG, 'load',{GSNchanloc,'filetype','sfp'}, ...
    'changefield',{132,'datachan',0},'changefield',{132,'datachan',1},'changefield',{132,'datachan',0}); % Remove Cz from channel array (along with fiducial channels)

%% 3. Downsample 
EEG           = pop_resample(EEG, dwnSampFreq);

%% 4a. Highpass filter
EEG      = pop_eegfiltnew(EEG, 'locutoff', 0.1, 'plotfreqz', 1);      %0.1 high-pass filter on original data

EEG      = pop_eegfiltnew(EEG, 'hicutoff',30,'plotfreqz',1);         %30Hz low-pass filter on original data

%% Detecting Bad Channels
% Initialise dataset to use for channel removal - copy of EEG
EEGremoved   = EEG;

% Remove channels from copy of current dataset (copyData or EEG)
[EEGremoved,~,~,removedChannels] = clean_artifacts(EEGremoved, 'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',4, ...
        'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian');

% Obtain removed channels IDs and their total number
removedChannelIds = find(removedChannels == 1);
removedChannelNum = length(removedChannelIds); 

Second observation

I was following the tests you uploaded here So, mimicking that structure, I tried to run multiple tests in a loop. Surprisingly, if I run clean_rawdata on the same dataset without loading it each time, then the results are stable. The code is as above, however, the last part is changed to the following loop:


%Store results
removed = struct('Nremoved',[],'Idx',[] );

for n=1:10

    % Create copy of dataset
    EEGremoved = EEG;

    % Remove channels from copy of current dataset (copyData or EEG)
    [EEGremoved,~,~,removedChannels] = clean_artifacts(EEGremoved, 'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',4, ...
        'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian');
    % Obtain removed channels IDs and their total number
    removedChannelIds = find(removedChannels == 1);
    removedChannelNum = length(removedChannelIds); 

    removed(n).Nremoved = removedChannelNum;
    removed(n).Idx      = removedChannelIds;

end

Conclusions

Unless I am doing something wrong (please forgive me if this is the case), the problem arises when the dataset is loaded and processed but not when the function is called on a pre-loaded dataset.

Attachments

I have attached the shorter dataset with the correct channel locations, but otherwise unprocessed. The file is available here

Hope this makes sense,

Thank you,

Daniele

arnodelorme commented 2 years ago

Will try it out. Makoto can you confirm the problem as well? Cheers,

Arno

On Mar 9, 2022, at 3:38 PM, d-scanzi @.***> wrote:

Hi @arnodelorme, following the eeglablist thread, here is a detailed account of what I tried and observed.

First observation

First, I tried to run the following pipeline (load data, add channel location, downsample to 250Hz, high-pass filter with Basic fir-filter new at 0.1Hz, low-pass filter with the same filter at 30Hz, run clean_rawdata). I added the code below. Note that I am using an EGI system, thus I am loading the corresponding channel locations, declaring Cz as the reference, removing the fiducial channels (step 2).

If I re-run the entire pipeline, then I observe inconsistent results in the channels detected as noisy. As described above, I think this is due to the length of the data. I tried with a longer dataset, by trimming it down at different lengths. This is how I came up with the threshold of 150000 sample points. Sadly, I am not the owner of the long dataset, so I cannot share it as this was not covered in the ethics when the experiment was run. However, I will share the shorter dataset. I will record a longer dataset in the following days, so I will test that and share it if necessary.

%% 1. Start EEGLAB and load unprocessed dataset (Common to both iterations)

[ ALLEEG EEG CURRENTSET ALLCOM] = eeglab ; EEG = pop_loadset('filename',injectedFile,'filepath',injectedDir );

%% 2. Add channel location

EEG
= pop_chanedit(EEG, 'load',{GSNchanloc,'filetype','sfp' }, ...

'changefield',{132,'datachan',0},'changefield',{132,'datachan',1},'changefield',{132,'datachan',0}); % Remove Cz from channel array (along with fiducial channels)

%% 3. Downsample

EEG
= pop_resample(EEG, dwnSampFreq );

%% 4a. Highpass filter

EEG
= pop_eegfiltnew(EEG, 'locutoff', 0.1, 'plotfreqz', 1); %0.1 high-pass filter on original data

EEG
= pop_eegfiltnew(EEG, 'hicutoff',30,'plotfreqz',1); %30Hz low-pass filter on original data

%% Detecting Bad Channels % Initialise dataset to use for channel removal - copy of EEG

EEGremoved
= EEG ;

% Remove channels from copy of current dataset (copyData or EEG)

[ EEGremoved,~,~,removedChannels] = clean_artifacts(EEGremoved, 'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',4 , ...

'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian' );

% Obtain removed channels IDs and their total number

removedChannelIds = find(removedChannels == 1 ); removedChannelNum = length(removedChannelIds); Second observation

I was following the tests you uploaded here So, mimicking that structure, I tried to run multiple tests in a loop. Surprisingly, if I run clean_rawdata on the same dataset without loading it each time, then the results are stable. The code is as above, however, the last part is changed to the following loop:

%Store results

removed = struct('Nremoved',[],'Idx' ,[] );

for n=1:10

% Create copy of dataset

EEGremoved 

= EEG ;

% Remove channels from copy of current dataset (copyData or EEG)

[

EEGremoved,~,~,removedChannels] = clean_artifacts(EEGremoved, 'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',4 , ...

'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian' );

% Obtain removed channels IDs and their total number

removedChannelIds 

= find(removedChannels == 1 ); removedChannelNum = length(removedChannelIds );

removed(n).Nremoved = removedChannelNum ;

removed(n).Idx = removedChannelIds ;

end Conclusions

Unless I am doing something wrong (please forgive me if this is the case), the problem arises when the dataset is loaded and processed but not when the function is called on a pre-loaded dataset.

Attachments

I have attached the shorter dataset with the correct channel locations, but otherwise unprocessed. The file is available here

Hope this makes sense,

Thank you,

Daniele

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you were mentioned.

arnodelorme commented 2 years ago

I'll need time to get back to this issue, sorry.

Makoto

On Wed, Mar 9, 2022 at 3:42 PM Arnaud Delorme @.***> wrote:

Will try it out. Makoto can you confirm the problem as well? Cheers,

Arno

On Mar 9, 2022, at 3:38 PM, d-scanzi @.***> wrote:

Hi @arnodelorme, following the eeglablist thread, here is a detailed account of what I tried and observed.

First observation

First, I tried to run the following pipeline (load data, add channel location, downsample to 250Hz, high-pass filter with Basic fir-filter new at 0.1Hz, low-pass filter with the same filter at 30Hz, run clean_rawdata). I added the code below. Note that I am using an EGI system, thus I am loading the corresponding channel locations, declaring Cz as the reference, removing the fiducial channels (step 2).

If I re-run the entire pipeline, then I observe inconsistent results in the channels detected as noisy. As described above, I think this is due to the length of the data. I tried with a longer dataset, by trimming it down at different lengths. This is how I came up with the threshold of 150000 sample points. Sadly, I am not the owner of the long dataset, so I cannot share it as this was not covered in the ethics when the experiment was run. However, I will share the shorter dataset. I will record a longer dataset in the following days, so I will test that and share it if necessary.

%% 1. Start EEGLAB and load unprocessed dataset (Common to both iterations)

[ ALLEEG EEG CURRENTSET ALLCOM] = eeglab ; EEG = pop_loadset('filename',injectedFile,'filepath',injectedDir );

%% 2. Add channel location

EEG = pop_chanedit(EEG, 'load',{GSNchanloc,'filetype','sfp' }, ...

'changefield',{132,'datachan',0},'changefield',{132,'datachan',1},'changefield',{132,'datachan',0}); % Remove Cz from channel array (along with fiducial channels)

%% 3. Downsample

EEG = pop_resample(EEG, dwnSampFreq );

%% 4a. Highpass filter

EEG = pop_eegfiltnew(EEG, 'locutoff', 0.1, 'plotfreqz', 1); %0.1 high-pass filter on original data

EEG = pop_eegfiltnew(EEG, 'hicutoff',30,'plotfreqz',1); %30Hz low-pass filter on original data

%% Detecting Bad Channels % Initialise dataset to use for channel removal - copy of EEG

EEGremoved = EEG ;

% Remove channels from copy of current dataset (copyData or EEG)

[ EEGremoved,~,~,removedChannels] = clean_artifacts(EEGremoved, 'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',4 , ...

'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian' );

% Obtain removed channels IDs and their total number

removedChannelIds = find(removedChannels == 1 ); removedChannelNum = length(removedChannelIds); Second observation

I was following the tests you uploaded here So, mimicking that structure, I tried to run multiple tests in a loop. Surprisingly, if I run clean_rawdata on the same dataset without loading it each time, then the results are stable. The code is as above, however, the last part is changed to the following loop:

%Store results

removed = struct('Nremoved',[],'Idx' ,[] );

for n=1:10

% Create copy of dataset

EEGremoved

= EEG ;

% Remove channels from copy of current dataset (copyData or EEG)

[

EEGremoved,~,~,removedChannels] = clean_artifacts(EEGremoved, 'FlatlineCriterion',5,'ChannelCriterion',0.75,'LineNoiseCriterion',4 , ...

'Highpass','off','BurstCriterion','off','WindowCriterion','off','BurstRejection','off','Distance','Euclidian' );

% Obtain removed channels IDs and their total number

removedChannelIds

= find(removedChannels == 1 ); removedChannelNum = length(removedChannelIds );

removed(n).Nremoved = removedChannelNum ;

removed(n).Idx = removedChannelIds ;

end Conclusions

Unless I am doing something wrong (please forgive me if this is the case), the problem arises when the dataset is loaded and processed but not when the function is called on a pre-loaded dataset.

Attachments

I have attached the shorter dataset with the correct channel locations, but otherwise unprocessed. The file is available here

Hope this makes sense,

Thank you,

Daniele

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you were mentioned.