STUDY file indexing - GitHub issues

arnodelorme commented 3 years ago

Would you mind:

Upload your entire study at
Describe how to reproduce the problem step by step (from loading the data to when the bug occurs) with as many details as possible.

denizdor commented 3 years ago

Hello Arnaud and Cyril,

Here I am including all the code for creating the study, followed by step-by-step screenshots/explanations of what I am seeing regarding the indexing issue.

Uploading the data and full analysis to Google Drive:
1) "test" folder: folder/filepath for the original set files (there are 52 unique files but the study uses 47 of them), precomputed channel measures (per the code below) and individual subject subfolders created during the first-level analysis (per the code below).
2) "test.study": study file
3) "LIMO_test": first level, LIMO/beta group files

MATLAB R2020a, EEGLAB 2021, macOS Big Sur

1) Create the STUDY using the .set files (change the path here for loading/saving) and then save the STUDY. This is the code I get from “eegh” after loading the datasets from the GUI and then creating the STUDY via File > create study > ‘using all loaded datasets’ from the GUI.

EEG = pop_loadset('filename',{'P030_ATB.set' 'P029_KGL.set' 'P028_BM.set' 'P027_MB.set' 'P026_IMS.set' 'P025_GDx.set' 'P023_AJS.set' 'P021_PWH.set' 'P018_ARW.set' 'P017_JRS.set' 'P016_ODH.set' 'P015_PJJ.set' 'P014_JMG.set' 'P013_EGL.set' 'P011_EM.set' 'P010_EMO.set' 'P009_SVB.set' 'P008_SAP.set' 'P007_ARJ.set' 'P006_KBK.set' 'P005_TRH.set' 'P004_DJK.set' 'P003_TJW.set' 'P002_LMK.set' 'HC029_DJK_4bin.set' 'HC028_JaJ_4bin.set' 'HC027_JLJ_4bin.set' 'HC026_RJW_4bin.set' 'HC024_JES_4bin.set' 'HC23_KWI.set' 'HC022_JMI_4bin.set' 'HC021_DJK_4bin.set' 'HC019_HPD_4bin.set' 'HC018_SFO_4bin.set' 'HC017_LLL_4bin.set' 'HC016_TOM_4bin.set' 'HC014_WBM_4bin.set' 'HC012_CKW_4bin.set' 'HC011_HEB_4bin.set' 'HC010HKF.set' 'HC009_LPJ_4bin.set' 'HC008_TLP_4bin.set' 'HC007_ERB_4bin.set' 'HC005_GAJ_4bin.set' 'HC003_CLD_4bin.set' 'HC002_MML_4bin.set' 'HC001_KSM_4bin.set'},'filepath', 'ENTERFILEPATH ');
[ALLEEG EEG CURRENTSET] = pop_newset(ALLEEG, EEG, 0,'study',0); 
[STUDY ALLEEG] = std_editset( STUDY, ALLEEG, 'name','test','commands',{ ...
{'index' 1 'subject' '60'} {'index' 2 'subject' '59'} {'index' 3 'subject' '58'} {'index' 4 'subject' '57'} {'index' 5 'subject' '56'} {'index' 6 'subject' '55'} {'index' 7 'subject' '53'} {'index' 8 'subject' '51'} {'index' 9 'subject' '48'} {'index' 10 'subject' '47'} ...
{'index' 1 'group' '2'} {'index' 2 'group' '2'} {'index' 3 'group' '2'} {'index' 4 'group' '2'} {'index' 5 'group' '2'} {'index' 6 'group' '2'} {'index' 7 'group' '2'} {'index' 8 'group' '2'} {'index' 9 'group' '2'} {'index' 10 'group' '2'} ...
{'index' 11 'subject' '46'} {'index' 12 'subject' '45'} {'index' 13 'subject' '44'} {'index' 14 'subject' '43'} {'index' 15 'subject' '41'} {'index' 16 'subject' '40'} {'index' 17 'subject' '39'} {'index' 18 'subject' '38'} {'index' 19 'subject' '37'} {'index' 20 'subject' '36'} ...
{'index' 11 'group' '2'} {'index' 12 'group' '2'} {'index' 13 'group' '2'} {'index' 14 'group' '2'} {'index' 15 'group' '2'} {'index' 16 'group' '2'} {'index' 17 'group' '2'} {'index' 18 'group' '2'} {'index' 19 'group' '2'} {'index' 20 'group' '2'} ...
{'index' 21 'subject' '35'} {'index' 22 'subject' '34'} {'index' 23 'subject' '33'} {'index' 24 'subject' '32'} {'index' 25 'subject' '29'} {'index' 26 'subject' '28'} {'index' 27 'subject' '27'} {'index' 28 'subject' '26'} {'index' 29 'subject' '24'} {'index' 30 'subject' '23'} ...
{'index' 21 'group' '2'} {'index' 22 'group' '2'} {'index' 23 'group' '2'} {'index' 24 'group' '2'} {'index' 25 'group' '1'} {'index' 26 'group' '1'} {'index' 27 'group' '1'} {'index' 28 'group' '1'} {'index' 29 'group' '1'} {'index' 30 'group' '1'} ...
{'index' 31 'subject' '22'} {'index' 32 'subject' '21'} {'index' 33 'subject' '19'} {'index' 34 'subject' '18'} {'index' 35 'subject' '17'} {'index' 36 'subject' '16'} {'index' 37 'subject' '14'} {'index' 38 'subject' '12'} {'index' 39 'subject' '11'} {'index' 40 'subject' '10'} ...
{'index' 31 'group' '1'} {'index' 32 'group' '1'} {'index' 34 'group' '1'} {'index' 33 'group' '1'} {'index' 35 'group' '1'} {'index' 36 'group' '1'} {'index' 37 'group' '1'} {'index' 38 'group' '1'} {'index' 39 'group' '1'} {'index' 40 'group' '1'} ...
{'index' 41 'subject' '9'} {'index' 42 'subject' '8'} {'index' 43 'subject' '7'} {'index' 44 'subject' '5'} {'index' 45 'subject' '3'} {'index' 46 'subject' '2'} {'index' 47 'subject' '1'} ...
{'index' 41 'group' '1'} {'index' 42 'group' '1'} {'index' 43 'group' '1'} {'index' 44 'group' '1'} {'index' 45 'group' '1'} {'index' 46 'group' '1'} {'index' 47 'group' '1'}},'updatedat','on','rmclust','off' );
[STUDY ALLEEG] = std_checkset(STUDY, ALLEEG);
CURRENTSTUDY = 1; EEG = ALLEEG; CURRENTSET = [1:length(EEG)];
EEG = eeg_checkset( EEG );
[STUDY EEG] = pop_savestudy( STUDY, EEG, 'filename','test.study','filepath','/ENTERFILEPATH/');
CURRENTSTUDY = 1; EEG = ALLEEG; CURRENTSET = [1:length(EEG)];

2) GUI --> load the 'test' STUDY from above --> make design --> GUI --> edit the STUDY design with 2 variables: var1: group (1-2); var2: type (10-20).

STUDY = std_makedesign(STUDY, ALLEEG, 1, 'name','STUDY.design 1','delfiles','off','defaultdesign','off','variable1','group','values1',{'1','2'},'vartype1','categorical','variable2','type','values2',{'10','20'},'vartype2','categorical','subjselect',{'1','10','11','12','14','16','17','18','19','2','21','22','23','24','26','27','28','29','3','32','33','34','35','36','37','38','39','40','41','43','44','45','46','47','48','5','51','53','55','56','57','58','59','60','7','8','9'});
[STUDY EEG] = pop_savestudy( STUDY, EEG, 'savemode','resave');

3) Precompute channel measures

[STUDY, ALLEEG] = std_precomp(STUDY, ALLEEG, {},'savetrials','on','interp','on','recompute','on','erp','on','erpparams',{'rmbase',[-100 0] },'spec','on','specparams',{'specmode','fft','logtrials','off'},'erpim','on','erpimparams',{'nlines',10,'smoothing',10});

4) Estimate channel parameters (OLS, with timelim [90 150])

pop_limo(STUDY, ALLEEG, 'method','OLS','measure','daterp','timelim',[90 150] ,'erase','on','splitreg','off','interaction','off');

5) Go to each of the individual subfolders located under the same path as the original set files.

6) For example, see the LIMO.mat file under sub-60 (see image below). LIMO.data.data --> 1_limo_file_tmp1.set. I know that this data belongs to subject 1 of the STUDY (STUDY.datasetinfo.subjects = '1') because it has 125 single epochs (categorical variable) and this is the only subject with 125 epochs. The issue is that this is under sub-60 but should be under sub-01.

7) When I check the other subfolders, I see for example that subfolder sub-59 has data from STUDY subject 10, and subfolder sub-58 has the data from STUDY subject 11 (see image below).

8) It seems that the pairing follows the row (field) order in STUDY.datasetinfo.subjects and STUDY.limo.subjects. For example, row 1 in STUDY.datasetinfo.subjects is subject “60”, and for this subfolder name the data comes from the same row of STUDY.limo.subjects, which is subject 1. Likewise, row 2 in STUDY.datasetinfo.subjects is subject “59”, and for this subfolder name the data comes from the same row of STUDY.limo.subjects, which is subject 10.
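The row-by-row pairing described in step 8 can be reproduced outside EEGLAB with plain MATLAB. This is only a sketch under an assumption: that one list keeps the order in which the datasets were added (as in step 1) while the other is a text sort of the same labels (the order seen in the 'subjselect' list of step 2). The variable names datasetOrder and sortedOrder are made up for illustration and are not EEGLAB or LIMO fields.

```matlab
% Subject labels in the order the datasets were added to the STUDY (step 1).
ids = [60 59 58 57 56 55 53 51 48 47 46 45 44 43 41 40 39 38 37 36 ...
       35 34 33 32 29 28 27 26 24 23 22 21 19 18 17 16 14 12 11 10 ...
       9 8 7 5 3 2 1];
datasetOrder = arrayfun(@num2str, ids, 'UniformOutput', false);

% Sorting text labels is lexicographic, so '10' sorts before '2'.
sortedOrder = sort(datasetOrder);

% Assumed positional pairing: folder name from one list, data from the other.
for k = 1:3
    fprintf('sub-%s <- data of subject %s\n', datasetOrder{k}, sortedOrder{k});
end

% Row of subject '60' in the sorted list.
rowOf60 = find(strcmp(sortedOrder, '60'));
fprintf('subject 60 is row %d of the sorted list\n', rowOf60);
```

Under this assumption the first three pairs come out as sub-60/1, sub-59/10 and sub-58/11, and subject 60 lands on row 44 of the sorted list, which is consistent with the rows reported in this post.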

9) Things I have checked:

a) STUDY.limo has files correctly assigned to subject numbers. Even though the order in STUDY.datasetinfo.subjects does not match the order of STUDY.limo.subjects, they still refer to the same data: STUDY.datasetinfo.subjects (row 1) = STUDY.limo.subjects (row 44), both of which are my subject #60 (from the study design).

b) The LIMO temp files are correctly named (they refer to the correct .set file from the study), i.e. 1_limo_file_tmp1.set refers to the correct .set file, which is my subject 1 of the study design (independent of the filename).

c) I checked what Cyril suggested: I created a STUDY with one file and gave it the subject number ‘29’. Then I see that there is only one subfolder created, named sub029, which is accurate. So when there is only a single file, it seems to be working fine. Then I tried this with 2 subjects by naming the 1st dataset as subject #2 and the second dataset as subject #1 in the STUDY design. With this I had the same problem: subfolder sub-001 has the data from subject #2 and subfolder sub-002 has the data from subject #1. This issue does not happen if the first dataset loaded into the STUDY gets assigned to subject #1, the second dataset to #2, and so on in ascending order (03, 04, 05…). However, if the first dataset has an arbitrary subject number (like 29 or 60), with the following datasets also in arbitrary order, I see the same issue happening.

d) When I review the group files, for example, LIMO_files_Gp2_STUDY.design 1_GLM_Channels_Time_OLS.txt has file paths to the subfolders 32-60, which technically should refer to my STUDY subjects 32-60, who are in my group 2. However, since some of the subfolders seem to include the wrong temp files, the grouping would be inaccurate. This is an example of a path from LIMO_files_Gp2 (note that subfolders are created where the original set files are; "test" is where the original set files are located): /test/sub-60/design1_GLM_Channels_Time_OLS/LIMO.mat. When I look into this specific LIMO file, in LIMO.data.data I see this: 1_limo_file_tmp1.set

e) I also checked the first-level LIMO pipeline design. "test" is the folder with the original set files.
pipeline(1).import.files_in: /Users/ddc/test/1_limo_file_tmp1.set
pipeline(1).import.files_out: /Users/ddc/test/sub-60/design1_GLM_Channels_Time_OLS/Yr.mat

Given the above, I thought maybe (just a naïve guess) that when the subfolders are created, the naming of the folders follows the order of subjects in STUDY.datasetinfo.subjects, but the temp files are assigned based on the order from STUDY.limo.subjects. When there is a single file, or when the subject numbers follow an ascending order (01->02->03 within the STUDY), this does not lead to an issue, since STUDY.datasetinfo.subjects matches STUDY.limo.subjects in these cases. I can create my study from scratch according to this, but I think being able to add/remove subjects would be challenging. Also, numbering my subjects according to my original dataset is really helpful when it comes to comparing EEG data with behavioral data. Again, I only tried this on my data, and this might be something I am messing up, but I can’t seem to figure it out.
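To make the guess above concrete, here is a small plain-MATLAB sketch contrasting row-by-row pairing with a lookup by subject label. It is not EEGLAB or LIMO code and says nothing about how either toolbox is implemented; the three-subject lists and the file names are made up for illustration.

```matlab
% Hypothetical lists: the same three subjects in two different orders.
folderSubjects = {'60','59','58'};   % order used to name the subfolders
dataSubjects   = {'58','59','60'};   % order of the list holding the data files
dataFiles      = {'file_for_58.set','file_for_59.set','file_for_60.set'}; % made-up names

% Positional pairing: row k of one list with row k of the other.
positional = dataFiles;

% Name-based pairing: look up each folder subject in the data list.
[found, loc] = ismember(folderSubjects, dataSubjects);
assert(all(found), 'Every folder subject must exist in the data list');
byName = dataFiles(loc);

for k = 1:numel(folderSubjects)
    fprintf('sub-%s: positional = %s, by name = %s\n', ...
        folderSubjects{k}, positional{k}, byName{k});
end
```

With a lookup by label, the order in which subjects are listed no longer matters, so adding or removing subjects would not shift the pairing.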

denizdor commented 3 years ago

Hello Arnaud and Cyril,

I have updated my eeglab/limo through GitHub (via the links you shared), and tried this again on a small subset of data. Here is what happens:

This is STUDY.datasetinfo: you can see that subject '3' (first row) has a trial info of 1x87:

Then when I look at Derivatives --> sub-3 --> LIMO.mat --> LIMO.design.X, it is 125, which is the number of trials from subject '1' in STUDY.datasetinfo:

When I look at Derivatives --> sub-2 --> LIMO.mat --> LIMO.design.X, it is 199, which is the number of trials from subject '2' in STUDY.datasetinfo, so this one is correct. When I look at Derivatives --> sub-1 --> LIMO.mat --> LIMO.design.X, it is 87, which is the number of trials from subject '3' in STUDY.datasetinfo, so this one is also incorrect.

*This time the pipeline in and out files match, so this is not the error:
pipeline(1).import.files_in: test/3_limo_file_design1_sess1.set
pipeline(1).import.files_out: test/derivatives/LIMO_subj/sub-3/design1_GLM_Channels_Time_OLS/LIMO.mat

*It looks like it follows the order of STUDY.datasetinfo.index, and maybe the data from subject '3' is named 1_limo_file_design1_sess1.set since its index number is 1?
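The index-based guess can be checked against the trial counts reported above with a few lines of plain MATLAB. The subject labels and trial counts are taken from this post; the pairing rule (the k-th subject in sorted order receives the file of dataset index k) is only an assumption used for illustration, not a statement about the toolbox code.

```matlab
% Values from the report above, in STUDY.datasetinfo order (index 1, 2, 3).
subjects = {'3','2','1'};
nTrials  = [87 199 125];

% Assumed rule: the k-th subject in sorted order gets the file of dataset index k.
sortedSubjects = sort(subjects);
observed = zeros(1, numel(sortedSubjects));
expected = zeros(1, numel(sortedSubjects));
for k = 1:numel(sortedSubjects)
    observed(k) = nTrials(k);                                   % file of index k
    expected(k) = nTrials(strcmp(subjects, sortedSubjects{k})); % subject's own data
    fprintf('sub-%s: %d trials (expected %d)\n', ...
        sortedSubjects{k}, observed(k), expected(k));
end
```

Under this assumption sub-1 ends up with 87 trials, sub-2 with 199 and sub-3 with 125, the same pattern as in the LIMO.design.X sizes described above.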

Thanks a lot! I can also share this small sample of the data with you if it helps! Deniz

CPernet commented 3 years ago

We need to see the screenshot of the STUDY because your file names, subject order, and subject names are all different (while in BIDS it's all the same) -- which is where the issue arises (because of the way things get ordered inside STUDY).

sccn / eeglab

STUDY file indexing #258