bug in `flag_epoch_ic_sd_1`

scott-huberty commented 1 year ago

The purpose of both pipeline.flag_ch_sd and pipeline.flag_epoch_ic_sd1 is to flag epochs where too many channels/IC's are "unlike themselves". i.e. have standard deviations that are above a threshold.

Yet....

in flag_ch_sd:

we calculate the standard deviation across the time dimension (line 887).

That leaves us with an array size n_channels, n_epochs.

Which we pass into marks_array2flags (line 897). https://github.com/lina-usc/pylossless/blob/4c3ae7108da82a1985f4548394d472dbbfb7db37/pylossless/pipeline.py#L883-L897

in flag_epoch_ic_sd1:

we call variability_across_epochs, and inside that function we calculate the standard deivation across the epoch dimension: https://github.com/lina-usc/pylossless/blob/4c3ae7108da82a1985f4548394d472dbbfb7db37/pylossless/pipeline.py#L1046-L1060

https://github.com/lina-usc/pylossless/blob/4c3ae7108da82a1985f4548394d472dbbfb7db37/pylossless/pipeline.py#L358-L359

that leaves us with an array of n_ics, n_times - which we pass into marks_array2flags (line 1058).

That doesn't make sense to me. As I brought up in #45 , if we want to flag epochs , we need to preserve the epochs dimension.

The current code in variability_across_epochs would suggest that we want to flag ics the same way that we do for channels (The channel is above a quantile threshold in too many epochs.... I don't think the MATLAB Lossless pipeline flags IC's this way, it only flags them based off a reliable IC label that is non-brain... But maybe @jadesjardins remembers better).

scott-huberty commented 1 year ago

BTW this is related to #53

scott-huberty commented 1 year ago

Also - the only other time we call variability_across_epochs is in step 1: flag_outlier_chs, and this method doesn't even use marks_array2flags, which further suggests to me that what I pointed out in flag_epoch_ic_sd1 is a bug:

https://github.com/lina-usc/pylossless/blob/4c3ae7108da82a1985f4548394d472dbbfb7db37/pylossless/pipeline.py#L849-L881

jadesjardins commented 1 year ago

In the Matlab Lossless pipeline ICs are not flagged in the same way that channels are, you are correct. Marking time intervals based on IC time series information should be as follows... give an 'IC x time x epoch' matrix, calculate the SD on the time dimension which returns a 'IC x epoch' matrix. The criteria function then uses this 2D matrix to determine which epochs have too many ICs that have unusual SD values.

scott-huberty commented 1 year ago

thank you, @jadesjardins !!!

I'm going to push some fixes to #53

BTW, I think we have been meaning ping you to discuss the pipeline (i.e. we are unsure what to do in the python version about flagging IC's, because we are not running the final ICA 3 times). Maybe we can discuss more in our all tomorrow.

christian-oreilly commented 1 year ago

In the Matlab Lossless pipeline ICs are not flagged in the same way that channels are, you are correct. Marking time intervals based on IC time series information should be as follows... give an 'IC x time x epoch' matrix, calculate the SD on the time dimension which returns a 'IC x epoch' matrix. The criteria function then uses this 2D matrix to determine which epochs have too many ICs that have unusual SD values.

Seems compatible with our doc https://github.com/lina-usc/pylossless/blob/main/docs/source/_images/pipeline_step_12.png Let us know @jadesjardins if you see incompatibilities between what the pipeline is meant to do and what is documented in this page https://pylossless.readthedocs.io/en/latest/implementation.html as these visual representations of each pipeline step are now the official "specs" for the pipeline (i.e., the code should be changed to follow the specs, and never the other way around unless we purposefully decide to change the specs and update these visual representations accordingly)

lina-usc / pylossless

bug in `flag_epoch_ic_sd_1` #54