timeflux / timeflux_rasr

Implementation of rASR filtering.
MIT License
26 stars 4 forks source link

Dimensionality consistency in RASR with sklearn API & pyriemann #5

Closed lkorczowski closed 3 years ago

lkorczowski commented 4 years ago

Edit 2020-02-10

Following the re-opening of this issue, the 2D compatibility will be completely drop (see https://github.com/bertrandlalo/timeflux_rasr/issues/5#issuecomment-584352525) The refactoring required consists of:

This is extensive refactoring, the code will be much simple but we will loose few in-build functionalities of ASR() (discussed in https://github.com/bertrandlalo/timeflux_rasr/issues/5#issuecomment-584352525)


Here is a tricky discussion that may or may not be important: dimension consistency.

The problem being that all the operations are not commutative, the choice of the order of the dimension in the data variable is the key to have everything working. Unfortunately, sklearn, pyriemann and other package dimension organization aren't consistent. It may not be a problem if a dimensions are double checked carefully but it still add several steps for transpose, reshape, and permutation operations.

More precisely:

While it is not an urgent work to do, for the release, I believe that this issue should be handled correctly because rightnow it is not fully compatible with pyriemann and mne (we can't stack directly too pipelines).

lkorczowski commented 4 years ago

Input and decision are required for sprint2 @mesca @sylvchev @bertrandlalo

mesca commented 4 years ago

Timeflux manipulates datetime-indexed dataframes. Because sklearn expects array-like objects, we provide df.values, so I confirm that you can expect a 2D data structure of shape (n_samples, n_electrodes).

If really needed, transposition and/or dimensionality reduction should be done outside of your estimator. We could provide light-weight transformers that would be chained in a sklearn pipeline.

lkorczowski commented 4 years ago

OK, let's try to make RASR works in Timeflux and eventually we will make a decision after detecting issues when they arise.

lkorczowski commented 4 years ago

we decided to follow as much as possible the sklearn style (more exactly mne and pyriemann implementation) with 3D inputs. However I will keep the opportunity to input 2D matrix for now because it is not much work.

selected choice

  • other tooboxes such as Matlab RASR are using (n_trials, n_samples, n_electrodes) architecture which require (theoretically) less transpose operation.

We know it is imperfect (it doesn't follow exatly sklearn API nor pyriemann).

Decision arguments:

NOTE: Decision is not yet completely fixed because of the last of clear consensus in the timeflux team.

mesca commented 4 years ago

The last time I checked, I had an error with 2D data. So either we fix this, or we completely drop 2D support. I currently use something like:

        steps:
          - module: timeflux.estimators.transformers.shape
            class: Expand
          - module: timeflux_rasr.estimation
            class: RASR
            args:
              srate: 250
          - module: timeflux.estimators.transformers.shape
            class: Reduce

Expand transforms to 3D and Reduce to 2D. Using 2D arrays directly and bypassing the first and last step does not work.

lkorczowski commented 4 years ago

I'm OK to drop 2D data compatibility but it will require quite a bit of code refactoring in fit() and cleaning the related input parameters. Note that 2D data compatibility was specifically made to simplified the training in fit(), without it the user will have to prepare some epochs with the right length and overlap.

Pro:

Con:

I can put that in sprint3 #15 but before doing that, I'll need the snapshot of the error, could you provide me that?