Closed lkorczowski closed 3 years ago
Input and decision are required for sprint2 @mesca @sylvchev @bertrandlalo
Timeflux manipulates datetime-indexed dataframes. Because sklearn expects array-like objects, we provide df.values, so I confirm that you can expect a 2D data structure of shape (n_samples, n_electrodes).
If really needed, transposition and/or dimensionality reduction should be done outside of your estimator. We could provide light-weight transformers that would be chained in a sklearn pipeline.
OK, let's try to make RASR works in Timeflux and eventually we will make a decision after detecting issues when they arise.
we decided to follow as much as possible the sklearn style (more exactly mne and pyriemann implementation) with 3D inputs. However I will keep the opportunity to input 2D matrix for now because it is not much work.
selected choice
- other tooboxes such as Matlab RASR are using (n_trials, n_samples, n_electrodes) architecture which require (theoretically) less transpose operation.
We know it is imperfect (it doesn't follow exatly sklearn API nor pyriemann).
Decision arguments:
NOTE: Decision is not yet completely fixed because of the last of clear consensus in the timeflux team.
The last time I checked, I had an error with 2D data. So either we fix this, or we completely drop 2D support. I currently use something like:
steps:
- module: timeflux.estimators.transformers.shape
class: Expand
- module: timeflux_rasr.estimation
class: RASR
args:
srate: 250
- module: timeflux.estimators.transformers.shape
class: Reduce
Expand
transforms to 3D and Reduce
to 2D. Using 2D arrays directly and bypassing the first and last step does not work.
I'm OK to drop 2D data compatibility but it will require quite a bit of code refactoring in fit() and cleaning the related input parameters. Note that 2D data compatibility was specifically made to simplified the training in fit(), without it the user will have to prepare some epochs with the right length and overlap.
Pro:
Con:
I can put that in sprint3 #15 but before doing that, I'll need the snapshot of the error, could you provide me that?
Edit 2020-02-10
Following the re-opening of this issue, the 2D compatibility will be completely drop (see https://github.com/bertrandlalo/timeflux_rasr/issues/5#issuecomment-584352525) The refactoring required consists of:
blocksize
,window_overlap
,window_len
This is extensive refactoring, the code will be much simple but we will loose few in-build functionalities of ASR() (discussed in https://github.com/bertrandlalo/timeflux_rasr/issues/5#issuecomment-584352525)
Here is a tricky discussion that may or may not be important: dimension consistency.
The problem being that all the operations are not commutative, the choice of the order of the dimension in the data variable is the key to have everything working. Unfortunately, sklearn, pyriemann and other package dimension organization aren't consistent. It may not be a problem if a dimensions are double checked carefully but it still add several steps for transpose, reshape, and permutation operations.
More precisely:
While it is not an urgent work to do, for the release, I believe that this issue should be handled correctly because rightnow it is not fully compatible with pyriemann and mne (we can't stack directly too pipelines).