timeflux / timeflux_rasr

Implementation of rASR filtering.
MIT License

An error occured while fitting: Maximum allowed size exceeded #11

Closed lkorczowski closed 4 years ago

lkorczowski commented 4 years ago
/Users/mesca/Documents/Dev/neurogateway/py/timeflux_rasr/timeflux_rasr/estimation.py:396: RuntimeWarning: divide by zero encountered in true_divide
  indx = np.arange(lower_min, lower_min + max_dropout_fraction_n + 1e-15, step_sizes_n[0]).astype(int)  # epochs start
2020-01-09 18:31:26,665 ERROR      ml           10020    Main             An error occured while fitting: Maximum allowed size exceeded
2020-01-09 18:31:26,666 DEBUG      ml           10020    Main
Traceback (most recent call last):
  File "/Users/mesca/Documents/Dev/neurogateway/py/timeflux_core/timeflux/helpers/background.py", line 133, in execute
    result = getattr(data['instance'], data['method'])(*data['args'], **data['kwargs'])
  File "/Users/mesca/anaconda3/envs/timeflux/lib/python3.7/site-packages/sklearn/pipeline.py", line 352, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)
  File "/Users/mesca/anaconda3/envs/timeflux/lib/python3.7/site-packages/sklearn/pipeline.py", line 317, in _fit
    **fit_params_steps[name])
  File "/Users/mesca/anaconda3/envs/timeflux/lib/python3.7/site-packages/joblib/memory.py", line 355, in __call__
    return self.func(*args, **kwargs)
  File "/Users/mesca/anaconda3/envs/timeflux/lib/python3.7/site-packages/sklearn/pipeline.py", line 716, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/Users/mesca/Documents/Dev/neurogateway/py/timeflux_rasr/timeflux_rasr/estimation.py", line 271, in fit_transform
    self.fit(X, y)
  File "/Users/mesca/Documents/Dev/neurogateway/py/timeflux_rasr/timeflux_rasr/estimation.py", line 190, in fit
    dist_params[c, :] = _fit_eeg_distribution(rms_sliding[:, c])
  File "/Users/mesca/Documents/Dev/neurogateway/py/timeflux_rasr/timeflux_rasr/estimation.py", line 396, in _fit_eeg_distribution
    indx = np.arange(lower_min, lower_min + max_dropout_fraction_n + 1e-15, step_sizes_n[0]).astype(int)  # epochs start
lkorczowski commented 4 years ago

The input to _fit_eeg_distribution should be at least several hundred samples long (several seconds of signal) to estimate the distribution correctly. I am adding an assert and the corresponding pytest. It will be released together with the other code-review changes in a PR.
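A minimal sketch of such a guard, for illustration only. The function name `check_min_samples`, the `min_samples` threshold, and the assumption that the grid step is a rounded fraction of the sample count are all hypothetical and not taken from the timeflux_rasr code. It raises a `ValueError` rather than using an assert so that the check is not stripped when Python runs with `-O`.

```python
def check_min_samples(n_samples, step_fraction=0.01, min_samples=200):
    """Hypothetical guard: reject inputs too short for a grid search.

    Assumes (not confirmed by the source) that the grid step is computed
    as a rounded fraction of the number of samples. With few samples that
    step rounds to zero, which is the kind of value that makes a call
    such as np.arange(start, stop, step) fail.
    """
    step_n = int(round(step_fraction * n_samples))
    if n_samples < min_samples or step_n < 1:
        raise ValueError(
            f"Got {n_samples} samples (grid step {step_n}); "
            f"at least {min_samples} samples are needed to fit the distribution."
        )
    return step_n


# Usage: a long enough input returns a usable, non-zero step.
step = check_min_samples(1000)
```

A matching pytest could simply wrap a too-short call in `pytest.raises(ValueError)`.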

lkorczowski commented 4 years ago

FYI, I found out that the MATLAB code doesn't check this either. Because its behaviour is different, it doesn't raise any error, which is worse: when the input is too short, the grid search is run several times on exactly the same windows. It shouldn't happen too often, but it may happen during testing or if someone tries to fit the data with too few samples.

lkorczowski commented 4 years ago

This issue should now be fixed. I will close it once PR #12 is validated.

lkorczowski commented 4 years ago

PR #12 has been canceled. Please see PR #17 instead.

lkorczowski commented 4 years ago

@bertrandlalo mentioned a similar issue in #19, caused by the user not being able to change some of the _fit_eeg_distribution parameters from RASR; it should have been fixed by #21. Anyway, I think it is bad practice to allow the user to change all _fit_eeg_distribution parameters, as doing so can break RASR (poor EEG distribution estimation). What guarantees that RASR works is having hundreds of epochs in training (with overlap).

If the decision in #5 is to remove 2D compatibility, it will be the user's job to provide hundreds of epochs for training (users will have to do the epoching on their own).
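For users who end up doing the epoching themselves, here is a rough sketch of cutting a continuous 2D recording into overlapping epochs with NumPy. The helper name `epoch_with_overlap`, the window and step values, and the (n_epochs, n_samples, n_channels) output layout are assumptions made for illustration, not the timeflux_rasr API.

```python
import numpy as np


def epoch_with_overlap(X, window, step):
    """Hypothetical helper: slice a 2D signal into overlapping epochs.

    X has shape (n_samples, n_channels). The result has shape
    (n_epochs, window, n_channels). Consecutive epochs overlap by
    `window - step` samples when step < window.
    """
    n_samples = X.shape[0]
    if window > n_samples:
        raise ValueError("window is longer than the signal")
    if step < 1:
        raise ValueError("step must be at least 1 sample")
    # Start index of every epoch that fits entirely inside the signal.
    starts = np.arange(0, n_samples - window + 1, step)
    return np.stack([X[s:s + window] for s in starts])


# Usage: 1000 samples, 4 channels, 100-sample windows moved by 10 samples.
X = np.zeros((1000, 4))
epochs = epoch_with_overlap(X, window=100, step=10)
```

With a small step relative to the window, even a short calibration recording yields many epochs, although heavily overlapping epochs are strongly correlated and do not replace a longer recording.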

I'm closing this issue.

In sprint3 (#15), I'm adding a note that the documentation and the assert should warn users to provide more epochs for training.