nilsnevertree / kalman-reconstruction-partially-observed-systems

Data-driven Reconstruction of Partially Observed Dynamical Systems using Kalman Algorithms in an iterative way
GNU General Public License v3.0

``Kalman_SEM()`` can't handle missing values in input arrays. #27

Open nilsnevertree opened 1 year ago

nilsnevertree commented 1 year ago

It seems the function cannot handle missing (np.nan) values in the input array. The problem originates in the call to sklearn's LinearRegression(), which rejects input containing NaN.

nilsnevertree commented 1 year ago

Calling kalman.Kalman_SEM() on input that contains np.nan values raises this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
    150 # loop on the SEM iterations
    151 for i in tqdm(np.arange(0, nb_iter_SEM)):
    152     # Kalman parameters
--> 153     reg = LinearRegression(fit_intercept=False).fit(x_out[:-1,], x_out[1:,])
    154     M = reg.coef_
    155     Q = np.cov((x_out[1:,] - reg.predict(x_out[:-1,])).T)

ValueError: Input X contains NaN.
LinearRegression does not accept missing values encoded as NaN natively. 
For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. 
Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. 
See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
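
One possible workaround, shown here only as a sketch and not as the fix adopted in the repository, is to drop every transition whose start or end state contains a NaN before fitting the regression. The helper name `fit_transition_ignoring_nan` and the demo data are hypothetical. The sketch assumes `x_out` is a 2D array of shape (time, state), as the slicing in the traceback suggests. Whether discarding these time steps is acceptable inside the SEM loop depends on where the NaNs enter `x_out`, which the traceback does not show.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_transition_ignoring_nan(x_out):
    """Estimate M and Q from x[t+1] ~ M x[t], skipping transitions with NaN.

    Hypothetical helper: mirrors the three lines shown in the traceback,
    but masks incomplete rows first.
    """
    x_prev, x_next = x_out[:-1], x_out[1:]
    # A transition is usable only if both its start and end state are complete.
    valid = ~(np.isnan(x_prev).any(axis=1) | np.isnan(x_next).any(axis=1))
    reg = LinearRegression(fit_intercept=False).fit(x_prev[valid], x_next[valid])
    M = reg.coef_
    # Residual covariance, computed on the same complete transitions.
    Q = np.cov((x_next[valid] - reg.predict(x_prev[valid])).T)
    return M, Q


# Demo on synthetic data (an assumption for illustration, not repository data):
# simulate a stable linear system, then blank out a few entries.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
x_demo = np.zeros((500, 2))
x_demo[0] = [1.0, 1.0]
for t in range(1, 500):
    x_demo[t] = A_true @ x_demo[t - 1] + rng.normal(scale=0.1, size=2)
x_demo[[10, 50, 51, 300], 0] = np.nan

M, Q = fit_transition_ignoring_nan(x_demo)
print(M)
print(Q)
```

Each NaN row removes up to two transitions (the one ending in it and the one starting from it), so this only works well when missing values are sparse. An imputation step, as the scikit-learn message suggests, would be an alternative if too many samples are lost.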