jxx123 / fireTS

A python multi-variate time series prediction library working with sklearn
MIT License
91 stars 35 forks source link

target variable is in design matrix? #4

Closed danielkentwood closed 4 years ago

danielkentwood commented 4 years ago

Thank you for sharing this code!

Apologies if I'm misunderstanding something, but in the "Basic usage" and "Grid Search" example notebooks, it appears that the CGM column has been defined as the target variable as well as one of the two features in the design matrix. This is true for both the training and test sets. Having CGM as a predictor for CGM will lead to very low error. :)

I would have fixed this myself, but in the "Basic Usage" notebook, the markdown says that the design matrix is supposed to be insulin and meal data as the exogenous inputs. It wasn't obvious which of the columns was supposed to be the meal data, and I couldn't get any clarity by looking at the code for simglucose (which looks awesome, btw).

jxx123 commented 4 years ago

Hi Daniel, NARX model uses historical CGM values as predictors (or features) to predict the future CGM values. NARX model won't take advantage of any future CGM values. However be careful that if calling predict with step > 1, the model does use the future exogenous input information to make the prediction, which is documented here https://firets.readthedocs.io/en/latest/models.html#models.NARX.predict. If you don't want to use the perfect future exogenous input information, then please use forecast method with your own future exogenous input values (X_future).

For the example in "Basic Usage", exogenous inputs are indeed insulin and meal. Assuming autoregression order 6, exogenous regression order for insulin and meal are both 6 and exogenous regression delays are both 0, the actual predictors (or features) are CGM(t), CGM(t-1), ..., CGM(t - 5), insulin(t), insulin(t-1),..., insulin(t - 5),meal(t), meal(t-1), ..., meal(t-5). The prediction target is CGM(t + 1) for NARX model. Please check the Readme https://github.com/jxx123/fireTS#nonlinear-autoregression-with-exogenous-narx-model for the more general cases.

jxx123 commented 4 years ago

Thank you for sharing this code!

Apologies if I'm misunderstanding something, but in the "Basic usage" and "Grid Search" example notebooks, it appears that the CGM column has been defined as the target variable as well as one of the two features in the design matrix. This is true for both the training and test sets. Having CGM as a predictor for CGM will lead to very low error. :)

I would have fixed this myself, but in the "Basic Usage" notebook, the markdown says that the design matrix is supposed to be insulin and meal data as the exogenous inputs. It wasn't obvious which of the columns was supposed to be the meal data, and I couldn't get any clarity by looking at the code for simglucose (which looks awesome, btw).

Ah wait, I see what you are saying now. There is a typo in the "Basic Usage" example :(. I should use 'meal' instead of 'CGM' in Xtrain and Xtest. Will fix it soon. Thanks for your comments!

danielkentwood commented 4 years ago

However be careful that if calling predict with step > 1, the model does use the future exogenous input information to make the prediction, which is documented here https://firets.readthedocs.io/en/latest/models.html#models.NARX.predict. If you don't want to use the perfect future exogenous input information, then please use forecast method with your own future exogenous input values (X_future).

I hadn't caught that distinction yet, so thanks for pointing this out.

Also, I could have been clearer in my original comment. Happy to help!