nightscout / nightguard

iOS and WatchOS Client for the Nightscout CGM System
GNU Affero General Public License v3.0

Change Regression model to hardcode Exp or Linear Regression #302

Open grahamjenson opened 1 week ago

grahamjenson commented 1 week ago

Take with a grain of salt.

I was looking at how you select the regression model for the predictive alerts and smart snooze features, and I think it is not a good idea to rank the models by how well they fit the same training data used to create them. It looks like with 2 data points it will always select a Linear model, and with 3 it will always select a Quadratic, because those models pass exactly through the points they were fitted on. The problem is that the quadratic has a very bad RMSE on actual predictions and tends toward extreme readings.
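A minimal Swift sketch of why ranking on training error favours the higher-degree model. The `Reading` type, the helper functions, and the readings themselves are made up for illustration; this is not nightguard's actual code, and the quadratic is built by Lagrange interpolation rather than by the project's regression classes.

```swift
import Foundation

// Hypothetical type for illustration; not nightguard's API.
struct Reading {
    let minutes: Double   // time of the reading, in minutes
    let mgdl: Double      // glucose value in mg/dL
}

/// Least-squares straight line through the readings.
func fitLine(_ pts: [Reading]) -> (Double) -> Double {
    let n = Double(pts.count)
    let mx = pts.reduce(0.0) { $0 + $1.minutes } / n
    let my = pts.reduce(0.0) { $0 + $1.mgdl } / n
    let sxy = pts.reduce(0.0) { $0 + ($1.minutes - mx) * ($1.mgdl - my) }
    let sxx = pts.reduce(0.0) { $0 + ($1.minutes - mx) * ($1.minutes - mx) }
    let slope = sxy / sxx
    return { x in my + slope * (x - mx) }
}

/// Quadratic passing exactly through three readings (Lagrange form).
func fitQuadratic(_ p: [Reading]) -> (Double) -> Double {
    precondition(p.count == 3, "needs exactly three readings")
    return { x in
        var sum = 0.0
        for i in 0..<3 {
            var term = p[i].mgdl
            for j in 0..<3 where j != i {
                term *= (x - p[j].minutes) / (p[i].minutes - p[j].minutes)
            }
            sum += term
        }
        return sum
    }
}

/// Root mean squared error of a model on a set of readings.
func rmse(_ model: (Double) -> Double, on pts: [Reading]) -> Double {
    let sq = pts.reduce(0.0) { $0 + pow(model($1.minutes) - $1.mgdl, 2) }
    return (sq / Double(pts.count)).squareRoot()
}

// Made-up readings: a slowly rising trend that flattens a little.
let train = [
    Reading(minutes: 0, mgdl: 100),
    Reading(minutes: 5, mgdl: 104),
    Reading(minutes: 10, mgdl: 106),
]
let line = fitLine(train)
let quad = fitQuadratic(train)

print("training RMSE, line:     ", rmse(line, on: train))   // small but non-zero
print("training RMSE, quadratic:", rmse(quad, on: train))   // zero up to rounding
print("20 minutes ahead, line:     ", line(30))              // keeps rising gently
print("20 minutes ahead, quadratic:", quad(30))              // bends back below the first reading
```

On training error alone the quadratic always wins here, since it reproduces its three input points exactly, yet its 20-minute extrapolation swings far away from the recent trend.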

I wrote a post about it here, where I explored the data to see which models performed best: https://maori.geek.nz/problems-with-predicting-blood-glucose-with-regression-571377170b8b

The good thing is that, if you want to take my suggestion, it is an easy fix: change this one line from BestMatchRegression to ExpRegression or PolynomialRegression(degree: 1). https://github.com/nightscout/nightguard/blob/ab4a3878cf68b58d504b0bd5456151a7532dc4db/nightguard/external/PredictionService.swift#L138C13-L138C52

dhermanns commented 1 week ago

Thanks for your input. @florianpreknya any opinion on that one?

poml88 commented 1 week ago

Hi, in this context I would like to point out the issue with the time intervals between values. For some common sensors the interval is 5 minutes, so the last 2-3 values cover 10-15 minutes. Other sensors provide a value every minute, so the last 2-3 values cover only 2-3 minutes, which might contain a lot of noise. For the latter I would rather use an average over the last three 5-minute intervals, or pick the 2-3 values by age, like 5 minutes old, 10 minutes old, ...
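Picking values by age could look roughly like the following Swift sketch. The `Reading` type, the `pickByAge` function, and the 2.5-minute tolerance are assumptions made for illustration and do not exist in nightguard.

```swift
import Foundation

// Hypothetical type for illustration; not nightguard's API.
struct Reading {
    let time: Date
    let mgdl: Double
}

/// For each target age (in minutes before `now`), return the reading whose
/// timestamp is closest to that age, skipping ages with no reading within
/// the tolerance.
func pickByAge(_ readings: [Reading],
               now: Date,
               agesInMinutes: [Double],
               toleranceMinutes: Double = 2.5) -> [Reading] {
    return agesInMinutes.compactMap { (age: Double) -> Reading? in
        let target = now.addingTimeInterval(-age * 60)
        guard let best = readings.min(by: {
            abs($0.time.timeIntervalSince(target)) < abs($1.time.timeIntervalSince(target))
        }) else { return nil }
        let gap = abs(best.time.timeIntervalSince(target))
        return gap <= toleranceMinutes * 60 ? best : nil
    }
}

// Made-up one-minute readings covering the last 12 minutes.
let now = Date(timeIntervalSince1970: 1_000_000)
let readings = (0...12).map { minutesAgo in
    Reading(time: now.addingTimeInterval(-Double(minutesAgo) * 60),
            mgdl: 100 + Double(minutesAgo))
}

// Select the readings that are now, 5 and 10 minutes old.
let picked = pickByAge(readings, now: now, agesInMinutes: [0, 5, 10])
print(picked.map { $0.mgdl })
```

With this kind of selection, a one-minute sensor and a five-minute sensor would feed the regression points spanning the same 10 minutes.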

grahamjenson commented 1 week ago

Just out of curiosity, what sensors give minute readings?

Also, I am not sure what effect minute-by-minute readings would have on the selection. I think the overall problem is that it currently uses the training data itself to select a curve. If you get 10 readings in 10 minutes, a better option might be to select the last 5 to train with, then use the previous 5 to rate the selection, measuring it against past predictions. I think checking against real numbers would be the only way to say for sure which strategy is best.
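One way to read this idea is as a small backtest: fit each candidate on one window and score it on readings it has not seen. The Swift sketch below assumes the older half is used for fitting and the newer half for scoring, which is one possible split and not necessarily the one meant above. All names (`Point`, `fitLine`, `fitExp`, `selectModel`) and the data are hypothetical, not nightguard's API.

```swift
import Foundation

typealias Point = (x: Double, y: Double)   // x: minutes, y: mg/dL
typealias Model = (Double) -> Double

/// Least-squares straight line.
func fitLine(_ pts: [Point]) -> Model {
    let n = Double(pts.count)
    let mx = pts.reduce(0.0) { $0 + $1.x } / n
    let my = pts.reduce(0.0) { $0 + $1.y } / n
    let sxy = pts.reduce(0.0) { $0 + ($1.x - mx) * ($1.y - my) }
    let sxx = pts.reduce(0.0) { $0 + ($1.x - mx) * ($1.x - mx) }
    let slope = sxy / sxx
    return { x in my + slope * (x - mx) }
}

/// Exponential y = a * exp(b * x), fitted as a straight line on log(y).
func fitExp(_ pts: [Point]) -> Model {
    let logLine = fitLine(pts.map { (x: $0.x, y: log($0.y)) })
    return { x in exp(logLine(x)) }
}

/// Root mean squared error of a model on a set of points.
func rmse(_ model: Model, on pts: [Point]) -> Double {
    let sq = pts.reduce(0.0) { $0 + pow(model($1.x) - $1.y, 2) }
    return (sq / Double(pts.count)).squareRoot()
}

/// Fit every candidate on `train`, score it on `holdout`,
/// and return the name of the candidate with the lowest held-out RMSE.
func selectModel(train: [Point], holdout: [Point]) -> String {
    let candidates: [(name: String, fit: ([Point]) -> Model)] = [
        ("linear", fitLine),
        ("exponential", fitExp),
    ]
    let scored = candidates.map {
        (name: $0.name, error: rmse($0.fit(train), on: holdout))
    }
    return scored.min { $0.error < $1.error }!.name
}

// Made-up data: ten one-minute readings on a straight rising trend.
let points: [Point] = (0..<10).map { (x: Double($0), y: 100 + 2 * Double($0)) }
let older = Array(points[0..<5])   // used for fitting
let newer = Array(points[5...])    // used only for scoring

print(selectModel(train: older, holdout: newer))
```

Because the score comes from readings the model was not fitted on, a curve that merely memorises its inputs no longer wins automatically. Whether this beats a hardcoded model would still need checking against real data.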

poml88 commented 1 week ago

Well, yes, I do not know how all this prediction works, I have not looked at that and it seems complicated. :) Maybe somebody more qualified than me could think about it...

For example the Freestyle Libre from Abbott provides minute readings.