mindsdb / mindsdb_native

Machine Learning in one line of code
http://mindsdb.com
GNU General Public License v3.0

Infer time series rows for streaming #484

Closed paxcema closed 3 years ago

paxcema commented 3 years ago

Why

For stream use cases, we need to predict for a data point that has not yet been published by the stream.

How

As detailed in the related issue M#1171, specifying make_predictions==False for the predict DataFrame triggers the inferring policy. At the moment, the policy infers the value of the order_by column for the new row by comparing the last two entries in the stream cache. This could be changed to consider the complete cache, or even earlier values. The values for the remaining columns are carried over from the last row. Here we could explore imputation techniques instead, treating the row as missing data and generating values conditioned on the cache.
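The policy described above can be illustrated with a minimal sketch. This is not the code from this PR: `infer_next_row`, the list-of-dicts cache, and the column names are hypothetical, and "comparing the last two entries" is assumed here to mean taking their difference as the time step.

```python
from datetime import datetime


def infer_next_row(cache, order_by):
    """Infer a not-yet-published row from a stream cache (hypothetical helper).

    cache: list of dicts, oldest row first.
    order_by: name of the temporal column.
    """
    if len(cache) < 2:
        raise ValueError("need at least two cached rows to infer a time step")

    previous, last = cache[-2], cache[-1]

    # Assumption: the time step is the difference between the last two entries.
    step = last[order_by] - previous[order_by]

    # All other columns are carried over unchanged from the last row.
    inferred = dict(last)
    inferred[order_by] = last[order_by] + step
    return inferred


# Example cache with a 5-minute cadence (illustrative data only).
cache = [
    {"ts": datetime(2021, 1, 1, 0, 0), "sensor": "a", "value": 1.0},
    {"ts": datetime(2021, 1, 1, 0, 5), "sensor": "a", "value": 1.2},
]
next_row = infer_next_row(cache, "ts")
```

Copying the last row before overwriting the order_by value leaves the cache itself untouched, so the inferred row can be dropped once the real data point arrives. Using the complete cache, as suggested above, would amount to replacing the `step` computation with something like a median of all consecutive differences.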