joearedmond / dist-shift

Application that auto-retrains when it detects a distribution shift in production data.

dist-shift

Application that naively learns continuously in production to make a model robust to concept drift.

We consider two types of concept drift:

  1. Sudden Concept Drift: The model is trained on data that only approximates the production data, and the production data never closely resembles the train/test data.

  2. Incremental Concept Drift: The mapping from inputs to outputs gradually drifts away from the one the model was trained on. In this case, the production data starts out similar to the train/test data and then changes over time.
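The two cases can be stated more formally. This is a sketch and is not taken from the original project. It writes P_t(y | X) for the production concept at step t, with t = 0 as the start of production, and P_train(y | X) for the concept in the train/test data. The distance d between conditional distributions is an assumption, and no particular choice of d is implied.

```latex
% Sudden drift (as described above): the concept has already changed when production starts
P_t(y \mid X) = P_{\text{new}}(y \mid X) \neq P_{\text{train}}(y \mid X)
  \quad \text{for all } t \ge 0

% Incremental drift: production starts on the training concept and moves away from it over time
P_0(y \mid X) = P_{\text{train}}(y \mid X), \qquad
d\big(P_t(y \mid X),\, P_{\text{train}}(y \mid X)\big) \text{ non-decreasing in } t
```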

[Image from Learning under Concept Drift: A Review]

We built one synthetic dataset for each type:

The Sudden Shift Sine dataset (sudden drift): [image]

The Temporal Sine Wave dataset (incremental drift): [image]
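The README does not describe how the two sine datasets were generated. The sketch below is one plausible construction, offered as an assumption. The concept is y = sin(x + phase) plus Gaussian noise. In the sudden case the phase jumps to a fixed `shift` for all of production. In the incremental case the phase grows by `drift_per_step` at each production step. All function names and parameters here are hypothetical.

```python
import math
import random

def _sample(rng, phase, noise):
    # One (x, y) pair from the hypothetical concept y = sin(x + phase), plus Gaussian noise.
    x = rng.uniform(0.0, 2 * math.pi)
    return x, math.sin(x + phase) + rng.gauss(0.0, noise)

def sudden_shift_sine(n_train, n_prod, shift=1.5, noise=0.05, seed=0):
    """Hypothetical sudden-drift data: training points follow sin(x),
    and every production point follows the shifted concept sin(x + shift)."""
    rng = random.Random(seed)
    train = [_sample(rng, 0.0, noise) for _ in range(n_train)]
    prod = [_sample(rng, shift, noise) for _ in range(n_prod)]
    return train, prod

def temporal_sine(n_train, n_prod, drift_per_step=0.002, noise=0.05, seed=0):
    """Hypothetical incremental-drift data: production step t follows
    sin(x + t * drift_per_step), so production starts on the training
    concept and moves further away from it over time."""
    rng = random.Random(seed)
    train = [_sample(rng, 0.0, noise) for _ in range(n_train)]
    prod = [_sample(rng, t * drift_per_step, noise) for t in range(n_prod)]
    return train, prod
```

With `noise=0.0` the points lie exactly on the chosen concept, which makes the drift easy to check by eye.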

Our algorithms

The idea is to keep learning continuously in production, whether or not concept drift is detected.

  1. Online Retraining

    We propagate the loss for each data point seen in production, so the model is updated after every point. [image]

  2. Small Batch Online Retraining

    We propagate the loss for a small batch of data points seen in production, so the model is updated once per batch. We hypothesized that averaging each update over several points would make the model more resilient to noise in the production data. [image]
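Below is a minimal sketch of both algorithms. It is not the authors' code, and their model is not described here. As an assumption, it uses a small linear model on the features (sin x, cos x, 1), trained with plain gradient descent on squared error. Because sin(x + phase) = cos(phase)·sin x + sin(phase)·cos x, this model can represent any phase-shifted sine. `features`, `LinearModel`, `online_retrain` and `small_batch_online_retrain` are hypothetical names. Each function scores a production point before learning from it, so the returned errors reflect true production performance.

```python
import math

def features(x):
    # Hypothetical feature map: a linear model on (sin x, cos x, 1)
    # can represent sin(x + phase) for any phase.
    return [math.sin(x), math.cos(x), 1.0]

class LinearModel:
    """Hypothetical stand-in for the production model."""

    def __init__(self):
        self.w = [0.0, 0.0, 0.0]

    def predict(self, x):
        return sum(wi * fi for wi, fi in zip(self.w, features(x)))

    def grad(self, x, y):
        # Gradient of the squared-error loss 0.5 * (prediction - y)**2 with respect to w.
        err = self.predict(x) - y
        return [err * fi for fi in features(x)]

    def step(self, g, lr):
        # One plain gradient-descent update.
        self.w = [wi - lr * gi for wi, gi in zip(self.w, g)]

def online_retrain(model, stream, lr=0.1):
    """Algorithm 1: update on every production point, whether or not drift is detected."""
    errors = []
    for x, y in stream:
        errors.append((model.predict(x) - y) ** 2)  # score the point before learning from it
        model.step(model.grad(x, y), lr)
    return errors

def small_batch_online_retrain(model, stream, batch_size=8, lr=0.1):
    """Algorithm 2: buffer production points and update once per full batch
    using the mean gradient, which averages out per-point noise.
    A final partial batch is never used for an update."""
    errors, batch = [], []
    for x, y in stream:
        errors.append((model.predict(x) - y) ** 2)
        batch.append((x, y))
        if len(batch) == batch_size:
            grads = [model.grad(bx, by) for bx, by in batch]
            mean_grad = [sum(col) / batch_size for col in zip(*grads)]
            model.step(mean_grad, lr)
            batch.clear()
    return errors
```

With `batch_size=1` the second function reduces to the first. Larger batches trade slower reaction to drift for less noisy updates.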

Results

We evaluated each algorithm by its error rate at the end of the production time series. [image]
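The README does not say exactly how "error rate at the end of the production time series" was computed. One reasonable reading, offered here as an assumption, is the mean per-point error over a final window of production steps. The hypothetical helpers below implement that reading and rank several algorithms by it.

```python
def final_error(errors, window=100):
    """Hypothetical metric: mean per-point error over the last `window`
    production steps (or over all steps if the series is shorter)."""
    if not errors:
        raise ValueError("errors must be non-empty")
    tail = errors[-window:]
    return sum(tail) / len(tail)

def compare(results, window=100):
    """Rank algorithms, given as {name: per-step errors}, by final_error, best first."""
    scored = {name: final_error(errs, window) for name, errs in results.items()}
    return sorted(scored.items(), key=lambda item: item[1])
```

Averaging over a window rather than taking only the last point keeps a single noisy production sample from deciding the comparison.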

A very cool program by Joe Redmond and Tamanna Ananna.