mitre / menelaus

Online and batch-based concept and data drift detection algorithms to monitor and maintain ML performance.
https://menelaus.readthedocs.io/en/latest/
Apache License 2.0

Design synthetic drift utilities #123

Closed tms-bananaquit closed 1 year ago

tms-bananaquit commented 2 years ago

Section 3 of Souza et al. 2020 gives a good summary of potential approaches.

The "floor" could be including examples of using these methods to inject drift.

The "ceiling" would be developing independent utilities, e.g. pipeline-like objects, to make some of the work easier. Even if we don't build them, they are worth noting in case the code base reaches a point where replicating the examples "by hand" becomes annoying.

indialindsay commented 2 years ago

Comments from gitlab issue:

  1. Paper: "Challenges in Benchmarking Stream Learning Algorithms with Real-world Data".

    • Page 20 has a list. Having a utility with a couple dials to moderate these approaches in reasonable ways, e.g. "proportion of labels flip-flopped" might be useful.
  2. Convert ARFF format to something we can use? https://sites.google.com/view/uspdsrepository

    • Forest covertype seems like a candidate for temporospatial data we could use as a test.
    • General approaches for ensembling models of different ages?
    • Conceptually, building an incremental learner on top of a "batch learner," so that the same tools and approach can be used, at least temporarily, in a context which is now recognized as streaming.
    • Need to read through section 4 more thoroughly.
  3. We could consider adding random-walk / Brownian noise as described in this paper. It is intended for time series data, but we should be able to adapt it for streaming / batch data.
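
As a rough sketch of the "dials" idea in item 1 and the random-walk idea in item 3, the two helpers below use only the Python standard library. The names `flip_labels` and `add_random_walk`, and all of their parameters, are hypothetical and not part of menelaus. The label flip assumes binary 0/1 labels, and the random walk is a plain cumulative sum of Gaussian steps, not necessarily the exact scheme from either paper.

```python
import random


def flip_labels(labels, proportion, start=0, seed=None):
    """Return a copy of binary (0/1) labels with `proportion` of the
    entries from index `start` onward flipped. Hypothetical helper."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    rng = random.Random(seed)
    flipped = list(labels)
    candidates = range(start, len(flipped))
    n_flip = round(proportion * len(candidates))
    # Choose distinct positions after the drift point and invert them.
    for i in rng.sample(candidates, n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def add_random_walk(values, step_std, start=0, seed=None):
    """Return a copy of `values` with a Gaussian random walk added from
    index `start` onward. Hypothetical helper."""
    rng = random.Random(seed)
    drifted = list(values)
    offset = 0.0
    for i in range(start, len(drifted)):
        # The offset accumulates, so the drift grows gradually over time.
        offset += rng.gauss(0.0, step_std)
        drifted[i] += offset
    return drifted


# Example: flip 20% of the labels in the second half of a stream,
# and let a feature wander after the same drift point.
labels = [0] * 100
drifted_labels = flip_labels(labels, proportion=0.2, start=50, seed=0)
feature = [1.0] * 100
drifted_feature = add_random_walk(feature, step_std=0.1, start=50, seed=0)
```

Here `proportion` and `step_std` act as the dials, and `start` marks where the injected drift begins, so the data before that index stays untouched as a reference window.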

anmol-srivastava-mitre commented 2 years ago

Adding a note to myself about using toolz, or its @curry decorator, so that any function I develop can be passed along as part of a pipeline. (Just an idea):

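from toolz import pipe

def join_class_function(data):
    ...

# swap_class_function would be defined the same way
list_of_fns = [join_class_function, swap_class_function]

while not finished:
    # toolz.pipe takes the data first, then the functions
    data = pipe(data, *list_of_fns)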
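
A minimal, runnable sketch of the pipeline idea above, using only the standard library: a small `pipe` built on `functools.reduce` stands in for `toolz.pipe` (which takes the data first, then the functions), and `functools.partial` stands in for `@curry`. The function bodies, the names `join_classes` and `swap_classes`, and their parameters are hypothetical illustrations, not existing menelaus code.

```python
from functools import partial, reduce


def pipe(data, *funcs):
    """Thread data through funcs left to right, like toolz.pipe(data, *funcs)."""
    return reduce(lambda acc, fn: fn(acc), funcs, data)


def join_classes(labels, source, target):
    """Relabel every `source` as `target`, merging two classes."""
    return [target if y == source else y for y in labels]


def swap_classes(labels, a, b):
    """Exchange the labels `a` and `b`."""
    return [b if y == a else a if y == b else y for y in labels]


# Fix each function's parameters up front so that every step in the
# pipeline takes only the data, which is what currying would give us.
list_of_fns = [
    partial(join_classes, source=2, target=1),
    partial(swap_classes, a=0, b=1),
]

labels = [0, 1, 2, 2, 0]
drifted = pipe(labels, *list_of_fns)
print(drifted)  # [1, 0, 0, 0, 1]
```

Keeping each step as a single-argument callable means the list of steps can be reordered or extended without touching the loop that applies it.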