mitre / menelaus

Online and batch-based concept and data drift detection algorithms to monitor and maintain ML performance.
https://menelaus.readthedocs.io/en/latest/
Apache License 2.0

Design `nlp` submodule and skeleton of drift detection methods #152

Open Anmol-Srivastava opened 1 year ago

Anmol-Srivastava commented 1 year ago

Task

NLP-based drift detection algorithms do not always fit the data-drift or concept-drift definitions, so they warrant a separate `nlp` submodule, along with a basic skeleton for a language- or text-based algorithm.

Impact

This makes it easier to implement specific algorithms later on.

Anmol-Srivastava commented 1 year ago

Worth thinking about returning to the pipeline idea:

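class NLPMethod:
    def __init__(self, operators):
        self.operators = operators

    def run(self, data):
        # `pipe` is assumed to behave like toolz.pipe: pipe(x, f, g) == g(f(x))
        return pipe(data, *self.operators)

n = NLPMethod(operators=[sklearn.some_preprocessor, transformers.some_transformer, some_encoder, some_evaluator])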
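
A minimal runnable sketch of the pipeline idea, using only the standard library. The `pipe` helper here is a local stand-in (similar in spirit to `toolz.pipe`), and `tokenize`, `encode`, and `vocabulary_size` are hypothetical operators standing in for a preprocessor, an encoder, and an evaluator; none of them are part of menelaus.

```python
from collections import Counter
from functools import reduce


def pipe(data, *operators):
    """Thread data through each operator in order, left to right."""
    return reduce(lambda value, op: op(value), operators, data)


# Hypothetical operators: a preprocessor, an encoder, and an evaluator.
def tokenize(texts):
    return [text.lower().split() for text in texts]


def encode(token_lists):
    # Pool all tokens into a single bag-of-words count.
    return Counter(token for tokens in token_lists for token in tokens)


def vocabulary_size(counts):
    return len(counts)


class NLPMethod:
    def __init__(self, operators):
        self.operators = operators

    def run(self, data):
        return pipe(data, *self.operators)


method = NLPMethod(operators=[tokenize, encode, vocabulary_size])
result = method.run(["The cat sat", "the dog sat"])
```

Because each operator is a plain callable, swapping in a different encoder or evaluator only changes the `operators` list, not the class.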

Anmol-Srivastava commented 1 year ago

Also worth exploring multi-threading / HPC / GPU compatibility here. If we adopt a pipeline approach, several operators may be applied to the same data at a given stage, which is a good opportunity to demonstrate potential performance gains. We can use MD3 as a starting point.
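
One way the "several operators on the same data" stage might look, sketched with the standard library's `ThreadPoolExecutor`. The `fan_out` helper and `mean_length` operator are hypothetical names for illustration. Threads only pay off when the operators release the GIL (for example, I/O or NumPy-heavy work), so this shows the shape of the interface rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(data, operators, max_workers=None):
    """Apply every operator to the same data concurrently.

    Results are returned in the same order as `operators`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, data) for op in operators]
        return [future.result() for future in futures]


# Hypothetical stage: three independent statistics over one batch.
def mean_length(lengths):
    return sum(lengths) / len(lengths)


batch = [3, 5, 4, 8]
stage_results = fan_out(batch, [min, max, mean_length])
```

A process pool or a GPU-backed operator could be substituted behind the same `fan_out` signature if the operators are CPU-bound.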

anmol-srivastava-mitre commented 1 year ago

Also worth looking at iterators.

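anmol-srivastava-mitre commented 1 year ago

class FreeDetector:
    def step(self, inputs):
        # `pipe` is assumed to behave like toolz.pipe; `some_operators` is a placeholder list
        data = pipe(inputs, *some_operators)
        state = ...  # pipeline of operators, e.g. divergence metrics
        self.state = state

    def run(self):
        # Consume self.data one item (or batch) at a time
        for inputs in self.data:
            self.step(inputs)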

anmol-srivastava-mitre commented 1 year ago

The above could provide a single interface for batch and stream data, and it applies to NLP as well as other methods.
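
A rough sketch of how an iterator-based `run` could serve both cases, assuming a `step`/`run` split. Everything beyond the `FreeDetector`, `step`, and `run` names is hypothetical: the `chunks` helper, the `reference` and `threshold` parameters, and the choice of total variation distance as the divergence metric are illustrative only.

```python
from collections import Counter
from itertools import islice


def chunks(iterable, size):
    """Yield lists of up to `size` items from a list or a generator."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def total_variation(p, q):
    """Total variation distance between two non-empty token Counters."""
    p_total, q_total = sum(p.values()), sum(q.values())
    return 0.5 * sum(
        abs(p[t] / p_total - q[t] / q_total) for t in p.keys() | q.keys()
    )


class FreeDetector:
    def __init__(self, reference, threshold=0.5):
        self.reference = Counter(reference)
        self.threshold = threshold
        self.state = []  # one drift flag per processed chunk

    def step(self, tokens):
        distance = total_variation(self.reference, Counter(tokens))
        self.state.append(distance > self.threshold)

    def run(self, data, size=1):
        # Batch data (a list) and stream data (an iterator) take the same path.
        for chunk in chunks(data, size):
            self.step(chunk)
        return self.state


reference = ["a", "a", "b", "b"]
batch_flags = FreeDetector(reference).run(["a", "b", "c", "c"], size=2)
stream_flags = FreeDetector(reference).run(iter(["a", "b", "c", "c"]), size=2)
```

Both calls go through the same `step` logic; the only difference is whether `data` is materialized up front or consumed lazily.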