Closed tms-bananaquit closed 1 year ago
Comments from GitLab issue:
Paper: "Challenges in Benchmarking Stream Learning Algorithms with Real-world Data".
Convert the ARFF files to a format we can use? https://sites.google.com/view/uspdsrepository
Can consider adding random-walk / Brownian noise as described in this paper. It is intended for time series data, but we should be able to adapt it for streaming / batch use.
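A minimal sketch of what the streaming version could look like, using only the standard library. This is not the method from the paper: the function name `add_random_walk`, the `step_std` parameter, and the choice of Gaussian increments are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, Optional


def add_random_walk(
    stream: Iterable[float],
    step_std: float = 0.1,
    seed: Optional[int] = None,
) -> Iterator[float]:
    """Yield each value plus a cumulative Gaussian random-walk offset.

    Hypothetical helper: the offset persists across samples, so the
    noise drifts over time instead of being independent per sample.
    """
    rng = random.Random(seed)
    offset = 0.0
    for x in stream:
        # One Gaussian increment per sample; the running sum is the walk.
        offset += rng.gauss(0.0, step_std)
        yield x + offset


# Usage: perturb a constant signal with a reproducible walk.
clean = [0.0] * 5
noisy = list(add_random_walk(clean, step_std=0.5, seed=0))
```

Because the offset lives inside the generator, batches could be handled by chaining them into a single iterable (e.g. `itertools.chain.from_iterable(batches)`) so the walk carries over from one batch to the next.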
Adding a note to myself about using toolz, or the @curry decorator, to make any function I develop able to be passed along as part of a pipeline. (Just an idea):
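from toolz import pipe
def join_class_function(data):
    ...
def swap_class_function(data):
    ...
list_of_fns = [join_class_function, swap_class_function]
while not finished:  # `finished` is a placeholder stopping condition
    data = pipe(data, *list_of_fns)  # toolz.pipe takes the data first, then the functions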
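To make the idea concrete, here is a self-contained sketch that uses only the standard library. The local `pipe` mirrors the call shape of `toolz.pipe(data, *funcs)`, and `functools.partial` stands in for `@curry`. The bodies of `swap_classes` and `join_classes` are one guess at what "swap" and "join" could mean for class labels; they are hypothetical and not taken from the issue or the paper.

```python
from functools import partial, reduce


def pipe(data, *funcs):
    """Thread data through funcs left to right (same call shape as toolz.pipe)."""
    return reduce(lambda acc, fn: fn(acc), funcs, data)


def swap_classes(labels, a, b):
    """Hypothetical drift step: exchange the labels of classes a and b."""
    mapping = {a: b, b: a}
    return [mapping.get(y, y) for y in labels]


def join_classes(labels, source, target):
    """Hypothetical drift step: merge class `source` into class `target`."""
    return [target if y == source else y for y in labels]


# Bind the parameters up front so each step takes only the data.
steps = [
    partial(swap_classes, a=0, b=1),
    partial(join_classes, source=2, target=1),
]

labels = [0, 1, 2, 0, 2]
drifted = pipe(labels, *steps)
print(drifted)  # [1, 0, 1, 1, 1]
```

With toolz installed, decorating the step functions with `@curry` would allow `swap_classes(a=0, b=1)` in place of the `partial(...)` calls, and `toolz.pipe` could replace the local helper.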
Section 3 of Souza et al. (2020) gives a good summary of potential approaches.
The "floor" could be including examples that use these methods to inject drift.
The "ceiling" would be developing independent utilities to make some of the work easier. Even if we don't build them, they may be worth noting in case the codebase reaches a point where replicating the examples "by hand" becomes annoying, e.g. pipeline-like objects.