rswc / ml-ids

Code for our bachelor's thesis, "Intrusion Detection in Imbalanced and Evolving Data Streams"
MIT License

Add SyntheticStream and ClassSampler for experiments reproduction #10

Closed desecnd closed 10 months ago

desecnd commented 10 months ago

Written with flexibility in mind

This adds two main classes: SyntheticStream and ClassSampler. Example use cases are implemented in the tests.

ClassSampler

This is a building block of SyntheticStream. It works with Python's next() interface to provide an easy way to iterate over samples. There are two ways the sampler will stop iterating:
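To illustrate the next() interface only, here is a minimal sketch of how a sampler might plug into Python's iterator protocol. The class name `SimpleClassSampler`, its constructor arguments, and the single stopping rule shown (running out of stored samples) are assumptions made for illustration, not the implementation from this PR.

```python
import random


class SimpleClassSampler:
    """Hypothetical sampler: yields (sample, label) pairs for one class."""

    def __init__(self, label, samples, seed=None):
        self.label = label
        self._samples = list(samples)
        # Shuffle once so draws come in random order without repetition.
        random.Random(seed).shuffle(self._samples)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Assumed stopping rule: stop once every stored sample was returned.
        if self._pos >= len(self._samples):
            raise StopIteration
        sample = self._samples[self._pos]
        self._pos += 1
        return sample, self.label


sampler = SimpleClassSampler("dos", [[0.1, 0.2], [0.3, 0.4]], seed=0)
first = next(sampler)   # one (sample, label) pair
rest = list(sampler)    # drains whatever is left
```

Because the object implements `__iter__` and `__next__`, it can be used with `next()`, in a `for` loop, or with `list()`.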

SyntheticStream

It works as an aggregation of ClassSampler instances. Every class in the stream should have its own unique sampler. There can be no active duplicates, to avoid undefined behavior, but we can "hook" 2 samplers from 2 classes in different stream positions.

For every sampler, we specify weight_func(t), which returns the importance of a class in the stream at moment t.
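A few weight functions of this shape might look as follows. This is only a sketch: the helper names `constant`, `window`, and `linear_ramp` are hypothetical and do not come from the PR. The only thing assumed about the real interface is that weight_func takes a time step t and returns a non-negative number.

```python
def constant(weight):
    """Class keeps the same importance for the whole stream."""
    return lambda t: weight


def window(start, end, weight=1.0):
    """Class is present only while start <= t < end."""
    return lambda t: weight if start <= t < end else 0.0


def linear_ramp(start, end):
    """Importance grows linearly from 0 at `start` to 1 at `end`."""
    def weight_func(t):
        if t <= start:
            return 0.0
        if t >= end:
            return 1.0
        return (t - start) / (end - start)
    return weight_func


benign = constant(9.0)
attack = linear_ramp(100, 200)
burst = window(50, 60)

# Raw (not yet normalized) weights at t = 150.
weights_at_150 = (benign(150), attack(150), burst(150))
```

A constant weight models a stable class, a window models a class that appears and disappears, and a ramp models gradual drift in class balance.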

Between iterations we can check the normalized probabilities of all classes in the stream (past, current and future, from the point of view of t) using the class_probabilitites property. It is calculated inside the next() transition, so its value reflects the last transition, i.e. accessing it after the first iteration gives the probabilities at time 0.
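The timing described above can be sketched as follows. `SimpleSyntheticStream` is a hypothetical stand-in, not the class from this PR: it assumes the probabilities are the weights divided by their sum, it stops when no class has positive weight, and it stores the result in a plain attribute spelled `class_probabilities` rather than a property.

```python
import random
from itertools import repeat


class SimpleSyntheticStream:
    """Hypothetical stream mixing per-class iterators by time-varying weight."""

    def __init__(self, samplers, seed=None):
        # samplers: dict mapping label -> (iterator, weight_func)
        self._samplers = samplers
        self._rng = random.Random(seed)
        self._t = 0
        self.class_probabilities = None  # set by the first transition

    def __iter__(self):
        return self

    def __next__(self):
        weights = {lbl: wf(self._t) for lbl, (_, wf) in self._samplers.items()}
        total = sum(weights.values())
        if total <= 0:
            raise StopIteration  # assumed: no class is active any more
        # Normalize so the values sum to 1; they describe time self._t.
        self.class_probabilities = {lbl: w / total for lbl, w in weights.items()}
        labels = list(weights)
        chosen = self._rng.choices(labels, weights=[weights[l] for l in labels])[0]
        sample = next(self._samplers[chosen][0])
        self._t += 1
        return sample, chosen


stream = SimpleSyntheticStream(
    {
        "benign": (repeat([0.0]), lambda t: 3.0),
        "attack": (repeat([1.0]), lambda t: float(t)),
    },
    seed=0,
)
next(stream)
probs_after_first = stream.class_probabilities    # computed for t = 0
next(stream)
probs_after_second = stream.class_probabilities   # computed for t = 1
```

In this sketch the "attack" class has weight 0 at t = 0, so it is listed with probability 0.0 after the first iteration even though it only becomes active later.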

Future improvements

desecnd commented 10 months ago

Addressed all change suggestions. If all is good, I think it is ready to merge 🔧