This PR adds two main classes: SyntheticStream and ClassSampler. Example use cases are implemented in the tests.
ClassSampler
This is the building block of SyntheticStream. It implements Python's next() iterator interface, which makes it easy to iterate over samples. The sampler stops iterating in one of two ways:
1. The next sample would exceed the max_samples limit set in the constructor.
   In this case, ClassSampler raises a StopIteration exception, which complies with Python's iterator interface.
   To check whether calling next() will raise StopIteration (this is used inside SyntheticStream), ClassSampler exposes the end_of_iteration property, which returns a bool.
2. There are no more samples to draw, and ClassSampler was initialized with eoc_strategy equal to raise.
   In this case, ClassSampler raises EndOfClassError.
   This error is not handled inside SyntheticStream: it is the user's responsibility to make sure a class does not run out of samples during an experiment.
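The two stopping conditions above can be illustrated with a minimal sketch. It assumes samples are held in a list and returned in order (no shuffling); EndOfClassError is defined locally, and the "cycle" strategy is hypothetical, included only to contrast with raise. This is not the PR's actual implementation.

```python
from typing import Any, Iterable, List, Optional


class EndOfClassError(Exception):
    """Raised when a class has no samples left and eoc_strategy == "raise"."""


class ClassSampler:
    """Sketch of a per-class sampler that follows Python's iterator protocol."""

    def __init__(self, samples: Iterable[Any], max_samples: Optional[int] = None,
                 eoc_strategy: str = "raise"):
        # "cycle" (wrap around to the first sample) is a hypothetical strategy.
        if eoc_strategy not in ("raise", "cycle"):
            raise ValueError(f"unknown eoc_strategy: {eoc_strategy!r}")
        self._samples: List[Any] = list(samples)  # whole class held in memory
        self.max_samples = max_samples
        self.eoc_strategy = eoc_strategy
        self._drawn = 0  # number of samples returned so far

    def __iter__(self) -> "ClassSampler":
        return self

    @property
    def end_of_iteration(self) -> bool:
        # True when the next call to next() would exceed max_samples.
        return self.max_samples is not None and self._drawn >= self.max_samples

    def __next__(self) -> Any:
        if self.end_of_iteration:
            raise StopIteration  # regular end of iteration (case 1)
        out_of_samples = self._drawn >= len(self._samples)
        # Case 2; an empty class cannot be cycled either, so it also raises.
        if out_of_samples and (self.eoc_strategy == "raise" or not self._samples):
            raise EndOfClassError(f"class exhausted after {self._drawn} samples")
        sample = self._samples[self._drawn % len(self._samples)]
        self._drawn += 1
        return sample
```

For example, list(ClassSampler(["a", "b"], max_samples=3, eoc_strategy="cycle")) stops cleanly with ['a', 'b', 'a'], whereas ClassSampler(["x"], max_samples=5) returns "x" once and then raises EndOfClassError, even though end_of_iteration is still False.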
SyntheticStream
It works as an aggregation of ClassSampler instances. Every class in the stream should have its own unique sampler (there can be no active duplicates, to avoid undefined behavior, but we can "hook" two samplers from two classes at different stream positions).
For every sampler, we specify weight_func(t), which returns the importance of its class in the stream at moment t.
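As an illustration of what a weight_func(t) might look like, the sketch below builds two kinds of schedules. The helpers window and gaussian_bump are hypothetical and not part of this PR; they only show that any callable mapping a time step to a non-negative importance fits the description.

```python
import math
from typing import Callable

# weight_func(t) -> raw importance of a class at stream time t
WeightFunc = Callable[[int], float]


def window(start: int, end: int) -> WeightFunc:
    """Hypothetical helper: the class is present only for start <= t < end."""
    def weight(t: int) -> float:
        return 1.0 if start <= t < end else 0.0
    return weight


def gaussian_bump(center: float, width: float, peak: float = 1.0) -> WeightFunc:
    """Hypothetical helper: importance rises and falls smoothly around center."""
    def weight(t: int) -> float:
        return peak * math.exp(-0.5 * ((t - center) / width) ** 2)
    return weight


# Example schedule: "cat" is present for t < 100, "dog" peaks at t = 150.
weight_funcs = {
    "cat": window(0, 100),
    "dog": gaussian_bump(center=150, width=30),
}
```

At t = 150 these give 0.0 for "cat" and 1.0 for "dog". The values are raw weights; turning them into probabilities requires normalizing across all classes at the same t.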
Between iterations, we can check the normalized probabilities of all classes in the stream (past, current and future, from the point of view of t) using the class_probabilitites property. It is calculated inside the next() transition, so its value is the result of the last transition; i.e. accessing it after the first iteration gives the probabilities at time 0.
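The timing of the probability update can be sketched as follows. This is an assumed design, not the PR's code: samplers are passed as a dict keyed by class label (so each class has exactly one sampler), any iterator works as a sampler, an optional end_of_iteration attribute zeroes a class's weight, and the probabilities are exposed as an attribute named class_probabilities here. The key point is that probabilities are computed for the current t before t is advanced.

```python
import itertools
import random
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

WeightFunc = Callable[[int], float]


class SyntheticStream:
    """Sketch: draw (label, sample) pairs from per-class samplers, weighted by time."""

    def __init__(self, samplers: Dict[Any, Tuple[Iterator[Any], WeightFunc]],
                 seed: Optional[int] = None):
        # Keying by class label allows exactly one sampler per class.
        self._samplers = samplers
        self._rng = random.Random(seed)
        self.t = 0  # current stream time
        self.class_probabilities: Dict[Any, float] = {}

    def __iter__(self) -> "SyntheticStream":
        return self

    def __next__(self) -> Tuple[Any, Any]:
        # Raw weight of every class at time t; samplers that report
        # end_of_iteration (if they expose it) get weight 0.
        weights = {}
        for label, (sampler, weight_func) in self._samplers.items():
            done = getattr(sampler, "end_of_iteration", False)
            weights[label] = 0.0 if done else max(weight_func(self.t), 0.0)
        total = sum(weights.values())
        if total == 0:
            raise StopIteration  # no class can be drawn at this time
        # Normalize and store before t advances, so after the first call
        # this holds the probabilities for t = 0 (all classes, even weight 0).
        self.class_probabilities = {k: w / total for k, w in weights.items()}
        labels = list(self.class_probabilities)
        probs = [self.class_probabilities[k] for k in labels]
        label = self._rng.choices(labels, weights=probs)[0]
        self.t += 1
        return label, next(self._samplers[label][0])


stream = SyntheticStream({
    "cat": (itertools.cycle(["cat_0", "cat_1"]), lambda t: 1.0 if t < 5 else 0.0),
    "dog": (itertools.cycle(["dog_0"]), lambda t: 0.0 if t < 5 else 1.0),
}, seed=0)
print(next(stream))                # ('cat', 'cat_0')
print(stream.class_probabilities)  # {'cat': 1.0, 'dog': 0.0}
```

Storing the dictionary inside __next__ keeps the property cheap to read, at the cost of it always lagging one step behind the stream's current t.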
Future improvements
Currently, ClassSampler loads samples into a Python list, which could cause memory problems with huge datasets.
The code is written with flexibility in mind.
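One possible direction for the list-loading limitation, sketched under assumptions: keep only lightweight keys (for example file paths) in memory and load each sample on demand through a user-supplied load_fn. The LazySampleSource class and load_fn parameter are hypothetical; any sampler that only needs len() and indexing could consume such an object instead of a list.

```python
from typing import Any, Callable, Iterator, Sequence


class LazySampleSource:
    """Hypothetical: a list-like view that loads each sample only when requested."""

    def __init__(self, keys: Sequence[Any], load_fn: Callable[[Any], Any]):
        self._keys = keys        # e.g. file paths; cheap to keep in memory
        self._load_fn = load_fn  # e.g. reads and decodes one file

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, index: int) -> Any:
        # Nothing is loaded until a specific sample is requested.
        return self._load_fn(self._keys[index])

    def __iter__(self) -> Iterator[Any]:
        for key in self._keys:
            yield self._load_fn(key)
```

The trade-off is that each access pays the loading cost again; a cache could be added if the same samples are drawn repeatedly.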