Closed by eliasdorneles 7 years ago
I was discussing this with @raphapassini and realized that it's not that simple to implement this as a filter. The filter would have to know when the input is finished, or when the item limit has been reached, in order to know when to "flush" the reservoir.
So this needs a bit more discussion. Some possible approaches I can think of are:
The tradeoffs between the two aren't clear to me, so I'm not sure what's best.
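To make the flushing problem concrete, here is a minimal sketch in Python. It is not taken from the project: the name `sampling_filter` and its `size` and `limit` parameters are hypothetical, chosen only for illustration. The point it shows is that nothing can be emitted until the input ends or the limit is hit, because any item in the reservoir may still be replaced.

```python
import random
from itertools import count, islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def sampling_filter(
    stream: Iterable[T],
    size: int,
    limit: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Iterator[T]:
    """Hypothetical filter: yield a random sample of `size` items from `stream`.

    Reading stops when the stream is exhausted or after `limit` items,
    whichever comes first. Only then is the reservoir flushed downstream.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    # islice(stream, None) reads until the stream ends; with a limit it
    # stops after that many items, which is what makes infinite input usable.
    for seen, item in enumerate(islice(stream, limit)):
        if seen < size:
            reservoir.append(item)
        else:
            slot = rng.randint(0, seen)  # inclusive on both ends
            if slot < size:
                reservoir[slot] = item
    # The "flush": output is only possible once reading has stopped.
    yield from reservoir


# With an endless input, the limit is the only thing that triggers the flush.
sample = list(sampling_filter(count(), size=5, limit=1000))
```

Without a `limit`, the same call on `count()` would never produce output, which is exactly the difficulty described above.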
Available since version 0.6.13.
It would be nice to have an easy way of getting random samples from an infinite amount of data, and I believe a filter implementing a reservoir sampling algorithm keeping the samples in memory would be a good enough approach for most purposes.
Keeping the samples in memory limits the maximum number of samples, but this is probably okay for an initial implementation, and might even be okay for a long-lasting one. We can change it later to support persisting to disk if necessary, but I have the feeling it won't be needed. :)
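For reference, the in-memory approach could look roughly like the sketch below. It uses the classic "Algorithm R" form of reservoir sampling; the function name `reservoir_sample` and its parameters are assumptions made for this example, not the project's actual API. Memory use is bounded by `k`, regardless of how many items pass through.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to `k` items chosen uniformly at random from `items`.

    Only the `k` sampled items are held in memory at any time.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Ten samples from a million items, holding only ten in memory.
picked = reservoir_sample(range(1_000_000), k=10)
```

If the input has fewer than `k` items, the function simply returns all of them.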