scrapinghub / exporters

Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
BSD 3-Clause "New" or "Revised" License
40 stars 10 forks source link

Add reservoir sampling option to writers #342

Closed lpawluczuk closed 7 years ago

lpawluczuk commented 7 years ago

This is implementation of reservoir sampling for stream of items on writers level.

It addresses this issue: https://github.com/scrapinghub/exporters/issues/317

kalessin commented 7 years ago

I am having problems reviewing this PR. It has too many changes in a big commit and I don't really see the reason of many of the changes (not only the ones I commented, but also many of the rest). I think they are complicating unnecessarily the code. I was thinking on something much more simple. I think we don't need to change so many things to achieve the same.

If you create a new PR with only one commit which only adds the reservoir sampling class (and another that adds tests for the isolated class, it should be possible), I can do the changes needed in the exporters for fitting on it. It should be possible to do things much more simple.

kalessin commented 7 years ago

don't remove this PR, I will use it for reference

kalessin commented 7 years ago

I will proceed in the inverse way. I will provide the basis for adding custom write buffers and then you create a PR for the reservoir sampling.

kalessin commented 7 years ago

This PR is old and has been replaced by other ones