joshday / OnlineStats.jl

⚡ Single-pass algorithms for statistics
https://joshday.github.io/OnlineStats.jl/latest/
MIT License

Integrate `StreamSampling` in `OnlineStats` #288

Tortar closed this issue 4 months ago

Tortar commented 4 months ago

Thanks for the amazing package! I'd like to discuss whether it would be okay to add StreamSampling.jl as a dependency. It implements reservoir-based algorithms that cover all the classical types of sampling: (un)ordered, (un)weighted, and with(out) replacement.

It tries to mimic your interface closely, for example:

```julia
julia> using StreamSampling

julia> rs = ReservoirSample(Int, 5);

julia> for x in 1:100
           update!(rs, x)
       end

julia> value(rs)
5-element Vector{Int64}:
  7
  9
 20
 49
 74
```
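
For context, the simplest of these cases (unweighted, without replacement) can be sketched with the classic "Algorithm R". This is only a standalone illustration that uses the `Random` standard library; it is not the code of StreamSampling.jl or OnlineStats.jl, and `reservoir_sample` is a hypothetical name chosen for this example.

```julia
using Random

# Hypothetical helper, not part of StreamSampling.jl or OnlineStats.jl.
# Keeps a uniform random sample of k items from a stream in a single pass.
function reservoir_sample(rng::AbstractRNG, stream, k::Integer)
    reservoir = Vector{eltype(stream)}()
    sizehint!(reservoir, k)
    for (n, x) in enumerate(stream)
        if n <= k
            # Fill phase: the first k items are always kept.
            push!(reservoir, x)
        else
            # Item n enters the reservoir with probability k/n,
            # replacing a uniformly chosen slot.
            j = rand(rng, 1:n)
            if j <= k
                reservoir[j] = x
            end
        end
    end
    return reservoir
end

sample = reservoir_sample(MersenneTwister(42), 1:100, 5)
```

The weighted and with-replacement variants need different update rules and are not covered by this sketch.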

I can try to prepare a PR if the proposal is welcome. Importantly, the package is under the JuliaDynamics organization, so it should be easy to find new maintainers and grant access if needed.

joshday commented 4 months ago

I think it makes more sense for ReservoirSample.jl to depend on OnlineStatsBase.jl and leave it at that.

That being said, I'd be happy to link to ReservoirSample.jl in the docstring for OnlineStats.ReservoirSample for people who need something more advanced than what we already provide.

Tortar commented 4 months ago

Seems good to me :+1:

Then, after transitioning to OnlineStatsBase, I will update the docstring :)