Closed adknudson closed 4 months ago
I see that CountMap
has this implemented with the fit!(o, y=>n)
notation. Is it reasonable to expand this to all stats?
It's a reasonable request for fit!(::OnlineStat{T}, x::Pair{T, Int})
methods. I'd be okay with a fallback method and specific types can implement their own methods as needed.
I'm looking through OnlineStatsBase
to get an understanding of what I would need to change to get this working. It looks like for OnlineStat{T}
, T
is the type of observation that the stat can work on. So for CountMap
, it is defined as
CountMap{T, A <: AbstractDict{T, Int}} <: OnlineStat{Union{T, Pair{<:T,<:Integer}}}
which means it can accept a T
, or a Pair{T, Int}
. Is it really necessary for Pair{T,Int}
to be part of the type? I can go two ways with a PR:
Pair{T,Int}
to all stats and implement _fit!(o, xy::Pair)
for each statfit!(o::OnlineStat{T}, xy::Pair{<:T, <Integer})
and the fallback method _fit!(o::OnlineStat{T}, xy::Pair{<:T, <Integer})
which can be specialized for certain stats (e.g. Hist
)I'm inclined towards option 2 as then there is no ambiguity of if a stat can fit a pair-type observation. Generally the new code would be
fit!(o::OnlineStat{T}, xy::Pair{<:T, <:Integer}) where {T} = (_fit!(o, xy); return o)
function _fit!(o::OnlineStat{T}, xy::Pair{<:T, <:Integer}) where {T}
x, y = xy
for _ in 1:y
_fit!(o, x)
end
return o
end
I think CountMap could either keep or drop the Pair{<:T,<:Integer}
in the union type. I think it would be cleaner to drop it, but I'm not sure if that counts as a breaking change.
Is it really necessary for Pair{T,Int} to be part of the type?
So now I'm remembering why I haven't done this...
fit!(o::OnlineStat{T}, x::S)
will iterate through x
and fit!
each element, so adding pair methods is a breaking change, e.g. fit!(Mean(), 1 => 5)
would work differently. Previously this would result in value=3 and nobs=2. Adding this new method would result in value=1 and nobs=5.
I think we need a new function to indicate counts, fitn!
or similar.
In retrospect, building fit!
around iteration was a mistake. OnlineStats 2.0 should have a push!
/append!
-like approach to make things more explicit.
fit!(Mean(), 1 => 5)
feels more like a bug than anything. Instead of introducing fitn!
, why not dispatch on fit!(o, x, k)
? That shouldn't be breaking and is easy to fall back on.
It would be really convenient to be allowed to call
fit!(o, y, k)
as a shorthand way to fit observationy
k
times. Many stats would support this with little change, like theHist
stat for example. You would change lines 94-101into
I'm not sure how much extra work this would be, but I would be happy to draft a pull request trying to implement this if there is interest.