Closed by vandenman 1 year ago
A few things:
julia> typeof(Group([Mean() for _ in 1:1375]...))
Group{NTuple{1375, Mean{Float64, EqualWeight}}, Union{NTuple{1375, Number}, NamedTuple{names, R} where R<:NTuple{1375, Number}, AbstractVector{<:Number}} where names}
julia> typeof(Group([Mean() for _ in 1:1376]...))
ERROR: StackOverflowError:
Stacktrace:
[1] promote_type(::Type, ::Type, ::Type, ::Type, ::Vararg{Type}) (repeats 1368 times)
@ Base ./promotion.jl:293
[2] Group(stats::NTuple{1376, Mean{Float64, EqualWeight}})
@ OnlineStatsBase ~/.julia/dev/OnlineStatsBase/src/stats.jl:368
[3] Group(::Mean{Float64, EqualWeight}, ::Vararg{Mean{Float64, EqualWeight}})
@ OnlineStatsBase ~/.julia/dev/OnlineStatsBase/src/stats.jl:372
I am at a loss as to why 1375 would be different than 1376. However, there is a one-line fix in OnlineStatsBase that I'll add:
# don't do this. See below
julia> typeof(Group([Mean() for _ in 1:999999]...))
Group{NTuple{999999, Mean{Float64, EqualWeight}}, Union{Tuple{Number}, NamedTuple{names, R} where R<:Tuple{Number}, AbstractVector{<:Number}} where names}
For a large Group, use a Vector:
julia> typeof(Group([Mean() for _ in 1:999999]))
Group{Vector{Mean{Float64, EqualWeight}}, Union{Tuple{Number}, NamedTuple{names, R} where R<:Tuple{Number}, AbstractVector{<:Number}} where names}
I'll have to do some benchmarking. There may be no benefit to using tuples even for a smaller number of stats.
If you track a Variance, don't add in a Mean. A Variance needs to calculate the mean internally, so you're adding unnecessary compute by including a Mean as well. You can do:
o = fit!(Variance(), randn(100))
mean(o)
Eh, I lied. My "fix" broke other stuff. This may be some internal Julia limitation on tuple sizes. I'll look into it.
Figured it out. This is the culprit, which lives in Base:
promote_type(T, S, U, V...) = (@inline; promote_type(T, promote_type(S, U, V...)))
For a large number of arguments, e.g. promote_type(many_things...), this method calls itself once per argument, so the recursion gets deep enough to hit a stack overflow. Fortunately, changing some OnlineStatsBase code from promote_type(types...) to reduce(promote_type, types) fixes everything.
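To illustrate the difference, here is a minimal sketch that uses only Base. The variable names (types, many_types, T, T_many) are made up for the example, and this is not the actual OnlineStatsBase code. It folds promote_type pairwise over a collection of types rather than splatting the whole collection into a single varargs call:

```julia
# Pairwise fold: promote_type only ever sees two arguments at a time.
types = Type[Int, Float32, Float64]
T = reduce(promote_type, types)

# The same fold over a long collection. No varargs call is involved,
# so the number of entries does not turn into recursion depth.
many_types = Type[Float64 for _ in 1:100_000]
T_many = reduce(promote_type, many_types)

println(T)
println(T_many)
```

Whether the splatted form actually overflows depends on the Julia version and on how many arguments are passed, so the sketch only shows the reduce form.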
Re: Vectors vs. Tuples as the Group container: Tuples are faster for a small number of items.
A new OnlineStatsBase release is pending.
Related to https://github.com/joshday/OnlineStats.jl/issues/158
I'm doing some MCMC where there are too many parameters to save all samples. So instead I figured, let's use OnlineStats to store only the few statistics that I'm interested in (e.g., Mean, Variance, AutoCorrelation). However, it is unclear to me how to do this properly. Right now, I have a function like so:
The idea is that a user (just me for now) supplies a Series which gets passed to online_statistics. Internally, I run some MCMC algorithm and after each iteration I update the statistics for a, b, and c. This way, a user can specify which statistics to track themselves and they're not baked into the code.

Example usage:
This works fine. However, for a larger size, the same code fails with a StackOverflowError:
I think the problem is that Group does not specialize when the contents are all the same. For example, Group(Mean(), Mean()) has type Group{Tuple{Mean{Float64, EqualWeight}, Mean{Float64, EqualWeight}}, Union{Tuple{Number, Number}, NamedTuple{names, R} where R<:Tuple{Number, Number}, AbstractVector{<:Number}} where names} and Group(Mean(), Mean(), Mean()) has an additional Mean{Float64, EqualWeight}. Eventually, a StackOverflowError is reached.

For now, I can use a Vector{T} where {T<:OnlineStat}, but I feel like I'm reinventing the idea behind Group. Perhaps there should be a specialized type for this case, like MonoGroup{T, U<:Int} where {T<:Union{Series, OnlineStat}}?
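As a rough illustration of what such a homogeneous container could look like, here is a self-contained sketch in plain Julia. RunningMean, HomogeneousGroup, and update! are hypothetical names invented for this example. They are not OnlineStats types or functions, and this is not how Group is implemented. The point is only that a single element type parameter keeps the container's type the same size no matter how many statistics it holds:

```julia
# Hypothetical stand-in for a single online statistic (not an OnlineStats type).
mutable struct RunningMean
    mean::Float64
    n::Int
end
RunningMean() = RunningMean(0.0, 0)

# Incremental mean update: mean += (x - mean) / n
function update!(s::RunningMean, x::Real)
    s.n += 1
    s.mean += (x - s.mean) / s.n
    return s
end

# Hypothetical container: one element type T shared by every slot, so the
# type does not list each statistic separately the way a Tuple does.
struct HomogeneousGroup{T}
    stats::Vector{T}
end

# Feed the i-th value of one observation to the i-th statistic.
function update!(g::HomogeneousGroup, xs::AbstractVector{<:Real})
    length(xs) == length(g.stats) ||
        throw(DimensionMismatch("expected one value per statistic"))
    for (s, x) in zip(g.stats, xs)
        update!(s, x)
    end
    return g
end

g = HomogeneousGroup([RunningMean() for _ in 1:10_000])
update!(g, fill(1.0, 10_000))
update!(g, fill(3.0, 10_000))
println(typeof(g))
```

The trade-off is the usual one between Vectors and Tuples: the Vector-backed version scales to many statistics, while a Tuple can be faster when there are only a few.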