joshday / OnlineStats.jl

⚡ Single-pass algorithms for statistics
https://joshday.github.io/OnlineStats.jl/latest/
MIT License

Group with 3 Stats not working for multi-observations? #266

Open stephancb opened 11 months ago

stephancb commented 11 months ago
julia> g=Group(HeatMap(0:10, -5:5), Hist(0:10), Hist(-5:5))
Group
├─ HeatMap: n=0 | value=(x = 0:10, y = -5:5, z = [0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0])
├─ Hist: n=0 | value=(x = 0:10, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
└─ Hist: n=0 | value=(x = -5:5, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
julia> data=[((3,5),3,5),((7,-2),7,-2)]
2-element Vector{Tuple{Tuple{Int64, Int64}, Int64, Int64}}:
 ((3, 5), 3, 5)
 ((7, -2), 7, -2)
julia> fit!(g, data)
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
.
.
.

A heatmap combined with marginal histograms along the x and y axes is a commonly used layout. It does not seem to work here, or am I missing something?

joshday commented 11 months ago

Hmm, the error appears to be in the multi-observation method, since fitting a single observation works:

julia> fit!(g, data[1])
Group
├─ HeatMap: n=1 | value=(x = 0:10, y = -5:5, z = [0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0])
├─ Hist: n=1 | value=(x = 0:10, y = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
└─ Hist: n=1 | value=(x = -5:5, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

stephancb commented 11 months ago

Thanks, I changed the title. In that case, a reasonably elegant workaround is to fit the observations one at a time:

julia> fit!!(g, it) = foreach(x -> fit!(g, x), it)
fit!! (generic function with 1 method)
julia> fit!!(g, data); g
Group
├─ HeatMap: n=2 | value=(x = 0:10, y = -5:5, z = [0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0])
├─ Hist: n=2 | value=(x = 0:10, y = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
└─ Hist: n=2 | value=(x = -5:5, y = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1])
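
The same workaround can be written as a script rather than a REPL session. This is only a minimal sketch: `fit_each!` is a hypothetical helper name (not part of OnlineStats), and reading the second stat through the `g.stats` field is an assumption about how `Group` stores its members.

```julia
using OnlineStats

# Hypothetical helper (not part of OnlineStats): fit one observation at a time,
# sidestepping the multi-observation method that raises the MethodError above.
function fit_each!(g, itr)
    for obs in itr
        fit!(g, obs)   # single-observation fit!, which works per the thread
    end
    return g
end

g = Group(HeatMap(0:10, -5:5), Hist(0:10), Hist(-5:5))
data = [((3, 5), 3, 5), ((7, -2), 7, -2)]
fit_each!(g, data)

nobs(g)               # number of observations seen by the group
value(g.stats[2]).y   # bin counts of the x histogram (assumes a `stats` field)
```

Returning `g` from the helper lets the call be chained or displayed directly, which the `foreach` one-liner does not do.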

stephancb commented 11 months ago

For reference, closer to my real application is

julia> xydata(n) = ((5+randn(), randn()) for x in 1:n)
xydata (generic function with 1 method)
julia> g=Group(HeatMap(0.0:10, -5.0:5), Hist(0.0:10), Hist(-5.0:5))
Group
├─ HeatMap: n=0 | value=(x = 0.0:1.0:10.0, y = -5.0:1.0:5.0, z = [0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0])
├─ Hist: n=0 | value=(x = 0.0:1.0:10.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
└─ Hist: n=0 | value=(x = -5.0:1.0:5.0, y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
julia> fit!!(g, it) = foreach(x -> fit!(g, x), it)
fit!! (generic function with 1 method)
julia> fit!!(g, map(z -> (z, z[1], z[2]), xydata(5_000_000))); g
Group
├─ HeatMap: n=5_000_000 | value=(x = 0.0:1.0:10.0, y = -5.0:1.0:5.0, z = [0 0 … 1 0; 0 10 … 6 0; … ; 0 16 … 7 1; 0 0 … 0 0])
├─ Hist: n=5_000_000 | value=(x = 0.0:1.0:10.0, y = [151, 6528, 106846, 679123, 1705893, 1707704, 680233, 106767, 6608, 146])
└─ Hist: n=5_000_000 | value=(x = -5.0:1.0:5.0, y = [151, 6644, 106620, 679893, 1707789, 1704820, 680415, 106955, 6552, 158])

This seems to be a very fast way to bin a large number of x-y points into both a heatmap and the marginal histograms of x and y.
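
If memory matters, note that the `map` call above collects all observations into a vector before any of them is fitted. A possible variant, sketched here with a smaller `n` and under the assumption that the single-observation `fit!` behaves as shown in this thread, keeps the whole pipeline lazy by using a generator instead:

```julia
using OnlineStats

# Lazy source of (x, y) points, as in the example above
xydata(n) = ((5 + randn(), randn()) for _ in 1:n)

g = Group(HeatMap(0.0:10, -5.0:5), Hist(0.0:10), Hist(-5.0:5))

n = 100_000
# Generator instead of map: each observation is built only when it is consumed
obs = ((z, z[1], z[2]) for z in xydata(n))
foreach(x -> fit!(g, x), obs)

nobs(g)   # number of observations the group has seen
```

Each point is then created, fitted, and discarded in turn, which matches the single-pass idea behind the package.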