Open DominiqueMakowski opened 1 week ago
The benchmarks below show that the mixture model is about 6 times slower than the wald model due to sampling the drift rate, and extra terms. On a development branch I show that condionally executing the wald code when eta is zero achieves the desired speed up without time instability problems.
My plan is to drop WaldMixture in favor of a general Wald model. I will not merge this until later. I would prefer to make some breaking changes in bulk. In the meantime, you can just use WaldMixture with the parametric constraint that eta = 0 to achieve the special case.
using BenchmarkTools
using SequentialSamplingModels
wald_mixture = WaldMixture(;ν=3.0, η=.2, α=.5, τ=.130)
wald = Wald(;ν=3.0, α=.5, τ=.130)
@benchmark rand($wald_mixture, $1000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 110.102 μs … 1.529 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 111.269 μs ┊ GC (median): 0.00%
Time (mean ± σ): 116.936 μs ± 20.719 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▂▂▁ ▁ ▁ ▁
███████▆▇██▇▇█▆▆▆██▇▇█▇▆▅▇▇█▇▆▆▆▆▅▇█▆▆▆▆▆▇▆▆▆▆▇▇▇▇▅▆▇▇▇▇▆▆▆▆ █
110 μs Histogram: log(frequency) by time 178 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
@benchmark rand($wald, $1000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 19.509 μs … 361.751 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 20.727 μs ┊ GC (median): 0.00%
Time (mean ± σ): 21.309 μs ± 4.358 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▆██▇▆▄▂ ▂
██████████▇▇▆▆▆▆▆▆▆▄▅▅▅▃▅▇▇███▇▆▆▄▅▅▇▇█▇▇▇▇▇▇▇▆▅▄▃▁▄▃▁▃▃▅▃▄▄ █
19.5 μs Histogram: log(frequency) by time 35.7 μs <
Memory estimate: 15.88 KiB, allocs estimate: 2.
rts = rand(wald_mixture, 1000)
@benchmark logpdf.($wald_mixture, $rts)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 114.675 μs … 1.301 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 114.859 μs ┊ GC (median): 0.00%
Time (mean ± σ): 119.654 μs ± 18.190 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▂ ▁ ▁ ▁
███▇▅▆▆▇██▇▇▇▆▅▆██▇▇▇▆▆▆▇▇▆▇▇▆▅▆█▇▆▆▆▅▅▄▆▇▆▆▆▅▄▄▅▅▅▄▄▃▃▄▃▄▅▄ █
115 μs Histogram: log(frequency) by time 185 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
rts = rand(wald, 1000)
@benchmark logpdf.($wald, $rts)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 17.573 μs … 321.755 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 17.740 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.959 μs ± 5.142 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅▂ ▁ ▁▁ ▁
████▅▅▆▆▅▅▅▅▄▄▅▅▄▄▄▅▅▅▃▁▃▆██████▇▅▅▅▅▅▆▆▇▇▇▆▅▅▆▅▅▄▄▄▄▆▅▆▇▇▆▅ █
17.6 μs Histogram: log(frequency) by time 38.1 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
Two more semi-related things:
rand(ExGaussian(0.3, 0.2, 0),
rand(ExGaussian(0.3, 0.0, 0.1), 100)
- Falling back on Normal() if tau=0
That is a good idea. I'll make that change.
- but it still has some variability. Maybe something to clarify as it's a bit unexpected
I think this behavior is to be expected. X ~ normal(3, 0) + exponential(.10) simplifies to a shifted exponential where the var(X) = .10:
using SequentialSamplingModels
var(rand(ExGaussian(0.3, 0.0, 0.1), 10_000))
0.010246564388110698
I think this behavior is to be expected
Indeed that makes sense, that it would still have variability from the exp distrib
I'll have to think about circumventing the error when tau = 0. Upon further thought, I think it might not be a good idea because the expontial distribution is not defined when tau = 0, which, I think, would imply the exguassian is not defined. Along the same lines, the logpdf is not defined with tau = 0 because it is used as a divisor. Right now I think the current implementation is correct.
which, I think, would imply the exguassian is not defined
Although one could argue that if the exponential is "null" then only the Normal remains for Normal + Exp.
At the occasion we can check with other implementation see how they do (I just tried to test using brms
but it doesn't give me access to the distribution constructor)
I verified with gamlss in R:
> dexGAUS(1, mu = 1, sigma = 1, nu = 0, log = FALSE)
Error in dexGAUS(1, mu = 1, sigma = 1, nu = 0, log = FALSE) :
nu must be greater than 0
> rexGAUS(1, mu = 1, sigma = 1, nu = 0, log = FALSE)
Error in rexGAUS(1, mu = 1, sigma = 1, nu = 0, log = FALSE) :
nu must be positive
New issue to track and discuss the streamlining of the Wald API (merging of Wald and Mixture Wald) (https://github.com/itsdfish/SequentialSamplingModels.jl/issues/83#issuecomment-2212388196)