[Open] rohitjoshi opened this issue 6 years ago
I think we would have to dig down into these functions to find where the actual difference is coming from. In `dirichlet`, is it from the `as_slice` call from creating `Dirichlet`, vs. randomkit's version that doesn't require `as_slice`? Is the difference in `d.sample(&mut rng)` itself? I'm less worried by the difference in `sigma` since that's only a 75ns difference, but since `dirichlet` is almost twice as slow with a large range (+/- 302), it seems like we would be able to make some improvements more easily.
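One way to narrow that down would be to time construction and sampling separately. A rough sketch (not the benchmark code used in this thread; it assumes a statrs version whose `Dirichlet::new` takes `&[f64]` and whose `Distribution` trait provides `sample`, and the alpha values and iteration counts are made up):

```rust
// Sketch only: separates construction cost from sampling cost so the
// statrs-vs-randomkit gap can be attributed to one or the other.
use std::time::Instant;

use rand::thread_rng;
use statrs::distribution::{Dirichlet, Distribution};

fn main() {
    let alpha = vec![1.0_f64; 10]; // made-up parameters
    let mut rng = thread_rng();

    // Construction only: includes the `as_slice` borrow and parameter validation.
    let start = Instant::now();
    for _ in 0..100_000 {
        let _d = Dirichlet::new(alpha.as_slice()).unwrap();
    }
    println!("construction: {:?}", start.elapsed());

    // Sampling only: construct once, then time `d.sample(&mut rng)` on its own.
    let d = Dirichlet::new(alpha.as_slice()).unwrap();
    let start = Instant::now();
    for _ in 0..100_000 {
        let _x = d.sample(&mut rng);
    }
    println!("sampling: {:?}", start.elapsed());
}
```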
If I had to guess, it would be that the sampling function for the gamma distribution in NumPy is more optimized than ours. Either that, or the fact that we are doing FP division in a loop at https://github.com/boxtown/statrs/blob/master/src/distribution/dirichlet.rs#L147 instead of multiplying by the inverse. I'll do some digging when I have the time, but anyone is free to submit PRs for optimizations.
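For reference, the idea behind the multiply-by-inverse suggestion is to hoist the reciprocal of the normalizing sum out of the loop, turning N divisions into one division plus N multiplications. A minimal illustrative sketch (hypothetical helper functions, not the actual statrs code):

```rust
// Illustrative only. `samples` holds the raw gamma draws and `sum` their
// total, as in the Dirichlet normalization step.
fn normalize_by_division(samples: &mut [f64], sum: f64) {
    for s in samples.iter_mut() {
        *s /= sum; // one FP division per element
    }
}

fn normalize_by_inverse(samples: &mut [f64], sum: f64) {
    let inv = 1.0 / sum; // single division, hoisted out of the loop
    for s in samples.iter_mut() {
        *s *= inv; // cheaper FP multiply per element
    }
}
```

Note that multiplying by the reciprocal is not bit-for-bit identical to dividing, which could matter if exact reproducibility of samples is a goal.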
Vector indexing is used to assign/update values. Maybe using an iterator would avoid the bounds check.
https://github.com/boxtown/statrs/blob/master/src/distribution/dirichlet.rs#L143
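To illustrate the suggestion, a sketch of the two styles on a hypothetical loop (not the code at the linked line): the indexed form pays a bounds check per access unless the optimizer can elide it, while the iterator form avoids per-element checks by construction.

```rust
// Hypothetical example of the pattern, not the actual dirichlet.rs code.

// Indexed form: each `samples[i]` / `factors[i]` access is bounds-checked
// unless LLVM can prove the indices are in range.
fn scale_indexed(samples: &mut [f64], factors: &[f64]) {
    for i in 0..samples.len() {
        samples[i] *= factors[i];
    }
}

// Iterator form: `zip` stops at the shorter slice, so no per-element
// bounds checks are needed.
fn scale_iter(samples: &mut [f64], factors: &[f64]) {
    for (s, f) in samples.iter_mut().zip(factors.iter()) {
        *s *= *f;
    }
}
```

In practice the checks are often hoisted or predicted away, which would be consistent with the "not much difference" result reported later in the thread.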
Randomkit uses `need!(alpha.iter().all(|a| *a > 0.0), "need all alpha > 0.0");` for alpha verification, whereas statrs uses `is_valid_multinomial()`: https://github.com/boxtown/statrs/blob/master/src/distribution/internal.rs#L4
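For context, here is a rough paraphrase of the two checks side by side (the statrs body is a sketch from memory of the linked `internal.rs`, not a verbatim copy):

```rust
// Randomkit-style check: a single positivity pass. Comparisons involving
// NaN evaluate to false in Rust, so NaN alphas fail this check as well.
fn randomkit_style(alpha: &[f64]) -> bool {
    alpha.iter().all(|a| *a > 0.0)
}

// statrs-style check (sketch, not the exact `is_valid_multinomial` code):
// rejects non-positive and NaN elements and also requires a nonzero sum.
fn statrs_style(alpha: &[f64]) -> bool {
    let mut sum = 0.0;
    for &a in alpha {
        if a <= 0.0 || a.is_nan() {
            return false;
        }
        sum += a;
    }
    sum != 0.0
}
```

Both are a single O(n) pass over `alpha`, so the cost difference between them should be small relative to the sampling loop itself.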
There is definitely an opportunity for optimization by replacing vector indexing with either iterators or unsafe indexing, although it seems like Randomkit also uses vector indexing. With regard to the verification check, however, I think I'd personally prefer the stronger guarantees (i.e. no NaNs) over any possible speedups (which I'm assuming to be pretty nominal, though I could be wrong). I'd be willing to reconsider, though, if there were strong evidence that the check itself is the root cause of the performance difference.
I replaced indexing with an iterator but don't see much difference. Also, the alpha validation shows very little improvement, so it seems the bottleneck is elsewhere.
Another possible cause is that the RNG used by Randomkit is less computationally intensive than the one from the Rust `rand` crate.
Indeed, Randomkit uses the Mersenne Twister instead of an unpredictable crypto RNG like `rand` does by default. It is amazing that `rand` is still competitive! If you want to compare statrs and Randomkit, you should use the same RNG. I would suggest redoing the benchmark using `mersenne_twister`, or implementing `rand::Rng` for `randomkit::Rng`.
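As a rough illustration of the first suggestion, here is a sketch of driving statrs's Dirichlet sampler with a Mersenne Twister instead of the default RNG. The crate (`mt19937`), its `SeedableRng` support, and the statrs signatures are assumptions and depend on the versions in use; any Mersenne Twister implementation that exposes `rand`'s RNG traits would do.

```rust
// Sketch only: crate names, constructors, and trait bounds are assumptions
// and will vary with the statrs / rand / mt19937 versions being benchmarked.
use mt19937::MT19937;
use rand::SeedableRng;
use statrs::distribution::{Dirichlet, Distribution};

fn main() {
    let alpha = vec![1.0_f64; 10]; // made-up parameters
    let d = Dirichlet::new(alpha.as_slice()).unwrap();

    // Seeded Mersenne Twister, comparable to NumPy's default generator.
    let mut rng = MT19937::seed_from_u64(42);

    // Same sampling call as before; only the RNG differs.
    let sample: Vec<f64> = d.sample(&mut rng);
    println!("{:?}", sample);
}
```

This keeps the statistical code identical and isolates how much of the gap is attributable to the RNG.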
Thanks. I think numpy is primarily used for ML/stats and doesn't require crypto-grade randomness. With `mt19937` and `sfmt`, I see a significant improvement: statrs now outperforms numpy for the `sigma` test case, but is still slower for Dirichlet.
test statrs::sigma_rust ... bench: 134,685,636 ns/iter (+/- 31,823,013)
test statrs::sigma_mt19937_rust ... bench: 37,462,327 ns/iter (+/- 4,305,331)
test statrs::sigma_sfmt_rust ... bench: 35,474,413 ns/iter (+/- 9,619,130)
test randkit::sigma_rust ... bench: 109,653,185 ns/iter (+/- 11,162,713)
test statrs::dirichlet_rust ... bench: 220,099,086 ns/iter (+/- 13,985,720)
test statrs::dirichlet_mt19937_rust ... bench: 149,702,831 ns/iter (+/- 18,926,881)
test statrs::dirichlet_sfmt_rust ... bench: 129,807,223 ns/iter (+/- 8,436,193)
test randkit::dirichlet_rust ... bench: 98,681,855 ns/iter (+/- 20,803,202)
Suggestion: why not compare instead to an optimised C library, namely the GNU Scientific Library?
There are some optimisations in the GSL that are not in Statrs (specifically around the Gamma distribution, which I don't understand well enough to check the correctness of).
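For anyone who wants to try that comparison, here is a hedged FFI sketch of calling GSL's `gsl_ran_dirichlet` from Rust. The link attributes and opaque struct declarations are illustrative assumptions (an existing binding crate could be used instead); it assumes libgsl and libgslcblas are installed.

```rust
// Rough FFI sketch for benchmarking statrs against GSL's Dirichlet sampler.
// Not a vetted binding; treat the declarations below as assumptions.
use std::os::raw::c_double;

#[repr(C)]
struct GslRngType {
    _private: [u8; 0],
}

#[repr(C)]
struct GslRng {
    _private: [u8; 0],
}

#[allow(non_upper_case_globals)]
#[link(name = "gsl")]
#[link(name = "gslcblas")]
extern "C" {
    static gsl_rng_mt19937: *const GslRngType;
    fn gsl_rng_alloc(t: *const GslRngType) -> *mut GslRng;
    fn gsl_rng_free(r: *mut GslRng);
    fn gsl_ran_dirichlet(r: *const GslRng, k: usize, alpha: *const c_double, theta: *mut c_double);
}

fn main() {
    let alpha = vec![1.0_f64; 10]; // made-up parameters, matching the sketches above
    let mut theta = vec![0.0_f64; alpha.len()];
    unsafe {
        // mt19937 matches the generator NumPy's RandomState uses by default.
        let rng = gsl_rng_alloc(gsl_rng_mt19937);
        gsl_ran_dirichlet(rng, alpha.len(), alpha.as_ptr(), theta.as_mut_ptr());
        gsl_rng_free(rng);
    }
    println!("{:?}", theta);
}
```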
Some of the distributions are slower compared to Randomkit. Randomkit is an FFI wrapper around numpy's RandomState implementation. E.g., see the benchmark below. Any thoughts?
statrs:
randomkit: