Open dhardy opened 6 years ago
I should CC @sbarral and link the C++ implementation
Also see the following 2018 arXiv paper by Morteza Jalalvand and Mohammad A. Charsooghi:
Generalized Ziggurat Algorithm for Unimodal and Unbounded Probability Density Functions with Zest
Authors' code (GPLv3): https://github.com/DiscreteLogarithm/Zest
I'm looking into this. I don't speak C++, but I think my prompting skills are good enough to get the reference implementation interpreted into Rust for us. @dhardy, I believe the Ziggurat method is currently only used to sample the normal and exponential distributions, no? I see that the ETF algorithm can sample arbitrarily shaped distributions; does that mean it could be used in any/all of rand's distributions? I'm trying to get a better understanding of the scope of applicability of this solution vs the existing one.
Oh that's funny, I had already forgotten about my Rust implementation!
I see that my last work on that dates back to 4 years ago. It's missing a README and license, but otherwise looks fairly complete (actually more so than the C++ version), with a decent battery of tests. I think my ambition was to implement more distributions before publishing it, but there was always something more important to do... Anyway, rather than let it rot forever, I have just committed everything that wasn't yet committed and made the repo public:
https://github.com/sbarral/etf-rs
I am very busy this week but will try to put some finishing touches in the coming month. In the meantime, the license is MIT/APACHE2, so feel free to peruse.
As for its strengths, I think ETF is great if you have atypical/custom distributions for which no literature exists. For well-known distributions, ~~chances are that some specialized sampling methods will be a bit faster (though ETF will probably be competitive)~~ [probably not true, see next comment]. I seem to recall, however, that Cauchy and Chi-Squared were at that time faster than in Rand (TBC).
Compared to Zest, I expect that ETF will always be measurably faster. Due to the way it works, Zest is likely to have a very high branch misprediction rate. ETF is in this respect similar to regular Ziggurat, with a very good prediction rate.
I remember now that the problem is not so much that ETF might be slower than specialized sampling method; in fact it is probably often significantly faster. The issue is that for most distributions, the tables cannot be pre-computed so constructing a distribution is a costly operation. Cauchy is good in this respect because, like the normal or exponential distributions, it can be generated for any parameter using a single table.
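(To illustrate the single-table property: Cauchy, like Normal, is a location-scale family, so a sampler for the standard distribution covers every parameterization with one affine transform. The sketch below uses the standard Cauchy inverse CDF for simplicity — it demonstrates the transform only, not the table-based ETF sampler itself.)

```rust
use std::f64::consts::PI;

/// Inverse CDF of the standard Cauchy distribution (location 0, scale 1).
fn standard_cauchy(u: f64) -> f64 {
    (PI * (u - 0.5)).tan()
}

/// Any Cauchy(location, scale) variate is an affine transform of a standard
/// variate, which is why one precomputed table can serve all parameters.
fn cauchy(location: f64, scale: f64, u: f64) -> f64 {
    location + scale * standard_cauchy(u)
}

fn main() {
    // The median (u = 0.5) maps to the location parameter.
    assert!((cauchy(3.0, 2.0, 0.5) - 3.0).abs() < 1e-12);
    // The upper quartile of Cauchy(loc, scale) sits at loc + scale.
    assert!((cauchy(3.0, 2.0, 0.75) - 5.0).abs() < 1e-9);
    println!("ok");
}
```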
Bottom line: ETF is I think a very valuable algorithm, but the construction cost may not make it a good fit for Rand (and Zest will have the exact same issue).
I just updated the Rand-related dependencies and ran the benchmark:
Test | type | Time (rand_distr) | Time (etf-rs)
---|---|---|---
central_normal | f32 | 5.0159 ns | 3.8457 ns
central_normal | f64 | 3.9653 ns | 3.9148 ns
normal | f64 | 6.2197 ns | 4.2361 ns
cauchy | f32 | 26.118 ns | 4.0772 ns
cauchy | f64 | 21.983 ns | 3.8984 ns
chi_squared (k=0.5) | f32 | 29.742 ns | 7.8440 ns
chi_squared (k=0.5) | f64 | 41.105 ns | 10.746 ns
chi_squared (k=2) | f32 | 7.7833 ns | 4.2518 ns
chi_squared (k=2) | f64 | 6.3638 ns | 3.7355 ns
chi_squared (k=5) | f32 | 16.573 ns | 4.2282 ns
chi_squared (k=5) | f64 | 15.904 ns | 3.9329 ns
chi_squared (k=1000) | f32 | 16.355 ns | 4.0622 ns
chi_squared (k=1000) | f64 | 15.495 ns | 3.9862 ns
Cauchy seems to be the obvious candidate for an ETF implementation since it is about 5-6 times faster than today's `rand_distr::Cauchy`, and it can be generated from a single pre-computed table, just like `rand_distr::Normal` is today (in that case it will be slightly slower than in the above benchmark, but the relative difference should be the same as `rand_distr::Normal` vs `rand_distr::StandardNormal`).

Chi-squared is much faster than `rand_distr::ChiSquared`, but AFAIR it cannot be generated from a single pre-computed table.
Wow, that's fantastic!
Bottom line: ETF is I think a very valuable algorithm, but the construction cost may not make it a good fit for Rand (and Zest will have the exact same issue).
I guess it depends on the use-case, right? If you only sample a distribution once before destroying it then ETF might not be worth it, but the construction cost gets amortized in the long-term. I think it would be very cool if we could let users choose which sampling method to use via a generic parameter.
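(A hypothetical sketch of what selecting the sampling method via a generic parameter could look like. None of these names — `SamplingMethod`, `Ziggurat`, `Etf`, this `Normal<M>` — exist in rand_distr; they are invented here purely to illustrate the idea.)

```rust
// Hypothetical: the sampling algorithm is selected by a marker type.
trait SamplingMethod {
    const NAME: &'static str;
}

struct Ziggurat; // cheap setup (precomputed tables), the current approach
struct Etf;      // costly setup (tables built at construction), fast sampling

impl SamplingMethod for Ziggurat {
    const NAME: &'static str = "ziggurat";
}
impl SamplingMethod for Etf {
    const NAME: &'static str = "etf";
}

struct Normal<M: SamplingMethod> {
    mean: f64,
    std_dev: f64,
    _method: std::marker::PhantomData<M>,
}

impl<M: SamplingMethod> Normal<M> {
    // An ETF-backed variant would build its tables here (the costly step
    // discussed above); a ziggurat-backed one would use static tables.
    fn new(mean: f64, std_dev: f64) -> Self {
        Normal { mean, std_dev, _method: std::marker::PhantomData }
    }
    fn method(&self) -> &'static str {
        M::NAME
    }
    fn params(&self) -> (f64, f64) {
        (self.mean, self.std_dev)
    }
}

fn main() {
    let fast_setup: Normal<Ziggurat> = Normal::new(0.0, 1.0);
    let fast_sample: Normal<Etf> = Normal::new(0.0, 1.0);
    assert_eq!(fast_setup.method(), "ziggurat");
    assert_eq!(fast_sample.method(), "etf");
    assert_eq!(fast_sample.params(), (0.0, 1.0));
    println!("ok");
}
```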
the construction cost gets amortized in the long-term
Yes, this is absolutely true. The current algorithm for table construction is decently optimized, so the break-even is probably low enough to make it worthwhile in most applications (I'd hazard that break-even is of the order of ~10'000 samples). I was mainly concerned that it may not be a good default as users may not expect a constructor to be expensive.
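(The break-even arithmetic is simple: if construction costs C nanoseconds and each sample saves s nanoseconds over the baseline, the tables pay for themselves after C / s samples. The construction cost below is an illustrative assumption, not a measured figure.)

```rust
fn main() {
    // Assumed, illustrative numbers: table construction ~100 µs,
    // per-sample saving ~10 ns over the baseline sampler.
    let construction_ns = 100_000.0_f64;
    let saving_per_sample_ns = 10.0_f64;
    let break_even = construction_ns / saving_per_sample_ns;
    // Consistent with the ~10'000-sample ballpark mentioned above.
    assert_eq!(break_even, 10_000.0);
    println!("break-even after {} samples", break_even);
}
```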
I think it would be very cool if we could let users choose which sampling method to use via a generic parameter
This could be a great solution. I guess the `new` constructor would have to be changed to some other name to get the ETF version then? Another solution that could be considered is a submodule containing the alternative implementations.
What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere.
What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large.
It sounds like sample performance, set-up performance, object size and possibly even code size could all be reasons for picking one implementation over another. Possibly also variance in the number of RNG samples required (if more predictable for one algorithm over another).
This could be a great solution. I guess the new constructor would have to be changed to some other name to get the ETF version then? Another solution that could be considered is a submodule containing the alternative implementations.
It does sound like there is good reason for publishing multiple variants of these distributions. We previously discussed similar questions (#494, #601), but so far `WeightedIndex` is the only distribution with multiple implementations and we just went with different names.
So...

1. A `rand_distr::etf` sub-module. This may be appropriate if several distributions have similar performance/cost tradeoffs (sample time, set-up time, size).
2. A separate crate, `rand_etf` or `etf`, instead. This may be preferable if code size is significant. It avoids needing to synchronise with `rand_distr` releases. (It might also prompt the question of whether the rest of `rand_distr` should be re-organised, but we'll leave this for now.)
3. We could publish under something like `rand_distr::fast`, maybe moving `WeightedAliasIndex` there too, but if I've learnt anything from the latter, it's that the choice of algorithm has consequences beyond just sample/set-up performance trade-offs: it can result in API changes (bounds on generics, additional methods) and does affect reproducibility. Thus, I have a preference for reflecting the algorithm in the name.

I agree that we should switch to ETF for the distributions where the table can be precomputed. For the others, it's a bit more tricky: so far, we don't offer the same distribution with different performance tradeoffs. There is not just the construction time vs. sample time tradeoff; a precision vs. time tradeoff came up as well in the past (#531, #1346).
What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere.
What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large.
For Cauchy and Normal, the tables of the above benchmark stored 256x3=768 elements of 32 bits (for `f32`) or 64 bits (for `f64`) each.
For chi-squared, the tables had a size of 512x3.
From memory, Rand uses tables of size 256x2=512 for the normal distribution.
The size vs speed trade-off can be adjusted of course. I just ran a benchmark with tables of size 128x3=384 for Cauchy and the numbers were ~4.9ns, so about 20-25% slower than the previous ~4ns.
The size of distribution objects is basically the tables, so if they are pre-computed they are very small, and otherwise very large...
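(To put numbers on "very small": a 256x3 table is only a few kilobytes, so the real cost of the dynamic case is the construction time, not the storage. A quick size check:)

```rust
fn main() {
    // Table layout used in the benchmarks above: 256 layers x 3 values each.
    let entries = 256 * 3;
    let bytes_f32 = entries * std::mem::size_of::<f32>();
    let bytes_f64 = entries * std::mem::size_of::<f64>();
    assert_eq!(bytes_f32, 3072); // 3 KiB
    assert_eq!(bytes_f64, 6144); // 6 KiB
    println!("f32 table: {} bytes, f64 table: {} bytes", bytes_f32, bytes_f64);
}
```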
Possibly also variance in the number of RNG samples required (if more predictable for one algorithm over another).
ETF is similar to ziggurat and will need only 1 RNG draw per sample something like 95-99% of the time, and very rarely more than 2 (but in theory, and also like ziggurat, the maximum number of draws is unbounded).
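(The "rarely more than 2" claim follows from the geometry of rejection sampling: if each attempt consumes one RNG output and is accepted with probability p, the attempt count is geometrically distributed, so the expected number of draws is 1/p and the probability of needing more than two is (1-p)². Taking p = 0.97 from the 95-99% range above:)

```rust
fn main() {
    // Per-attempt acceptance probability (illustrative, from the 95-99%
    // range quoted above).
    let p: f64 = 0.97;
    let expected_draws = 1.0 / p;              // mean of a geometric distribution
    let more_than_two = (1.0 - p) * (1.0 - p); // both of the first two rejected
    assert!(expected_draws < 1.04);
    assert!(more_than_two < 0.001);
    println!("E[draws] ~ {:.3}, P(>2 draws) ~ {:.5}", expected_draws, more_than_two);
}
```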
We could publish under something like rand_distr::fast, maybe moving WeightedAliasIndex there too, but if I've learnt anything from the latter, its that the choice of algorithm has consequences beyond just sample/set-up performance trade-offs: it can result in API changes (bounds on generics, additional methods) and does affect reproducibility. Thus, I have a preference for reflecting the algorithm in the name.
One thing to consider indeed is that when the table is computed at construction, the constructor is fallible because the solver may fail to converge for some extreme choices of parameters.
There is not just the construction time vs. sample time tradeoff; a precision vs. time tradeoff came up as well in the past (#531, #1346).
The linked issues are for uniform sampling, which is a bit specific. But for other distributions, I would argue that inverse transform sampling is the gold standard in the sense that it "makes the most" of a random number and, crucially, avoids clustering samples on the fine scale.
From this perspective, ETF is almost as good as it gets. It is, in fact, an inverse transform sampling method, except that it samples an upper bound of the PDF made of rectangles rather than the actual PDF. Its only compromise is that it uses "only" 95-99% of the entropy of the random number for sampling, the remaining part being used for the rejection step.
Here is a plot comparing the quality of ETF and ziggurat with the same source of entropy.
Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.
Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.
It looks like Gumbel could be another candidate beside Cauchy. I can try to implement it in my repo and will post the results here (may take a few days as I have other urgent things).
I have added Gumbel to the repo. Here are the results of `cargo bench` on my machine, using the same conditions as the benchmark reported earlier for other distributions.
Test | type | Time (rand_distr) | Time (etf-rs) |
---|---|---|---|
gumbel | f32 | 13.872 ns | 4.6761 ns |
gumbel | f64 | 19.719 ns | 4.5018 ns |
Bear in mind that here as well the tables were generated dynamically for the specific distribution parameters and their size was 256x3. Using a common table (location=0 and scale=1) necessitates an extra addition and multiplication per sample, but even with a common table and a small size (128x3), the numbers are still very decent: 5.4829 ns for `f32` and 5.8653 ns for `f64`.
See: The Exclusive Top Floor algorithm