This is a client-side workaround:

```rust
use rand::Rng;
use rand::distributions::{IndependentSample, Sample};

/// Wraps an `f64`-sampling distribution and casts its output to `f32`.
#[derive(Copy, Clone, Debug)]
pub struct F32<S>(pub S);

impl<S> Sample<f32> for F32<S>
    where S: Sample<f64>
{
    fn sample<R>(&mut self, rng: &mut R) -> f32 where R: Rng {
        self.0.sample(rng) as f32
    }
}

impl<S> IndependentSample<f32> for F32<S>
    where S: IndependentSample<f64>
{
    fn ind_sample<R>(&self, rng: &mut R) -> f32 where R: Rng {
        self.0.ind_sample(rng) as f32
    }
}
```
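For illustration, usage against the old `Sample`/`IndependentSample` API might look like this (a minimal sketch, assuming the `Normal` distribution from `rand::distributions`):

```rust
use rand::distributions::{IndependentSample, Normal};

fn main() {
    let mut rng = rand::thread_rng();
    // Wrap the f64-based Normal so that sampling yields f32 values.
    let normal = F32(Normal::new(2.0, 3.0));
    let x: f32 = normal.ind_sample(&mut rng);
    println!("{}", x);
}
```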
I think this is doable, but would require re-implementing quite a few algorithms, e.g. the ziggurat function is tuned specifically for `f64`. @pitdicker I think it would be straightforward to add an `f32` variant if we wanted to?

The real question is: is there a good reason to use `f32` implementations? On GPUs, yes, but on CPUs I'd have thought it better to do the computation in `f64` (or even 80-bit or whatever precision the CPU supports), then cast down afterwards if data size is an issue. This would preserve more precision through the calculation.
> is there a good reason to use f32 implementations?

In principle it should be faster, especially if the code can be vectorized. However, the current API, generating one float at a time, is not well suited to vectorization.
No, it's not — but nor are many applications. If performance improvements are a goal then one should start with a specific problem needing (and able to take advantage of) the optimisations.

Also, I don't expect `f32` would be much faster than `f64` for most tasks on modern CPUs, except with vectorisation — and excepting 32-bit ARM.
> but nor are many applications.

I'm not so sure about that. I think a lot of applications could benefit from generating 4 floats at a time (for instance, rejection sampling and sampling from the unit sphere).
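To illustrate, a typical rejection-sampling loop consumes an unpredictable number of floats, and batching their generation is exactly where vectorisation would pay off. A sketch (the `unit_disc` helper is hypothetical, not part of rand):

```rust
use rand::Rng;

// Rejection-sample a point uniformly from the unit disc. Each iteration
// consumes two floats; generating floats four at a time and draining
// them from a buffer is the kind of pattern that could be vectorised.
fn unit_disc<R: Rng>(rng: &mut R) -> (f32, f32) {
    loop {
        let x = rng.gen::<f32>() * 2.0 - 1.0;
        let y = rng.gen::<f32>() * 2.0 - 1.0;
        if x * x + y * y <= 1.0 {
            return (x, y);
        }
    }
}
```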
> @pitdicker I think it would be straightforward to add an f32 variant if we wanted to?
Yes, I think so too. It would mean updating the `ziggurat_tables.py` script, and probably a little creative macro use in the distributions code.

I always thought of this issue as something nice that we should do, with just the question of 'when' remaining. We definitely should figure out how to make better use of vectorisation someday though.

@fizyk20 As you are already working on/with the distributions code, what do you think? Or would you be interested?
@pitdicker I don't really have an opinion here :grin:
`f32` variants don't make much sense for the Poisson and binomial distributions, which I've been involved with, as they are inherently discrete. It might make sense to add `u32`/`u16`/`u8`/`i64`/etc. variants, though this would probably mean that more type annotations are required in code using the API (type inference is harder when there are so many similar possibilities).
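A hypothetical illustration of that inference cost, assuming `Poisson` gained impls for several integer widths:

```rust
use rand::distributions::{Distribution, Poisson};

fn main() {
    let mut rng = rand::thread_rng();
    let poisson = Poisson::new(4.0);
    // Today u64 is the only sample type, so inference just works; if
    // u32/u16/u8 impls were added, the annotation would become mandatory
    // to pick one of the candidate impls.
    let n: u64 = poisson.sample(&mut rng);
    println!("{}", n);
}
```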
If one is targeting an `f32` implementation of uniform or normal sampling on a GPU, it would be very useful to also have an equivalent `f32` implementation on the CPU for floating-point reproducibility (i.e. for debugging). This does presuppose fixing the specific algorithm used to sample from each distribution on both CPU and GPU, but at least on GPUs it's the parallel generation of random integers that can be a bit nontrivial to do well, whereas transforming the random integers into `f32` samples from a distribution can be done the same way as it is implemented in `rand`.
So we have some motivation for this at least, though I'm still not convinced (@peterhj's case can simply use an external implementation as required).
If we do want to add this, what should support look like?
1. `Distribution<f32>` for `Normal` etc., either naively by casting the `f64` implementation or with a custom implementation. The limitation here is that distribution objects like `Normal` store all parameters as `f64` values, thus the distribution objects do not get any smaller and results may not be the same as with "pure `f32`" implementations. This does have the advantage of requiring little extra code.
2. Parameterise `Normal` over the floating-point type. This partly works, with some extra type-complexity (see below); unfortunately there isn't an impl of `From<f64>` for `f32` or any equivalent trait in the stdlib, which is a slight issue (workaround: use `num_traits::cast::AsPrimitive`, or maybe add our own trait, or just impl for `f32` and `f64` separately).

```rust
use std::ops::{Add, Mul};
use rand::Rng;
use rand::distributions::{Distribution, StandardNormal};

pub struct Normal<FP> {
    mean: FP,
    std_dev: FP,
}

impl<FP> Distribution<FP> for Normal<FP>
    where FP: Copy + Add<Output = FP> + Mul<Output = FP> + From<f64>
{
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> FP {
        let n: f64 = rng.sample(StandardNormal);
        self.mean + self.std_dev * n.into()
    }
}
```
The second method also has annoying error messages; we try to avoid stuff like this if we can:
```text
error[E0599]: no method named `sample` found for type `distributions::normal::Normal<f32>` in the current scope
   --> src/distributions/normal.rs:181:14
    |
92  | pub struct Normal<FP> {
    | --------------------- method `sample` not found for this
...
181 |     norm.sample(&mut rng);
    |          ^^^^^^
    |
    = note: the method `sample` exists but the following trait bounds were not satisfied:
            `distributions::normal::Normal<f32> : distributions::Distribution<_>`
    = help: items from traits can only be used if the trait is implemented and in scope
    = note: the following traits define an item `sample`, perhaps you need to implement one of them:
            candidate #1: `distributions::uniform::UniformSampler`
            candidate #2: `distributions::Sample`
            candidate #3: `distributions::Distribution`
            candidate #4: `Rng`
```
(The problem here is that the impl requires `FP: From<f64>`, which `f32` doesn't support.)
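A sketch of the `AsPrimitive` workaround mentioned above, swapping the `From<f64>` bound for num_traits' casting trait so that `f32` qualifies (same hypothetical `Normal<FP>` as before):

```rust
use std::ops::{Add, Mul};
use num_traits::cast::AsPrimitive;
use rand::Rng;
use rand::distributions::{Distribution, StandardNormal};

pub struct Normal<FP> {
    mean: FP,
    std_dev: FP,
}

impl<FP> Distribution<FP> for Normal<FP>
where
    FP: Copy + Add<Output = FP> + Mul<Output = FP> + 'static,
    f64: AsPrimitive<FP>,
{
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> FP {
        let n: f64 = rng.sample(StandardNormal);
        // as_() performs an `as`-style cast from f64 to FP, so this
        // impl works for f32 where From<f64> does not.
        self.mean + self.std_dev * n.as_()
    }
}
```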
A stronger motivation for this is generic code. nalgebra implements `Distribution<Unit<VectorN<N, D>>> for Standard where StandardNormal: Distribution<N>`, where `N` is a float-like type. It's inconvenient for this not to support `f32`, and perhaps suboptimal.
It should be straightforward to impl `Distribution<f32>` for `Standard`, `StandardNormal` and `Exp1` since these types are parameterless (when I say straightforward: an optimal implementation still needs the ziggurat function reworked; see the sketch after the list below), but what about the parametrised versions? See my previous post; this boils down to:
1. `Normal` and `Normal32` as separate types
2. `Normal<f64>` — a breaking change (workable due to #290), worse syntax, worse error messages
3. `Normal` stores all parameters as `f64` even when being used to produce `f32` types

Edit: I think we could actually support both the first and last options simultaneously. That might not be the best idea, but it means we could implement `Distribution<f32>` for `Normal` and later add `Normal32` if it turns out that 32-bit parameters are really needed.
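As for the parameterless case, a minimal sketch of the naive casting approach (this impl would have to live inside rand itself because of coherence rules, and an optimal version would rework the ziggurat tables for `f32`):

```rust
// Inside the rand crate (orphan rules forbid this impl downstream):
impl Distribution<f32> for StandardNormal {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f32 {
        // Naive: reuse the tuned f64 ziggurat sampler and cast down.
        let x = <Self as Distribution<f64>>::sample(self, rng);
        x as f32
    }
}
```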
But the big question here for parameterised distributions is this: do we actually need the parameters to be `f32`, or is it good enough to use `f64` parameters everywhere? (I.e. `let x: f32 = Normal::new(0f64, 1f64).sample(&mut rng);`)
Another option might be changing `Distribution` to use associated types, with `f64` being the default.

Using `num_traits::float` might make it easier to produce generic code.
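For example, a hypothetical generic helper that would work for both widths once `StandardNormal: Distribution<F>` exists for `f32` as well as `f64`:

```rust
use num_traits::Float;
use rand::Rng;
use rand::distributions::{Distribution, StandardNormal};

// Hypothetical: generic over the float width; the StandardNormal bound
// only holds for f32 once the impl discussed here is added.
fn normal<F, R>(mean: F, std_dev: F, rng: &mut R) -> F
where
    F: Float,
    StandardNormal: Distribution<F>,
    R: Rng,
{
    mean + std_dev * rng.sample(StandardNormal)
}
```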
Do associated types help anywhere? We already have to deal with the fact that a distribution may implement sampling for multiple types, thus requiring:

```rust
let x = <Standard as Distribution<f32>>::sample(&Standard, &mut rng);
// but usually we just write:
let x: f32 = Standard.sample(&mut rng);
```
Further investigation into distributions:

- `Binomial` uses `u64` and `f64`; both types could be derived from `new` args; could abstract over one or both; if we implemented `Distribution<u32>` for the current `Binomial` we cannot guarantee that the result is compatible with a `u32` output type; we may wish to use `f64` internally anyway for accuracy
- `Cauchy`, `Dirichlet`, `Exp`, `Gamma`, `ChiSquared`, `FisherF`, `StudentT`, `Beta`, `Normal`, `Pareto`, `Pert`, `Poisson`, `Triangular` and `Weibull` use `f64`; the type is derivable from `new` args
- `Exp1`, `StandardNormal`, `UnitCircle` and `UnitSphereSurface` require no parameters and could also implement `Distribution<f32>`
- `Standard` and `Uniform` already have `f32` impls; for these we'll probably want a specific trait bound (alternatively specialisation might be viable, if this was ready)

I guess the right approach here is to make most of these types generic over their main parameter type (`f32`/`f64`, or in the case of `Binomial`, `u32`/`u64`). This causes a little breakage to usage but shouldn't be too big a deal, especially now.
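Under that approach, usage might look something like this (a hypothetical API sketch, not the current one):

```rust
// Hypothetical: Normal made generic over its parameter type.
let n64 = Normal::new(0f64, 1f64);  // Normal<f64>
let n32 = Normal::new(0f32, 1f32);  // Normal<f32>
let x: f32 = n32.sample(&mut rng);  // parameters and output both f32
```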
`Normal` and so on are only implemented for `f64` at the moment. It would greatly simplify and beautify code that integrates with Rand if they could be implemented for `f32` too.