rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand

Implement distributions for f32 #100

Closed bluss closed 5 years ago

bluss commented 8 years ago

Normal and so on are only implemented for f64 at the moment.

It would simplify and beautify code that integrates with Rand greatly if they could be implemented for f32 too.

bluss commented 8 years ago

This is a client-side workaround:

// Wrapper adapting any distribution that samples f64 so that it
// yields f32 by casting (uses the rand 0.3-era traits).
#[derive(Copy, Clone, Debug)]
pub struct F32<S>(pub S);

impl<S> Sample<f32> for F32<S>
    where S: Sample<f64>
{
    fn sample<R>(&mut self, rng: &mut R) -> f32 where R: Rng {
        self.0.sample(rng) as f32
    }
}

impl<S> IndependentSample<f32> for F32<S>
    where S: IndependentSample<f64>
{
    fn ind_sample<R>(&self, rng: &mut R) -> f32 where R: Rng {
        self.0.ind_sample(rng) as f32
    }
}
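For context, the same wrapper pattern can be shown fully self-contained, with the rand 0.3-era `Sample`/`IndependentSample` traits replaced by minimal local stand-ins (all names below are illustrative, not rand's real API):

```rust
// Minimal local stand-ins for the rand 0.3-era traits (not the real API).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}
pub trait IndependentSample<T> {
    fn ind_sample<R: Rng>(&self, rng: &mut R) -> T;
}

// An example distribution that produces f64 values.
pub struct Uniform01;
impl IndependentSample<f64> for Uniform01 {
    fn ind_sample<R: Rng>(&self, rng: &mut R) -> f64 {
        rng.next_f64()
    }
}

// The wrapper: adapts any f64-sampling distribution to yield f32 by casting.
pub struct F32<S>(pub S);
impl<S: IndependentSample<f64>> IndependentSample<f32> for F32<S> {
    fn ind_sample<R: Rng>(&self, rng: &mut R) -> f32 {
        self.0.ind_sample(rng) as f32
    }
}
```

Note the trade-off: the inner distribution still does all its work in f64; only the final value is narrowed.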
dhardy commented 6 years ago

I think this is doable, but would require re-implementing quite a few algorithms, e.g. the ziggurat function is tuned specifically for f64. @pitdicker I think it would be straightforward to add an f32 variant if we wanted to?

The real question is: is there a good reason to use f32 implementations? On GPUs, yes, but on CPUs I'd have thought it better to do the computation in f64 (or even 80-bit or whatever precision the CPU supports) and then cast down afterwards if data size is an issue. This would preserve more precision through the calculation.

vks commented 6 years ago

is there a good reason to use f32 implementations?

In principle, it should be faster, especially if the code can be vectorized. However, the current API generating one float at a time is not well suited for vectorization.
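For illustration, the usual trick for turning random bits into an f32 costs one shift and one multiply per value, which is exactly the kind of loop a compiler can apply lane-wise over a batch of u32s. This is a sketch of the general technique (the helper name is mine, not rand's):

```rust
/// Convert a random u32 into an f32 uniformly distributed in [0, 1).
/// f32 has a 24-bit significand, so keep the top 24 bits and scale by 2^-24.
pub fn u32_to_f32_01(x: u32) -> f32 {
    const SCALE: f32 = 1.0 / 16_777_216.0; // 2^-24
    (x >> 8) as f32 * SCALE
}
```

Because f32 lanes are half the width of f64 lanes, a vectorised version of this conversion processes twice as many values per SIMD register.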

dhardy commented 6 years ago

No, it's not — but nor are many applications. If performance improvements are a goal then one should start with a specific problem needing (and able to take advantage of) the optimisations.

Also, I don't expect f32 would be much faster than f64 for most tasks on modern CPUs, except with vectorisation — and excepting 32-bit ARM.

vks commented 6 years ago

but nor are many applications.

I'm not so sure about that. I think a lot of applications could benefit from generating 4 floats at a time (for instance, rejection sampling and sampling from the unit sphere).
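As a toy illustration of that kind of workload, here is rejection sampling from the unit disk done entirely in f32; the xorshift generator is a stand-in, not rand's API, and a vectorised variant would generate and test four candidate points per iteration instead of one:

```rust
/// Toy xorshift32 generator; a stand-in for a real RNG, not rand's API.
pub struct Xorshift32(pub u32); // seed must be nonzero

impl Xorshift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }

    /// Uniform f32 in [0, 1) built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        (self.next_u32() >> 8) as f32 * (1.0 / 16_777_216.0)
    }
}

/// Rejection-sample a point uniformly from the unit disk, entirely in f32.
pub fn unit_disk(rng: &mut Xorshift32) -> (f32, f32) {
    loop {
        let x = 2.0 * rng.next_f32() - 1.0;
        let y = 2.0 * rng.next_f32() - 1.0;
        if x * x + y * y <= 1.0 {
            return (x, y);
        }
    }
}
```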

pitdicker commented 6 years ago

@pitdicker I think it would be straightforward to add an f32 variant if we wanted to?

Yes, I think so too. It would mean updating the ziggurat_tables.py script, and probably a little creative macro use in the distributions code.

I always thought of this issue as something nice that we should do, with just the question of 'when' remaining. We definitely should figure out how to make better use of vectorisation someday though.

@fizyk20 As you are already working on/with the distributions code, what do you think? Or would you be interested?

fizyk20 commented 6 years ago

@pitdicker I don't really have an opinion here :grin:

f32 variants don't make much sense for the Poisson and binomial distributions, which I've been involved with, as they are inherently discrete. It might make sense to add u32/u16/u8/i64/etc. variants, though this would probably mean more type annotations in code using the API (type inference gets harder when there are many similar possibilities).

peterhj commented 6 years ago

If one is targeting an f32 implementation of uniform or normal sampling on a GPU, it would be very useful to also have an equivalent f32 implementation on the CPU for floating-point reproducibility (i.e. for debugging). This does presuppose fixing the specific sampling algorithm for each distribution on both CPU and GPU. However, on GPUs it is the parallel generation of random integers that can be nontrivial to do well, whereas transforming those random integers into f32 samples from a distribution can be done the same way as in rand.

dhardy commented 6 years ago

So we have some motivation for this at least, though I'm still not convinced (@peterhj's case can simply use an external implementation as required).

If we do want to add this, what should support look like?

  1. We can simply implement Distribution<f32> for Normal etc., either naively by casting the f64 implementation or with a custom implementation. The limitation here is that the distribution objects like Normal store all parameters as f64 values, thus the distribution objects do not get any smaller and results may not be the same as with "pure f32" implementations. This does have the advantage of little extra code being required.
  2. We could template types like Normal over the floating-point type. This partly works, with some extra type-complexity (see below); unfortunately there isn't an impl of From<f64> for f32 or any equivalent trait in the stdlib which is a slight issue (workaround: use num_traits::cast::AsPrimitive, or maybe add our own trait, or just impl for f32 and f64 separately).
pub struct Normal<FP> {
    mean: FP,
    std_dev: FP,
}

impl<FP> Distribution<FP> for Normal<FP>
where FP: Copy + Add<Output=FP> + Mul<Output=FP> + From<f64>
{
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> FP {
        let n = rng.sample(StandardNormal);
        self.mean + self.std_dev * n.into()
    }
}

The second method also has annoying error messages; we try to avoid stuff like this if we can:

error[E0599]: no method named `sample` found for type `distributions::normal::Normal<f32>` in the current scope
   --> src/distributions/normal.rs:181:14
    |
92  | pub struct Normal<FP> {
    | --------------------- method `sample` not found for this
...
181 |         norm.sample(&mut rng);
    |              ^^^^^^
    |
    = note: the method `sample` exists but the following trait bounds were not satisfied:
            `distributions::normal::Normal<f32> : distributions::Distribution<_>`
    = help: items from traits can only be used if the trait is implemented and in scope
    = note: the following traits define an item `sample`, perhaps you need to implement one of them:
            candidate #1: `distributions::uniform::UniformSampler`
            candidate #2: `distributions::Sample`
            candidate #3: `distributions::Distribution`
            candidate #4: `Rng`

(problem here is the impl requires FP: From<f64>, which f32 doesn't support)
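One way around the missing `From<f64>` impl, in the spirit of the "creative macro use" mentioned above, is to write one macro arm per concrete float type; the `Rng`/`Distribution` traits below are simplified stand-ins for rand's, not the real API:

```rust
// Simplified stand-ins for rand's traits (illustrative only).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}
pub trait Distribution<T> {
    fn sample<R: Rng>(&self, rng: &mut R) -> T;
}

pub struct Normal<F> {
    pub mean: F,
    pub std_dev: F,
}

// One macro arm per concrete float type sidesteps the need for
// an `FP: From<f64>` bound (which f32 cannot satisfy).
macro_rules! impl_normal {
    ($ty:ty) => {
        impl Distribution<$ty> for Normal<$ty> {
            fn sample<R: Rng>(&self, rng: &mut R) -> $ty {
                // Stand-in for a StandardNormal draw, cast to the target type.
                let n = rng.next_f64() as $ty;
                self.mean + self.std_dev * n
            }
        }
    };
}

impl_normal!(f32);
impl_normal!(f64);
```

Since each impl is for a concrete type, the error message from a mismatched `sample` call names the type directly instead of an unsatisfiable generic bound.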

Ralith commented 6 years ago

A stronger motivation for this is generic code. nalgebra implements Distribution<Unit<VectorN<N, D>>> for Standard where StandardNormal: Distribution<N>, where N is a float-like type. It's inconvenient for this not to support f32, and perhaps suboptimal.

dhardy commented 6 years ago

It should be straightforward to impl Distribution<f32> for Standard, StandardNormal and Exp1 since these types are parameterless (when I say straightforward: an optimal implementation still needs the ziggurat function reworked), but what about the parametrised versions? This boils down to the options in my previous post.

Edit: I think we could actually support both the first and last options simultaneously. That might not be the best idea, but means we could implement Distribution<f32> for Normal and later add Normal32 if it turns out that 32-bit parameters are really needed.

But the big question here for parameterised distributions is this: do we actually need the parameters to be f32 or is it good enough to use f64 parameters everywhere? (I.e. let x: f32 = Normal::new(0f64, 1f64).sample(&mut rng);)

vks commented 5 years ago

Another option might be changing Distribution to use associated types, with f64 being the default.

Using num_traits::float might make it easier to produce generic code.

dhardy commented 5 years ago

Do associated types help anywhere? We already have to deal with the fact that a distribution may implement sampling for multiple types, thus requiring:

let x = <Standard as Distribution<f32>>::sample(&Standard, &mut rng);
// but usually we just write:
let x: f32 = Standard.sample(&mut rng);

Further investigation into distributions:

I guess the right approach here is to make most of these types generic over their main parameter type (f32 / f64, or in the case of Binomial, u32 / u64). This causes a little breakage to usage but shouldn't be too big a deal, especially now.
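A sketch of that generic direction, using a small local conversion trait in place of something like num_traits (everything here is illustrative, not rand's actual implementation):

```rust
// A minimal local float abstraction; num_traits provides richer versions.
pub trait Float:
    Copy + std::ops::Add<Output = Self> + std::ops::Mul<Output = Self>
{
    fn from_f64(x: f64) -> Self;
}
impl Float for f32 {
    fn from_f64(x: f64) -> Self { x as f32 }
}
impl Float for f64 {
    fn from_f64(x: f64) -> Self { x }
}

// Simplified stand-ins for rand's traits (illustrative only).
pub trait Rng {
    fn next_f64(&mut self) -> f64;
}
pub trait Distribution<T> {
    fn sample<R: Rng>(&self, rng: &mut R) -> T;
}

// A single generic impl covers both f32 and f64 parameter types.
pub struct Normal<F: Float> {
    pub mean: F,
    pub std_dev: F,
}

impl<F: Float> Distribution<F> for Normal<F> {
    fn sample<R: Rng>(&self, rng: &mut R) -> F {
        // Stand-in for a StandardNormal draw, converted to F.
        let n = F::from_f64(rng.next_f64());
        self.mean + self.std_dev * n
    }
}
```

The `from_f64` conversion plays the role the missing `From<f64> for f32` impl would have played in the earlier generic attempt.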