haskell / mwc-random

A very fast Haskell library for generating high quality pseudo-random numbers.
http://hackage.haskell.org/package/mwc-random
BSD 2-Clause "Simplified" License

Truncated normal functions #58

Closed: stites closed this issue 6 years ago

stites commented 7 years ago

Sampling from a truncated normal distribution is a very convenient utility found in scipy and tensorflow. I'm not sure this monadic recursion is the best implementation, since the user can really shoot themselves in the foot with it, but I'd like to propose the following:

Changes/input would be much appreciated.
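
For reference, the core idea is a rejection loop around the library's existing `normal` sampler from System.Random.MWC.Distributions. A minimal sketch of that idea (illustrative only, not the exact patch):

```haskell
import System.Random.MWC (GenIO)
import System.Random.MWC.Distributions (normal)

-- Naive rejection sampler for a normal distribution truncated to [lo, hi]:
-- draw from the untruncated normal and retry until the draw lands in range.
truncatedNormal :: Double -> Double -> Double -> Double -> GenIO -> IO Double
truncatedNormal mu sigma lo hi gen = go
  where
    go = do
      x <- normal mu sigma gen
      if lo <= x && x <= hi then pure x else go
```

The foot-gun is the retry loop: when [lo, hi] sits far out in the tails, almost every draw is rejected and the loop can spin for a very long time.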

Shimuuar commented 7 years ago

I think such a function would be a useful addition, but this implementation uses rejection sampling, which is horrendously inefficient when sampling from the tails of the distribution (for a standard normal truncated to [5, +inf), for example, the acceptance probability is roughly 3e-7). To account for all possible uses one needs a more elaborate algorithm.

stites commented 7 years ago

haha, yeah, I've been in prototype mode and didn't give this too much love.

I'm seeing Chopin's algorithm and Robert's algorithm as possible implementations, along with this comparison of the two (which also covers rejection sampling). Would you have a preference for which one(s) to implement?
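
For context: Robert's algorithm handles the one-sided tail case [a, +inf) with a translated-exponential proposal. A rough, illustrative sketch of that idea in mwc-random terms (not part of any patch):

```haskell
import System.Random.MWC (GenIO, uniform)

-- Sketch of Robert (1995)-style tail sampling for a standard normal
-- truncated to [a, +inf) with a > 0: propose z ~ a + Exp(alpha) where
-- alpha = (a + sqrt (a^2 + 4)) / 2, and accept z with probability
-- exp (-(z - alpha)^2 / 2).
tailNormal :: Double -> GenIO -> IO Double
tailNormal a gen = go
  where
    alpha = (a + sqrt (a * a + 4)) / 2
    go = do
      u <- uniform gen          -- mwc-random's uniform Double lies in (0,1]
      v <- uniform gen
      let z = a - log u / alpha -- translated exponential proposal
      if v <= exp (negate ((z - alpha) * (z - alpha)) / 2)
        then pure z
        else go
```

Chopin's algorithm, roughly speaking, adds precomputed tables to cover the bulk of the distribution and falls back to a tail sampler like this one.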

Shimuuar commented 7 years ago

Sorry for the late answer. I don't know any of these algorithms, so I'll leave the choice to you.

stites commented 7 years ago

Hey @Shimuuar. So I've just ported the code from http://miv.u-strasbg.fr/mazet/rtnorm, which seems to be the fastest implementation of truncated normal random variates. Unfortunately it feels like the dirtiest algorithm I've come across -- basically it's got a ton of branching into sub-algorithms (Chopin and Robert), all of which are rejection-sampling based, and it includes three 4000-element lookup tables. The paper also cites "best design" choices for some hard-coded heuristics on when to switch from one algorithm to another.

So I'm very uncomfortable with this PR now and I'd like your input on what to do next:

  1. Proceed with what I've currently got, test it, and benchmark it alongside the other options (following the comparison detailed above).

  2. Section the truncated algorithms out into a separate module (they feel quite ugly next to everything else, and multiway ifs aren't needed elsewhere).

  3. Ditch this work and go with one of the slower variants, which would be simpler (note that it would still involve rejection sampling -- it looks like the best algorithms just take advantage of sampling from inverse distributions as well).

Apologies for the block of text (and sorry for taking so long on this)!

Shimuuar commented 7 years ago

Thank you! That's a lot of work.

One thing about high-performance numeric code is that it's ugly, very, very ugly. Magical constants, switching between different approximations... It's made uglier by the fact that most languages are poor at abstraction, so a lot of stuff just gets inlined. It could be made more readable in Haskell.

  1. I'm fine with this implementation, but if you want to pick a different algorithm, that's up to you. In any case, please document it extensively. As I said, numeric code tends to be ugly and unreadable, so commenting almost everything is necessary.

  2. Re: moving to a separate module. I think the truncated normal distribution should be exported from System.Random.MWC.Distributions, and that module is not big enough yet to be worth splitting. On the other hand, your patch introduces enough top-level identifiers that the generators for the normal & truncated normal distributions could be moved to a separate internal module.

stites commented 7 years ago

Thanks for the review! In that case, I'll proceed with the current algorithm and I'll try to document it heavily before I send it your way again (it might take a while to test & bench). One thing you may have missed in the review is that the lookup tables live in System.Random.MWC.Internal, and I'm not sure that's the best place for them.

RE: 2 -- I'll keep it in Distributions for now and you can make the decision to split it out when this is actually ready.

Shimuuar commented 7 years ago

One more thing re: documentation. If your implementation is based on a paper, please add it to the references section. I've also found it useful to add references to specific equations and/or sections of the paper.

There's a licensing problem with the lookup tables.

Licence: GNU General Public License Version 2

This library is BSD3, so GPL code can't be included here. Maybe these tables should be generated algorithmically?

stites commented 7 years ago

Good catch, I'll look into it. It would be nice to generate and memoize them on the first call (since any future developer would then know where the tables come from) -- the API would change, however, into a "heavy truncated normal" and a "memoized truncated normal", if you're okay with that. Otherwise... is it possible to generate the tables in Setup.hs at compile time?

Shimuuar commented 7 years ago

Actually, memoization of such lookup tables is easy: a monomorphic top-level definition will work just fine. Sometimes a NOINLINE pragma is needed.

In fact, the current lookup tables work exactly this way: they're constructed from a literal list (unless the optimizer/rewrite rules kick in) and the result is memoized.
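
For illustration, that pattern looks roughly like the sketch below (the contents are placeholders, not the actual rtnorm tables, and `lookupTable` is a hypothetical name):

```haskell
import qualified Data.Vector.Unboxed as U

-- A monomorphic top-level vector is a CAF: it is built the first time it is
-- forced and then shared by every later use, which gives the memoization
-- described above. NOINLINE keeps GHC from duplicating the construction at
-- call sites.
lookupTable :: U.Vector Double
lookupTable = U.generate 4096 step
  where
    -- Placeholder contents; a real table would be filled in from the
    -- algorithm's defining equations.
    step i = exp (negate (fromIntegral i / 4096))
{-# NOINLINE lookupTable #-}
```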

stites commented 6 years ago

So I did a bit more digging and asked the Free Software Foundation about the code here. I would consider this PR a code conversion, so technically this is all GPL. I'm going to close this PR and I'll have to start from scratch based just on one of the papers -- if anyone else wants to take a crack at this in the meantime, feel free.

shakes fist at the GPL licence