Proposal: Syntax suggesting "simulate from" notation

axch commented 8 years ago

e.g.

assume x ~ normal(0, 1);
assume my_normal = (foo, bar) ~> { normal(foo, bar) };

Ideas for what to do with the tilde characters?

Option 1: Make the first example above an alias for assume x = ... and the second an alias for proc.
Option 2: Option 1, plus the intent that in some glorious future a static type system enforces that expressions that are random conditioned on the lexical environment use the ~ character.
Option 3: Make ~ mean "bind in the randomness monad" and reserve = for non-binding assignment. This would change the semantics of pretty much all existing programs.
Option 4: Is there some hack by which = might be given a semantics as a user-level control on the granularity of dependency tracking?

To elaborate on Option 3 a bit:

assume x = normal(0, 1);

would make x be the distribution itself (as some fraction of our beginner users seem to expect). It would then be meaningful to write

assume y1 ~ x;
assume y2 ~ x;

and expect y1 and y2 to be different. In contrast,

assume x ~ normal(0, 1);

would make x be a sample from the standard normal distribution.
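A rough Python sketch of the distinction, for readers who find it easier to see in a language with explicit objects. The Normal class and its sample method are hypothetical stand-ins, not anything in Venture; the comments show the corresponding lines under the Option 3 reading.

```python
import random


class Normal:
    """Hypothetical distribution object: holds parameters, draws on request."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def sample(self, rng):
        # Draw one value from this normal distribution.
        return rng.gauss(self.mu, self.sigma)


rng = random.Random(0)

# assume x = normal(0, 1);        x names the distribution itself
x = Normal(0, 1)

# assume y1 ~ x; assume y2 ~ x;   each ~ draws a fresh sample from x
y1 = x.sample(rng)
y2 = x.sample(rng)

# assume z ~ normal(0, 1);        z names a single sample (a float)
z = Normal(0, 1).sample(rng)
```

Here x is an object that can be sampled repeatedly, while y1, y2, and z are plain numbers.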

This style is traditionally (in programming languages) accompanied by making expression composition not mean "bind", so something like normal(0, 1) + 2 is (a priori) a type error -- it tries to add the constant 2 to the standard normal distribution. We could choose to give such expressions meaning, for example by silently promoting 2 to the distribution "2 with probability 1" and defining + on distributions to distribute over sampling (i.e., "the distribution defined by drawing independent samples from the two arguments and adding them"). This would be analogous to what we have now, but may confuse some initiates who would expect + to be pointwise summation of density functions.
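To make the "promote and distribute over sampling" idea concrete, here is a minimal Python sketch. Dist, lift, and normal are hypothetical names invented for illustration; nothing here reflects how Venture is implemented.

```python
import random


class Dist:
    """Hypothetical distribution: wraps a function from an rng to one sample."""

    def __init__(self, sampler):
        self.sampler = sampler

    def sample(self, rng):
        return self.sampler(rng)

    def __add__(self, other):
        # Promote constants, then add independent draws from both arguments.
        other = lift(other)
        return Dist(lambda rng: self.sample(rng) + other.sample(rng))

    __radd__ = __add__


def lift(value):
    """Promote a constant to the distribution 'value with probability 1'."""
    return value if isinstance(value, Dist) else Dist(lambda rng: value)


def normal(mu, sigma):
    return Dist(lambda rng: rng.gauss(mu, sigma))


rng = random.Random(1)
shifted = normal(0, 1) + 2  # a distribution, not a number
draws = [shifted.sample(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)  # should land near 2
```

Under this definition normal(0, 1) + 2 is the normal distribution shifted to mean 2. The pointwise-density reading would instead add 2 to the density at every point, which is not a probability distribution at all.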

Thoughts? @vkmvkmvkmvkm @luac ?

riastradh-probcomp commented 8 years ago

Random thoughts on the colour of this bike shed:

observe normal(0, 1) ~> 42?

Does

assume theta ~ beta(alpha, alpha);
assume x = flip(theta);
observe x ~> 1;
observe x ~> 0;

look sensible?
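One possible reading, assuming Option 3 semantics (an assumption; the snippet itself does not say): x names the distribution flip(theta), and each observe is an independent draw from it. Under that reading the two observations are ordinary coin-flip data, and the standard beta-Bernoulli conjugate update applies. The function name below is hypothetical.

```python
def beta_bernoulli_update(a, b, observations):
    """Conjugate update: each 1 increments a, each 0 increments b."""
    for obs in observations:
        if obs:
            a += 1
        else:
            b += 1
    return a, b


alpha = 2.0  # illustrative prior parameter, not from the thread
posterior = beta_bernoulli_update(alpha, alpha, [1, 0])
posterior_mean = posterior[0] / (posterior[0] + posterior[1])
```

With one success and one failure the posterior is beta(alpha + 1, alpha + 1). If x were instead a single sample, observing it as both 1 and 0 would be contradictory.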

I'm under the impression that it was an intentional design decision to represent distributions only by stochastic procedures, not by another kind of object with any sort of explicit sampling or observation operation. This is justified in an otherwise purely functional language because it doesn't really break referential transparency, whereas the inference language does have explicitly destructive operations on the model traces and hence warrants a more explicit monad with a more explicit distinction between bind and let.

Some typefaces put ~ way above where it should be. Some typefaces make it hard to distinguish ~ from -.

Deterministic functions, as in standard math notation:

(x, y) |-> { x + y }

Stochastic procedures:

(alpha) ~> { theta ~ beta(alpha, alpha); bernoulli(theta) }
(x, y) ~> { z ~ normal(x, 0); w ~ normal(y, 0); return (z + w)/(z*w) }
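For comparison, a Python analogue of the first deterministic function and the first stochastic procedure. Python has no notation for the difference, so this sketch marks it by threading an explicit rng argument through the stochastic one; the function names are hypothetical.

```python
import random


def add(x, y):
    # Analogue of (x, y) |-> { x + y }: same inputs always give the same output.
    return x + y


def beta_bernoulli(alpha, rng):
    # Analogue of (alpha) ~> { theta ~ beta(alpha, alpha); bernoulli(theta) }.
    theta = rng.betavariate(alpha, alpha)  # theta ~ beta(alpha, alpha)
    return 1 if rng.random() < theta else 0  # bernoulli(theta)


rng = random.Random(2)
flips = [beta_bernoulli(2.0, rng) for _ in range(20000)]
frequency = sum(flips) / len(flips)  # symmetric prior, so close to one half
```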

axch commented 8 years ago

~, <~, and ~> are now aliases for =, <-, and ->, respectively, corresponding to Option 1. Does being done with this ticket consist of documenting that, or do we want to think about this more? One answer is "live with the new world order for a while and see."

probcomp / Venturecxx, issue #569