probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0

What are atoms for? #351

Open axch opened 8 years ago

axch commented 8 years ago

Venture currently has 3 unconstrained scalar types:

Some programming languages (like JavaScript) have only floating-point numbers, on the grounds that the 52 bits of mantissa are good enough for practical fixed-precision computations.

Other programming languages distinguish fixed point and floating point types because:

Why does Venture have atoms, and why do they masquerade as another scalar numerical type?

  1. The Chinese Restaurant Process returns atoms, where they are meant to be opaque objects that have no properties except equality, indicating grouping in the partition CRP induces. Such atoms are typically used as keys in procedures memoized by mem.
  2. The one-argument versions of categorical and dirichlet return atoms. These atoms can be used as mem keys too, but are also at least occasionally used as keys in lookup tables, where it matters that the atom carries the information of which of the categorical or dirichlet's input dimensions it actually corresponds to.
  3. make_lazy_hmm produces an SP that returns atoms, presumably to indicate the output observation, but nobody uses it.

Atoms currently look like integers when returned from API methods or printed at the Venture console. On the one hand, this causes recurring pain like #332 and #347. On the other hand, this

However, atoms emitted by different crp instances can be mixed up with each other, and that is presumably undesirable from the point of view of pattern 1.
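To make use pattern 1 concrete, here is a minimal Python sketch of an opaque, equality-only token. It is not Venture's implementation; the `Atom` class and its `_index` field are hypothetical names chosen for illustration. It shows a value that can serve as a lookup key, never compares equal to an integer, and refuses arithmetic.

```python
class Atom:
    """Hypothetical opaque token: supports equality and hashing, nothing else."""

    __slots__ = ("_index",)

    def __init__(self, index):
        self._index = index

    def __eq__(self, other):
        # Only another Atom can be equal; integers never are.
        if not isinstance(other, Atom):
            return NotImplemented
        return self._index == other._index

    def __hash__(self):
        # Hash on a tagged tuple so the atom is usable as a dict key.
        return hash(("atom", self._index))

    def __repr__(self):
        # Printed form borrowed from the atom<1> syntax used in this thread.
        return f"atom<{self._index}>"


table = {Atom(1): "cluster one"}
print(table.get(Atom(1)))  # found: same atom
print(table.get(1))        # not found: an integer is not an atom
```

In this sketch two atoms with the same index compare equal no matter where they came from, so it does not address the concern about mixing atoms from different crp instances.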

How should things be? @vkmvkmvkmvkm ? @riastradh-probcomp, @luac ?

axch commented 8 years ago

This is topical because a recent change by @luac, which made atoms masquerade as integers less, broke the tutorial: the tutorial relied on use pattern 2.

axch commented 8 years ago

Proposal: break use pattern 2:

sample cluster_parameters(cluster_assignment(point_id))

instead of the currently permitted pattern

sample cluster_assignment(point_id)
sample cluster_parameters(atom<that output>)

(which of course goes terribly wrong if they forget the word atom there). Note that they still have to prepare: cluster_assignment had better be memoized and predicted for this to have the desired effect.
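The composed form can be emulated outside Venture to see why memoization matters. The following is a rough Python sketch, not Venture code: `crp_draw`, `ALPHA`, and the Gaussian prior on parameters are assumptions made for illustration, `functools.lru_cache` stands in for mem, plain integers stand in for atoms, and Venture's predict has no analogue here.

```python
import random
from functools import lru_cache

rng = random.Random(0)
ALPHA = 1.0   # hypothetical CRP concentration parameter
counts = []   # number of points assigned to each existing cluster


def crp_draw():
    """Draw a cluster index from a Chinese Restaurant Process and record it."""
    weights = counts + [ALPHA]  # existing clusters, then a new one
    k = rng.choices(range(len(weights)), weights=weights)[0]
    if k == len(counts):
        counts.append(1)
    else:
        counts[k] += 1
    return k


@lru_cache(maxsize=None)  # stands in for mem: one draw per point_id
def cluster_assignment(point_id):
    return crp_draw()


@lru_cache(maxsize=None)  # stands in for mem: one parameter per cluster
def cluster_parameters(cluster):
    return rng.gauss(0.0, 10.0)


# The composed call never exposes the cluster label to the caller.
first = cluster_parameters(cluster_assignment(3))
second = cluster_parameters(cluster_assignment(3))
print(first == second)
```

Without the cache on `cluster_assignment`, each call would seat the point again and could return a different cluster, which is the failure the note about memoization warns against.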

axch commented 8 years ago

From conversation with @luac: Basically agreed with the proposal. Noted that the proposal can be emulated in user space by passing an explicit list of integers as the second argument; this can fix the tutorial without having to resolve the design decision.

axch commented 8 years ago

DECISION: Proceed to downgrade atoms to CRP-only, but retain them for the CRP.

Current plan:

fsaad commented 8 years ago

What about atom<1>, which is clearer than A1?

axch commented 8 years ago

No reason not to keep it as an alternative input syntax. Or would you like atoms to print as atom<1> too?

fsaad commented 8 years ago

Prefer atoms print as atom<1> which is more descriptive than capital A.

riastradh-probcomp commented 8 years ago

For the record, it is not actually necessary to box floating-point numbers. An easy approach -- taken approximately by at least one major JavaScript implementation that I saw a few years ago -- is to represent every object as a floating-point value, with objects that are not floating-point values represented by a signalling NaN, which has the effect of trapping if you attempt to do floating-point arithmetic with it. You get eleven(!) tag bits this way, and out of 64 bits, that leaves plenty of bits for pointer addresses.
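The NaN-encoding idea can be sketched at the bit level. This Python example is illustrative only: the 3-bit tag and 48-bit payload split is an assumption of this sketch, not the layout of any particular JavaScript engine, and the helper names are hypothetical. It works on raw 64-bit integers because Python itself does not trap on signalling NaNs.

```python
import math
import struct

EXPONENT_MASK = 0x7FF << 52    # all eleven exponent bits set: NaN or infinity
QUIET_BIT = 1 << 51            # left clear, so the pattern is a signalling NaN
TAG_SHIFT = 48
TAG_MASK = 0x7 << TAG_SHIFT    # hypothetical 3-bit type tag
PAYLOAD_MASK = (1 << 48) - 1   # hypothetical 48-bit payload, e.g. a pointer


def box(tag, payload):
    """Pack a non-float object reference into a signalling-NaN bit pattern."""
    if not 1 <= tag <= 7:
        # Tag 0 with payload 0 would be the bit pattern of infinity.
        raise ValueError("tag must be in 1..7")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 48 bits")
    return EXPONENT_MASK | (tag << TAG_SHIFT) | payload


def is_boxed(bits):
    """True if the 64-bit pattern is a signalling NaN carrying a tag."""
    return ((bits & EXPONENT_MASK) == EXPONENT_MASK
            and (bits & QUIET_BIT) == 0
            and (bits & (TAG_MASK | PAYLOAD_MASK)) != 0)


def unbox(bits):
    """Recover the (tag, payload) pair from a boxed pattern."""
    return (bits & TAG_MASK) >> TAG_SHIFT, bits & PAYLOAD_MASK


def float_to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


boxed = box(2, 0xDEADBEEF)
print(is_boxed(boxed), unbox(boxed))
print(is_boxed(float_to_bits(1.5)))
```

Ordinary floats, including infinity, fail the `is_boxed` check, so they can be stored directly in the same 64-bit slot without any wrapper.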

axch commented 8 years ago

@fsaad Agreed, atom<1> would be fine.

fsaad commented 8 years ago

I did not know that atom<1> was valid input syntax. Do you think we should keep it? As part of system simplification, we might consider eliminating it.

axch commented 7 years ago

Re: random digits: An alternative, due to @riastradh-probcomp : Use random phrases from the diceware word list, e.g. squirrel_bandersnatch. Now they need to be distinguishable from symbols rather than numbers, but may be more fun to interact with.