Open axch opened 8 years ago
This is topical because a recent change by @luac, which made atoms masquerade as integers less, broke the tutorial, which was relying on use pattern 2.
Proposal: break use pattern 2: change categorical, dirichlet, and make_lazy_hmm to return integers, namely the index in their weight argument of the weight chosen (or the index in the observation matrix of the observation). Users would then write
sample cluster_parameters(cluster_assignment(point_id))
instead of the currently permitted pattern
sample cluster_assignment(point_id)
sample cluster_parameters(atom<that output>)
(which of course goes terribly wrong if they forget the word atom there). Note that they still have to prepare: cluster_assignment had better be memoized and predicted to have the desired effects.
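A rough Python analogy of the proposed usage (not VentureScript): cluster_assignment is memoized per point and returns a plain integer index, which is passed straight to a memoized cluster_parameters. The function bodies, the weights, and the seed are assumptions made up for illustration.

```python
import functools
import random

rng = random.Random(0)      # fixed seed so the sketch is reproducible
WEIGHTS = [0.2, 0.5, 0.3]   # hypothetical cluster weights


def categorical(weights):
    """Return the integer index of the weight chosen (the proposed behavior)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


@functools.lru_cache(maxsize=None)
def cluster_assignment(point_id):
    # Memoized: a given point keeps the same cluster on every call.
    return categorical(WEIGHTS)


@functools.lru_cache(maxsize=None)
def cluster_parameters(cluster):
    # Memoized: one (hypothetical) parameter draw per cluster index.
    return rng.gauss(0.0, 10.0)


# The integer index flows directly into the lookup; no atom wrapper is needed.
params = cluster_parameters(cluster_assignment(7))
```

Without the memoization, each call would redraw the assignment and the lookup would no longer refer to a stable cluster, which is the "still have to prepare" caveat above.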
From conversation with @luac: Basically agreed with the proposal. Noted that the proposal can be emulated in user space by passing an explicit list of integers as the second argument; this can fix the tutorial without having to resolve the design decision.
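The user-space emulation can be pictured with a small Python stand-in. This is only a sketch: the Python function below and its values parameter are hypothetical and merely mirror the idea of passing an explicit list as the second argument.

```python
import random


def categorical(weights, values=None, rng=random):
    """Draw one element of `values` with the given weights.

    Hypothetical stand-in: with no `values`, fall back to integer indices.
    """
    if values is None:
        values = range(len(weights))
    if len(values) != len(weights):
        raise ValueError("weights and values must have the same length")
    return rng.choices(values, weights=weights, k=1)[0]


weights = [0.2, 0.5, 0.3]
# Passing the integers explicitly makes the result an ordinary integer index.
cluster = categorical(weights, list(range(len(weights))))
```

Because the caller supplies the integers, the result can be used directly as a lookup key regardless of what the default return type is.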
DECISION: Proceed to downgrade atoms to CRP-only, but retain them for the CRP.
Current plan:
- categorical to return an Integer
- make_*_dir_cat to return an Integer
- make_lazy_hmm to return an Integer
- A) [Edit: Or by symmetry with the parser as atom<N>]
- strip_types to produce it from the stack dict!)
What about atom<1>, which is clearer than A1?
No reason not to keep it as an alternative input syntax. Or would you like atoms to print as atom<1> too?
Prefer atoms print as atom<1>, which is more descriptive than capital A.
For the record, it is not actually necessary to box floating-point numbers. An easy approach -- taken approximately by at least one major JavaScript implementation that I saw a few years ago -- is to represent every object as a floating-point value, with objects that are not floating-point values represented by a signalling NaN, which has the effect of trapping if you attempt to do floating-point arithmetic with it. You get eleven(!) tag bits this way, and out of 64 bits, that leaves plenty of bits for pointer addresses.
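A minimal Python sketch of this NaN-tagging idea, working on 64-bit words held as integers. The layout is an assumption chosen for illustration (all-ones exponent, quiet bit clear, nonzero 51-bit payload for non-floats) and is not taken from any particular JavaScript engine; all function names are hypothetical.

```python
import math
import struct

EXPONENT_MASK = 0x7FF0000000000000  # the eleven exponent bits
QUIET_BIT = 0x0008000000000000      # top mantissa bit; clear in a signalling NaN
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF   # remaining 51 mantissa bits
CANONICAL_NAN = EXPONENT_MASK | QUIET_BIT  # the one NaN used for real floats


def float_to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box_float(x):
    # Real NaNs are canonicalized so they never collide with tagged words.
    return CANONICAL_NAN if math.isnan(x) else float_to_bits(x)


def box_pointer(address):
    # A zero payload would be the bit pattern of infinity, so exclude it.
    if not 0 < address <= PAYLOAD_MASK:
        raise ValueError("address must fit in 51 bits and be nonzero")
    return EXPONENT_MASK | address


def is_pointer(word):
    return ((word & EXPONENT_MASK) == EXPONENT_MASK
            and not (word & QUIET_BIT)
            and (word & PAYLOAD_MASK) != 0)


def unbox(word):
    if is_pointer(word):
        return ("pointer", word & PAYLOAD_MASK)
    return ("float", bits_to_float(word))
```

The words are kept as integers rather than Python floats so the sketch does not depend on whether the platform preserves signalling-NaN bit patterns when a double is copied.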
@fsaad Agreed, atom<1> would be fine.
I did not know that atom<1> was valid input syntax. Do you think we should keep it? As part of system simplification, we might consider eliminating it.
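To make the print/parse symmetry concrete, here is a small Python sketch of an atom type that prints as atom<N>, parses the same syntax, and does not compare equal to a bare integer. The Atom class and parse_atom function are hypothetical illustrations, not Venture's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom value: wraps an index but is not an integer."""
    index: int

    def __repr__(self):
        # Print in the same form that the input syntax uses.
        return f"atom<{self.index}>"


_ATOM_RE = re.compile(r"atom<(\d+)>")


def parse_atom(text):
    """Parse the atom<N> syntax back into an Atom."""
    match = _ATOM_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an atom literal: {text!r}")
    return Atom(int(match.group(1)))
```

With this shape, whatever is printed can be pasted back as input, and an atom is never silently mistaken for the integer it wraps.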
Re: random digits: An alternative, due to @riastradh-probcomp: Use random phrases from the diceware word list, e.g. squirrel_bandersnatch. Now they need to be distinguishable from symbols rather than numbers, but may be more fun to interact with.
Venture currently has 3 unconstrained scalar types:
- Number, represented as a floating point number
- Integer, represented as a fixed-point integer
- Atom, represented as a fixed-point integer
Some programming languages (like JavaScript) only have floating point numbers, on the grounds that the 52 bits of mantissa are good enough for practical fixed-precision computations.
Other programming languages distinguish fixed point and floating point types because:
Why does Venture have atoms, and why do they masquerade as another scalar numerical type?
1. mem.
2. categorical and dirichlet return atoms. These atoms can be used as mem keys too, but are also at least occasionally used as keys in lookup tables, where it matters that the atom carries the information of which of the categorical or dirichlet's input dimensions it actually corresponds to.
3. make_lazy_hmm produces an SP that returns atoms, presumably to indicate the output observation, but nobody uses it.
Atoms currently look like integers when returned from API methods or printed at the Venture console. On the one hand, this causes recurring pain like #332 and #347. On the other hand, this
However, atoms emitted by different crp instances can be mixed up with each other, and that is presumably undesirable from the point of view of pattern 1.
How should things be? @vkmvkmvkmvkm? @riastradh-probcomp, @luac?