Open NunoSempere opened 1 year ago
Still doesn't work correctly even with #1591, unfortunately.
I'd like to flag this as pretty important.
I previously thought that it's not a big deal, because who cares if it's exactly zero or 1e-16.
But then I tried to use mx to simulate multiple discrete branches, and got confused why my code wasn't working properly.
Yea, that makes sense. I'd imagine that this specific issue isn't dramatically difficult? Maybe 3-20 hours, could be worth it soon.
Incidentally, the approach that squiggle.c takes here is, I think, more robust, so that these things don't happen, because it defines variables as samplers, which are conceptually clearer.
In fake pseudo-code, it is doing something like:
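// Each variable is a sampler: a zero-argument function that returns one sample.
// Stand-in normal sampler (Box-Muller) so the snippet runs on its own.
let sample_from_normal = (mean, std) =>
  mean + std * Math.sqrt(-2 * Math.log(1 - Math.random())) * Math.cos(2 * Math.PI * Math.random())
let a = () => 0
let b = () => sample_from_normal(1, 10)
// mixture returns a new sampler, so it can be called once per sample.
let mixture = (samplers, weights) => {
  let total = weights.reduce((acc, w) => acc + w, 0)
  let running = 0
  let normalized_cumsummed_weights = weights.map(w => (running += w) / total)
  return () => {
    let p = Math.random()
    let i = normalized_cumsummed_weights.findIndex(x => p < x)
    // findIndex returns -1 if rounding leaves the last cumulative weight below p
    return samplers[i === -1 ? samplers.length - 1 : i]()
  }
}
let resultSampler = mixture([a, b], [1, 9])
let result = Array.from({length: 10000}, _ => resultSampler())
console.log(result)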
This is more verbose, but it is a bit clearer what kind of objects a and b are, and there is no difference between a sampler for a discrete distribution and one for a continuous distribution.
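The concern raised earlier in the thread (a value being exactly zero versus something like 1e-16) can be checked directly with this sampler style. The sketch below is a self-contained illustration, not code from squiggle.c or Squiggle: the seeded `mulberry32` generator, `sampleNormal`, and the other names are assumptions made so the run is reproducible. It counts how many of the mixture samples are exactly `0`.

```javascript
// Small seeded PRNG (a mulberry32 variant) so the run is reproducible.
// Assumption for this sketch only; any uniform source on [0, 1) would do.
function mulberry32(seed) {
  let s = seed >>> 0;
  return () => {
    s = (s + 0x6d2b79f5) >>> 0;
    let t = s;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rand = mulberry32(42);

// Box-Muller transform; 1 - rand() keeps the argument of log strictly positive.
const sampleNormal = (mean, std) =>
  mean + std * Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());

const pointMassAtZero = () => 0;          // discrete branch
const normal = () => sampleNormal(1, 10); // continuous branch

// Returns a sampler that picks one branch per call, in proportion to the weights.
const mixture = (samplers, weights) => {
  const total = weights.reduce((acc, w) => acc + w, 0);
  let running = 0;
  const cumulative = weights.map((w) => (running += w) / total);
  return () => {
    const p = rand();
    const i = cumulative.findIndex((x) => p < x);
    return samplers[i === -1 ? samplers.length - 1 : i]();
  };
};

const sampler = mixture([pointMassAtZero, normal], [1, 9]);
const samples = Array.from({ length: 10000 }, () => sampler());
const exactZeros = samples.filter((x) => x === 0).length;
console.log(`share of exact zeros: ${exactZeros / samples.length}`);
```

With weights of 1 and 9, roughly a tenth of the samples should be exactly `0`, because the discrete branch is never blended numerically with the continuous one.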
I don't know to what extent you will find this useful, but I thought I'd share.
Yea, I think that approach can be good in certain use cases. You can do similar things in Squiggle if you want, though it's not as emphasized.
Representing all distributions as generator functions that lazily return samples is a very tempting idea to me.
Benefits:
Risks/problems:
cdf, pdf, mean and other statistics functions would be compatible with that approach
One more meta-level benefit is that if we do this, it'd be much clearer to me that Squiggle is necessary, and not just a trivial layer of syntax sugar on top of JS.
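As a rough illustration of distributions as generator functions that lazily return samples, here is a minimal sketch in plain JavaScript. All names (`constant`, `uniform`, `map2`, `take`) are hypothetical helpers invented for this example and are not part of Squiggle; the uniform sampler is just a simple stand-in distribution.

```javascript
// Hypothetical helpers for illustration; none of these are Squiggle APIs.

// An infinite stream that always yields the same value (a point mass).
function* constant(value) {
  while (true) yield value;
}

// An infinite stream of uniform samples on [low, high).
function* uniform(low, high) {
  while (true) yield low + (high - low) * Math.random();
}

// Combine two streams sample by sample. Nothing is drawn until a consumer pulls.
function* map2(f, xs, ys) {
  while (true) yield f(xs.next().value, ys.next().value);
}

// Pull exactly n samples from an infinite generator.
function take(gen, n) {
  const out = [];
  while (out.length < n) out.push(gen.next().value);
  return out;
}

const x = uniform(2, 5);
const y = constant(10);
const sum = map2((a, b) => a + b, x, y);

// Only now are five samples actually generated.
const firstFive = take(sum, 5);
console.log(firstFive);
```

One thing this makes visible: a generator is stateful, so passing the same generator object twice, as in `map2(f, x, x)`, pulls two different samples rather than reusing one. Any real design would have to decide how that interacts with correlated variables.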
One less extreme way to try this, without changing the entire language and its semantics, is to add a 4th type of distributions: "GeneratorDist". I think it'd be mostly compatible with what we have right now: we could treat it as a lazy SampleSetDist at first, and pull sampleCount samples from it when it's first used. Then we could make it more lazy when it's possible.
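A minimal sketch of how such a lazy distribution might behave, assuming it wraps a sampler function and materializes `sampleCount` samples on first use. `GeneratorDist` is only the name floated in this thread; the class below, its methods, and the uniform stand-in sampler are hypothetical and do not reflect Squiggle's actual internals.

```javascript
// Hypothetical sketch: not an existing Squiggle class.
class GeneratorDist {
  constructor(sampler, sampleCount) {
    this.sampler = sampler;         // zero-argument function returning one sample
    this.sampleCount = sampleCount; // how many samples to pull on first use
    this.cache = null;              // filled lazily
  }

  // Materialize the samples the first time they are needed, then reuse them.
  samples() {
    if (this.cache === null) {
      this.cache = Array.from({ length: this.sampleCount }, () => this.sampler());
    }
    return this.cache;
  }

  mean() {
    const xs = this.samples();
    return xs.reduce((acc, x) => acc + x, 0) / xs.length;
  }
}

// Count sampler calls to show when work actually happens.
let calls = 0;
const dist = new GeneratorDist(() => {
  calls += 1;
  return 2 + 3 * Math.random(); // uniform stand-in on [2, 5)
}, 1000);

console.log(calls); // 0: nothing has been sampled yet
console.log(dist.mean());
console.log(calls); // 1000: samples were pulled on first use
```

Caching after the first pull keeps the behaviour close to a regular sample set: repeated calls to `mean()` see the same samples, and the cost is only paid if the value is ever used.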
Good point about imports. I think that generator functions are neat, but more complicated for users - so I was kind of hoping we wouldn't need them, but maybe we would for imports to scale.
I'm pretty nervous about doing another huge change now.
> is to add a 4th type of distributions: "GeneratorDist". I think it'd be mostly compatible with what we have right now: we could treat it as a lazy SampleSetDist at first, and pull sampleCount samples from it when it's first used.
I would be interested/fine with doing something like this.
> I think that generator functions are neat, but more complicated for users
Do you have examples where that might be the case?
My hope for this idea is that we could make the switch mostly transparent to users. They don't need to know that their x = 2 to 5 doesn't generate a list of samples immediately.
I. See the summary statistics for:
II. See the samples for: