probcomp / crosscat

A domain-general, Bayesian method for analyzing high-dimensional data tables
http://probcomp.csail.mit.edu/crosscat/
Apache License 2.0
322 stars 42 forks source link

Fix dha_example.py. Most LocalEngine calls take a seed now (as of https://github.com/empiricalsys/crosscat/commit/005dd699ddfe38a5907871319b483870eeda20bf). Give it one. #98

Closed asilversempirical closed 8 years ago

asilversempirical commented 8 years ago

I get the impression from https://github.com/empiricalsys/crosscat/commit/0aad0baa7de04bf3eb3991c8b619f1ca4de1d6ff that re-using the seed is frowned upon and generating seeds from a meta-seed is preferred. But I don't totally understand in the below case, since they're different operations so presumably the identical values won't cause some degenerate outcome.

gregory-marton commented 8 years ago

Unsure why you closed it, but I would welcome a change like this.

asilversempirical commented 8 years ago

Oh hi! I'm the new guy at Empirical, and we're trying to flesh out our internal review process for things that are intended to land in probcomp. I'm doing a first-pass review in https://github.com/asilversempirical/crosscat/pull/1, but I'll re-open this afterwards.

gregory-marton commented 8 years ago

I'd love to have a step early in that process to give probcomp a heads-up that someone is working on the given thing. This was on my to-do list for "soon", and I'm glad I can just let you do it, but I'd love to know that it was assigned, when it was assigned. One way to do that might be to file a bug and self-assign. Other thoughts welcome!

riastradh-probcomp commented 8 years ago

It may look like a different operation, but for myself, at least, it would take an explicit proof about the operations in question to persuade me that they really are effectively independent. (For example, maybe the first thing they do is to draw CRP clusters -- in which case maybe they will always pick exactly the same CRP partitioning, which is definitely not what you had in mind.) It doesn't really matter for this bit-rotted example, but I like consistently following good habits -- and I suggest that just drawing independent seeds is much easier than furnishing such a proof!

rng = random.Random(inf_seed)
def get_next_seed(): return rng.randint(1, 2**31 - 1)
... engine.initialize(..., get_next_seed(), ...)
... engine.analyze(..., get_next_seed(), ...)

(The range [1, 2**31) is wacky, but will be convenient for a future search-and-destroy^Wreplace mission that drops in a high-quality PRNG with a 32-byte seed, since most of the rest of crosscat uses the same pattern.)

asilversempirical commented 8 years ago

@gregory-marton Noted. You're right. This definitely would have done well with a bug + followup. Will do so next time.

@riastradh-probcomp Sounds good, and now I see the similar changes made in a couple other places. I took your suggestion and have now re-opened this PR.

Thanks for the review, both!