hakaru-dev / hakaru

A probabilistic programming language
BSD 3-Clause "New" or "Revised" License
309 stars 30 forks source link

summary error #82

Closed cscherrer closed 7 years ago

cscherrer commented 7 years ago

I've confirmed that there's an error in summary. Recall the empirical result that P(0)=0.789. But summary produces code that results in this:

chad@komputilo:~/git/iu/tcp$ stack exec tcp2 | head -n 10000 | cut -f 1 -d "," | sort | uniq -c
   5560 0
   4440 1

Here's the compile output for comparison, which matches (within random noise) the empirical result:

chad@komputilo:~/git/iu/tcp$ stack exec tcp2 | head -n 10000 | cut -f 1 -d "," | sort | uniq -c
   7843 0
   2157 1

I'll confirm that everything is pushed so the result is easily reproducible.

cscherrer commented 7 years ago

...aaaaand it's gone.

There's some development strangeness, because...

  1. Hakaru is installed globally. If it weren't it would require update-maple even more often, and would be more of a nuisance. BUT
  2. It's also installed as part of the tcp repo. This is because the Hakaru repo includes not only the code for the executables, but also the library code to make the results of compile and summary actually work.

For reproducibility I had pinned the hakaru submodule to a particular commit. But as it turns out, this had some bugs that seem now to have been corrected.

ccshan commented 7 years ago

I'm sad that a bug disappeared without us knowing what caused it and what fixed it. We care about this bug because it was making our simplify-summary pipeline generate a Gibbs sampler that was incorrect and performed poorly. But now that this bug has disappeared, does our simplify-summary pipeline still generate a Gibbs sampler that is incorrect and performs poorly, compared to the analytical result and/or to Julia's TopicModels.jl?

cscherrer commented 7 years ago

Hmm, I think I spoke too soon:

chad@komputilo:~/git/iu/tcp$ compile --logfloat-prelude src/LDA/lda_simp.hk http://lda_simp.hk -o src/LDA/Model.hs -M LDA.Model

chad@komputilo:~/git/iu/tcp$ stack build tcp-0.1.0.0: unregistering (local file changes: app/TCP2.hs) tcp-0.1.0.0: build (lib + exe) tcp-0.1.0.0: copy/register Log files have been written to: /home/chad/git/iu/tcp/.stack-work/logs/ chad@komputilo:~/git/iu/tcp$ time stack exec tcp2 | head -n 10000 | cut -f 1 -d "," | sort | uniq -c

real 0m3.891s user 0m3.880s sys 0m0.236s

chad@komputilo:~/git/iu/tcp$ summary --logfloat-prelude src/LDA/lda_simp.hk http://lda_simp.hk -o src/LDA/Model.hs -M LDA.Model

chad@komputilo:~/git/iu/tcp$ stack build tcp-0.1.0.0: unregistering (local file changes: src/LDA/Model.hs) tcp-0.1.0.0: build (lib + exe) tcp-0.1.0.0: copy/register Log files have been written to: /home/chad/git/iu/tcp/.stack-work/logs/

chad@komputilo:~/git/iu/tcp$ time stack exec tcp2 | head -n 10000 | cut -f 1 -d "," | sort | uniq -c

real 0m2.739s user 0m2.700s sys 0m0.268s

On Mon, Jun 12, 2017 at 8:36 PM Chung-chieh Shan notifications@github.com wrote:

I'm sad that a bug disappeared without us knowing what caused it and what fixed it. We care about this bug because it was making our simplify- summary pipeline generate a Gibbs sampler that was incorrect and performed poorly. But now that this bug has disappeared, does our simplify -summary pipeline still generate a Gibbs sampler that is incorrect and performs poorly, compared to the analytical result and/or to Julia's TopicModels.jl?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/hakaru-dev/hakaru/issues/82#issuecomment-307985860, or mute the thread https://github.com/notifications/unsubscribe-auth/ABISwYBgIEWzQJ8rMQytfffIpHI44sKVks5sDe9lgaJpZM4N3gCT .

cscherrer commented 7 years ago

I suggest using compile for now and getting the result to Max today. Then I can start on the next model and help with debugging summary as it comes up. Or I could try to dig into summary, but I'm not familiar with the codebase so there would be spin-up time.

On Tue, Jun 13, 2017 at 9:43 AM Chad Scherrer chad.scherrer@gmail.com wrote:

Hmm, I think I spoke too soon:

chad@komputilo:~/git/iu/tcp$ compile --logfloat-prelude src/LDA/lda_simp.hk http://lda_simp.hk -o src/LDA/Model.hs -M LDA.Model

chad@komputilo:~/git/iu/tcp$ stack build tcp-0.1.0.0: unregistering (local file changes: app/TCP2.hs) tcp-0.1.0.0: build (lib + exe) tcp-0.1.0.0: copy/register Log files have been written to: /home/chad/git/iu/tcp/.stack-work/logs/ chad@komputilo:~/git/iu/tcp$ time stack exec tcp2 | head -n 10000 | cut -f 1 -d "," | sort | uniq -c

  • 7843 0*
  • 2157 1*

real 0m3.891s user 0m3.880s sys 0m0.236s

chad@komputilo:~/git/iu/tcp$ summary --logfloat-prelude src/LDA/lda_simp.hk http://lda_simp.hk -o src/LDA/Model.hs -M LDA.Model

chad@komputilo:~/git/iu/tcp$ stack build tcp-0.1.0.0: unregistering (local file changes: src/LDA/Model.hs) tcp-0.1.0.0: build (lib + exe) tcp-0.1.0.0: copy/register Log files have been written to: /home/chad/git/iu/tcp/.stack-work/logs/

chad@komputilo:~/git/iu/tcp$ time stack exec tcp2 | head -n 10000 | cut -f 1 -d "," | sort | uniq -c

  • 5560 0*
  • 4440 1*

real 0m2.739s user 0m2.700s sys 0m0.268s

On Mon, Jun 12, 2017 at 8:36 PM Chung-chieh Shan notifications@github.com wrote:

I'm sad that a bug disappeared without us knowing what caused it and what fixed it. We care about this bug because it was making our simplify- summary pipeline generate a Gibbs sampler that was incorrect and performed poorly. But now that this bug has disappeared, does our simplify-summary pipeline still generate a Gibbs sampler that is incorrect and performs poorly, compared to the analytical result and/or to Julia's TopicModels.jl?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/hakaru-dev/hakaru/issues/82#issuecomment-307985860, or mute the thread https://github.com/notifications/unsubscribe-auth/ABISwYBgIEWzQJ8rMQytfffIpHI44sKVks5sDe9lgaJpZM4N3gCT .

ccshan commented 7 years ago

I'll debug summary. But it's news to me that, as soon as you switch from compile to summary, Hakaru generates a working Gibbs sampler for the LDA model. How do you show that fact off?

cscherrer commented 7 years ago

On Tue, Jun 13, 2017 at 10:04 AM Chung-chieh Shan notifications@github.com wrote:

I'll debug summary. But it's news to me that, as soon as you switch from compile to summary, Hakaru generates a working Gibbs sampler for the LDA model. How do you show that fact off?

I'm sorry, I don't understand this question.

ccshan commented 7 years ago

Sorry, let me rephrase. Would you please demo (maybe even at our team meeting today?) what happens when you replace summary by compile to work around this bug?

ccshan commented 7 years ago

And would you please make sure that the versions of lda.hk and lda_simp.hk that demonstrate the bug are pushed to the tcp repo under the master branch and the src/LDA directory? Currently the latest lda.hk there contains a Hakaru syntax error.

cscherrer commented 7 years ago

Ok, I'm running now. Whould be interesting, maybe just to see the top topics it generates?

I'll roll back lda.hk to match the lda_imp.hk I'm using.

On Tue, Jun 13, 2017 at 10:38 AM Chung-chieh Shan notifications@github.com wrote:

And would you please make sure that the versions of lda.hk and lda_simp.hk that demonstrate the bug are pushed to the tcp repo under the master branch and the src/LDA directory? Currently the latest lda.hk there contains a Hakaru syntax error.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/hakaru-dev/hakaru/issues/82#issuecomment-308192529, or mute the thread https://github.com/notifications/unsubscribe-auth/ABISwRrZtxXbZ0kfdLhwlUNIQILLRW92ks5sDsj2gaJpZM4N3gCT .

cscherrer commented 7 years ago

I just pushed lda_simp2.hk, which is different only because of updates to the simplify command.

On Tue, Jun 13, 2017 at 10:41 AM Chad Scherrer chad.scherrer@gmail.com wrote:

Ok, I'm running now. Whould be interesting, maybe just to see the top topics it generates?

I'll roll back lda.hk to match the lda_imp.hk I'm using.

On Tue, Jun 13, 2017 at 10:38 AM Chung-chieh Shan < notifications@github.com> wrote:

And would you please make sure that the versions of lda.hk and lda_simp.hk that demonstrate the bug are pushed to the tcp repo under the master branch and the src/LDA directory? Currently the latest lda.hk there contains a Hakaru syntax error.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/hakaru-dev/hakaru/issues/82#issuecomment-308192529, or mute the thread https://github.com/notifications/unsubscribe-auth/ABISwRrZtxXbZ0kfdLhwlUNIQILLRW92ks5sDsj2gaJpZM4N3gCT .

ccshan commented 7 years ago

Thanks, and it was with lda_simp.hk that you experienced this summary bug, right?

cscherrer commented 7 years ago

yes, that's right

On Tue, Jun 13, 2017 at 11:35 AM Chung-chieh Shan notifications@github.com wrote:

Thanks, and it was with lda_simp.hk that you experienced this summary bug, right?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/hakaru-dev/hakaru/issues/82#issuecomment-308208485, or mute the thread https://github.com/notifications/unsubscribe-auth/ABISwVbB0pc3gII9kS_YAnIb45c-srRvks5sDtZggaJpZM4N3gCT .

ccshan commented 7 years ago

Ok, I think I fixed the problem. Would you please try summary again, @cscherrer, on lda_simp.hk (which unfortunately the latest master commit does not produce, due to https://github.com/hakaru-dev/hakaru/issues/84)?

In more detail: the problem was that

  1. summary produced subexpressions of the form "let summary = ... in 0", because
  2. summary calls NewSLO:-simplify_factor_assuming, and
  3. NewSLO:-simplify_factor_assuming(piecewise(And(d::integer,...), ..., 0), KB(Bound(d,=,idx(w,wordUpdate)), Introduce(d, HInt(Bound(>=,0), Bound(<=,Hakaru:-size(word_prior)-1))), Introduce(w, HArray(HInt(Bound(>=,0)))))) returned 0, because
  4. KB:-assume didn't realize that d::integer is always true, and
  5. KB:-assume thought that Not(d::integer) is always true.

Item 4 above is fixed by ba2e06449cedafadb89f957a54ee7bcf3fb6e9c1, and item 5 above is fixed by d1d85b7bc4501f1528e4cc8c085f795140011eaf.