Open cscherrer opened 7 years ago
There is definitely code that should recognize this and simplify it. I know @ccshan wrote some of it. And @yuriy0 's recent work might have tweaked it -- some of the things that needed such simplifications the most never made it into the test suite...
I'm not sure what code should be dealing with this. It might help to have a program where such expressions occur and are successfully simplified, as well as the one in which they are not.
do_peel
in Loop.mpl. So it is possibly not called in improve
, which would be why it is not triggered all the times that it should.
I believe there are several programs in fun-fact.mw
which do trigger this.
@cscherrer can you point us to the program you were using where this didn't happen?
@JacquesCarette
Here's the source, which simplify
s to this. The latter includes
(product d from 0 to numDocs:
(match ((d == doc[wordUpdate]) &&
not((doc[wordUpdate] < 0)) &&
not(((nat2int(numDocs) - 1) < nat2int(doc[wordUpdate])))):
true:
(nat2prob((summate d両 from 0 to size(w):
(match (d両 == wordUpdate):
true: 0
false:
(match ((d == doc[d両]) && (zNew丑 == z[d両])):
true: 1
false: 0))))
+
topic_prior[zNew丑])
false: 1))
Thanks. I think the issue is that the code to simplify these kinds of products is called only very late in the pipeline, and not in all the places that could benefit from it.
After pasting this, I realize I had hand-simplified some expressions that were obviously always true. But maybe this goes back to the difficulty of making that obvious to Maple.
If it were an order of simplification thing, it should reduce further with another call, right?
There are some simplifications that are not done because they require knowledge that is (currently) unavailable (such as properties of values inside arrays). If these are crucial, then we'll need to up the priority on that work.
Simplification, in theory, should be idempotent. In practice, I definitely don't think it is. Which is another way of saying: yes, try multiple calls to simplify. It might work.
I agree that all 3 occurrences of product
in https://github.com/hakaru-dev/tcp/blob/11a5498f9fd2f05838d9d39470e7a242819fb4d9/src/LDA/lda_simp.hk under categorical
should simplify
away, because if d == ...
is false then the body of the product
is 1
. But the code that's supposed to do this is not do_peel
in Loop.mpl
but rather eval_piecewise
in NewSLO/Piecewise.mpl
(line 76).
I tried to trace
this function to see if it's invoked on the 3 relevant piecewise
-under-product
occurrences (which it ought to be, because these product
s are factors and so it should be passed to simplify_factor_assuming
). But I couldn't get very far because I get this Maple error right away when I try to simplify
the same lda_simp.hk
(which by the way is older than the latest version of lda.hk
in the tcp
repository):
simplify: MapleException:
Error, (in assuming) when calling 't1'. Received: 'attempting to assign to `in`
which is protected. Try declaring `local in`; see ?protect for details.'
after sending to Maple:
use Hakaru, NewSLO in timelimit(300, RoundTrip(eval(eval(lda, lda=(lam(topic_prior, HArray(HReal(Bound(`>=`,0))), lam(word_prior, HArray(HReal(Bound(`>=`,0))), lam(numDocs, HInt(Bound(`>=`,0)), lam(w, HArray(HInt(Bound(`>=`,0))), lam(doc, HArray(HInt(Bound(`>=`,0))), lam(z, HArray(HInt(Bound(`>=`,0))), lam(wordUpdate, HInt(Bound(`>=`,0)), eval(eval(case(wordUpdate < numWords, Branches(Branch(PDatum(true, PInl(PDone)), Bind(Plate(numDocs, d, app(dirichlet, topic_prior)), theta, Bind(Plate(numTopics, k, app(dirichlet, word_prior)), phi, Bind(Categorical(ary(numTopics, i, 1)), zNew, Bind(Plate(numWords, n, eval(Bind(Bind(Msum(Weight((idx(idx(theta, idx(doc, n)), zz) * 1/(Sum(idx(idx(theta, idx(doc, n)), x0), x0=0..(size(idx(theta, idx(doc, n))))-1))), Ret(Datum(unit, Inl(Done))))), x27, Ret(zz)), bar, Bind(Msum(Weight((idx(idx(phi, zz), idx(w, n)) * 1/(Sum(idx(idx(phi, zz), x0), x0=0..(size(idx(phi, zz)))-1))), Ret(Datum(unit, Inl(Done))))), x27, Ret(idx(w, n)))), zz=(case((n = wordUpdate), Branches(Branch(PDatum(true, PInl(PDone)), zNew), Branch(PDatum(false, PInr(PInl(PDone))), idx(z, n))))))), foo, Ret(zNew)))))), Branch(PDatum(false, PInr(PInl(PDone))), Msum()))), numTopics=(size(topic_prior))), numWords=(size(w)))))))))))), dirichlet=(lam(as, HArray(HReal(Bound(`>=`,0))), Bind(Plate((size(as) + (-(1))), i, BetaD(Sum(idx(as, j), j=(i + 1)..(size(as))-1), idx(as, i))), xs, Ret(ary(size(as), i, eval((x * case(((i + 1) = size(as)), Branches(Branch(PDatum(true, PInl(PDone)), 1), Branch(PDatum(false, PInr(PInl(PDone))), (1 + (-(idx(xs, i)))))))), x=(Product(idx(xs, j), j=0..(i)-1))))))))), HFunction(HArray(HReal(Bound(`>=`,0))), HFunction(HArray(HReal(Bound(`>=`,0))), HFunction(HInt(Bound(`>=`,0)), HFunction(HArray(HInt(Bound(`>=`,0))), HFunction(HArray(HInt(Bound(`>=`,0))), HFunction(HArray(HInt(Bound(`>=`,0))), HFunction(HInt(Bound(`>=`,0)), HMeasure(HInt(Bound(`>=`,0)))))))))), _command=Simplify)) end use;
Which hakaru
repository commit (past or future) should I use to replicate the problem?
I forgot to say: In order to fire, the rewrite in eval_piecewise
(line 76 onward in NewSLO/Piecewise.mpl
) needs to know that the value that the loop variable d
is compared to is an integer
and is within the loop bounds. In the 3 cases in lda_simp.hk
, the values that d
is compared to are idx
expressions, so they are probably not known to be an integer
and not known to be within the loop bounds. This is why @JacquesCarette wrote above about "properties of values inside arrays". But a faster fix (if this lack of knowledge is indeed what's preventing the rewrite from firing) would be to residualize those conditions, i.e., to generate a piecewise(...,...,1)
if there are conditions that the kb
does not discharge. That would still get rid of the product
at least and provide most of the performance gain we want.
Thanks @ccshan that is valuable information. Hopefully @yuriy0 can make some progress from here.
Also, in
does not occur in that program, so that must be something new in the maple code. in
is indeed a very bad variable name to use (along with every other Maple-protected name).
@JacquesCarette, I'm not sure why you're talking about the variable name in
, because I don't see it used or generated anywhere...?
@ccshan Because of the error message you posted about 3 hours ago. https://github.com/hakaru-dev/hakaru/issues/80#issuecomment-307439065.
Using commit 131e0cf370272e7f85fc7dbd1a678c1f3566a6dd, I was able to track down the necessary preconditions to add to lda.hk
to coax the trivial products into reducing (https://github.com/hakaru-dev/tcp/commit/f09704390995c0986f812c904d38b45a644b1704). So this issue is less urgent now.
The in
issue is fixed. The trivial product issue is fixed too. Is there anything remaining unfixed in this issue?
Well, the trivial product issue is not completely fixed, merely worked around by adding if w[wordUpdate] < size(word_prior) && doc[wordUpdate] < numDocs
to the input lda.hk
. Those added conditions are reasonable, but in case they are absent from the input (as in the else
case in 131e0cf370272e7f85fc7dbd1a678c1f3566a6dd), the trivial-product rewrite should still happen, by residualizing those conditions, i.e., by generating a piecewise(...,...,1)
if there are conditions that the kb
does not discharge
@ccshan can you rig up a much simpler example to show this, and create a new issue with it? lda.hk
is very hard to work with as the "base case".
by residualizing those conditions, i.e., by generating a piecewise(...,...,1) if there are conditions that the kb does not discharge
I don't understand what it means for the KB to discharge a condition.
From what I do understand, we want to simplify product(piecewise(i=K,f(i),1),i=A..B)
to f(K)
but we can't do that for piecewise(C and i=K, .., ..)
. I believe something should be done with that extra condition C
to allow the simplification to proceed?
I don't understand what it means for the KB to discharge a condition.
Please understand "discharge" as "entail".
From what I do understand, we want to simplify
product(piecewise(i=K,f(i),1),i=A..B)
tof(K)
but we can't do that forpiecewise(C and i=K, .., ..)
. I believe something should be done with that extra conditionC
to allow the simplification to proceed?
Yes we want to simplify product(piecewise(i=K,f(i),1),i=A..B)
to f(K)
, but the problem is what if K
might not be an integer between A
and B
? For example, what if we know K
is an integer and K>=A
but the kb
doesn't tell us whether K<=B
? In that situation, I propose that we still simplify product(piecewise(i=K,f(i),1),i=A..B)
, but not to f(K)
but rather to piecewise(K<=B,f(K),1)
.
In that situation, I propose that we still simplify product(piecewise(i=K,f(i),1),i=A..B), but not to f(K) but rather to piecewise(K<=B,f(K),1).
This seems reasonable, and I am willing to try to implement it. Thanks for the clarification.
Just to confirm, the presence of the 'extra' condition C
in product(piecewise(C and i=K, .., ..),i=..)
should not prevent us from applying this simplification? The warning inserted in 131e0cf3 fire for several conditions like And(d = Hakaru:-idx(doc, wordUpdate), Hakaru:-idx(doc, wordUpdate) <= numDocs-1)
- in these cases, the loop bounds are like d = 0 .. numDocs-1
so the 2nd conjunct is true when the first is true, but I don't see where that is checked (that the other conjuncts can be safely disregarded).
Indeed the presence of C
should not prevent the simplification. In the worst case we should get piecewise(And(K<=B, C), f(K), 1)
, but if And(K::integer, A<=K, K<=B)
happens to entail eval(C, i=K)
, of course it would be nice to take advantage of that and get piecewise(K<=B, f(K), 1)
instead. The recursive call to eval_factor(eval(res, x=b), ...)
is supposed to take care of such things...
simplify
sometimes produces expressions of this form:But since the match is an equality test (not a more general predicate) and the false branch is 1, this loop reduces to
f(k)
.Can
simplify
be made to recognize this? Or maybe this is acompile
orsummary
thing. (The boundaries or responsibilities of the tools are a little blurry to me)