nwf / dyna

Dyna2 compiler and REPL
GNU Affero General Public License v3.0

weird errors in forward-backward execution #47

Open jeisner opened 11 years ago

jeisner commented 11 years ago

The forward-backward ice cream spreadsheet example seems to be exposing bugs in the Dyna implementation. Can you guys try to make it work right? I wrote this code last night to give to the LI students -- I taught them from the spreadsheet yesterday. I figured I'd also give them some POS tagging data so they could play with the same program on a bigger example.

I'm getting a lot of division by 0 errors when running with a concatenation of icecream.dyna and forward-backward.dyna (checking those now into the examples/ directory).

Several odd things:

timvieira commented 11 years ago

> I'm getting a lot of division by 0 errors when running with a concatenation of icecream.dyna and forward-backward.dyna (checking those now into the examples/ directory).

The way to fix this is by smoothing... Are you claiming that the fixed point of the program doesn't have zero-division?

> I don't believe that these are actually errors. If I change the file to start with num_iterations := 0, and then gradually increase num_iterations one step at a time at the repl, I get no errors and I exactly match the HMM ice cream spreadsheet. But if I start num_iterations above 0, or increase it by more than 1 at a time, I get errors.

This suggests that execution order matters for your program, which indicates that it has an unfortunate cycle in there somewhere.

> I'd think that $error values would feed through many iterations, but they don't seem to. IIRC, I only get values at all as far as the first iteration that has errors. And there seem to be some uncomputed things, e.g., count_emission_denom is defined as a sum over count_emission but sometimes a query says that it's 0 even though the summands are all positive.

Errors don't propagate. We've discussed this several times. I understand it's semantics we'd like to have eventually. [won't fix].

> Some of the errors are reported unhelpfully, e.g., they tell me that there's a div by 0 error in a rule of the form "A/B for C<D" but they report the value of C (only).

I have encountered this as well. I agree that seeing the rule and the item which "drove" the error is not enough to understand why the error happened. What we really want to see here is something like the output of trace, which shows the rule grounding and the evaluations of sub-expressions. Of course, since the item triggered an error, trace won't find the hyperedge (the absence of that edge is of course related to the fact that errors don't propagate, mentioned above).

I will look into showing the rule grounding + sub-expressions. We won't be able to get the entire grounding (in practice or in theory), but we should present at least enough for the user to see the divide-by-zero.

> Does sol include error messages for items that are only $error because they have an antecedent that is $error? Not sure, but it probably shouldn't as those can clutter things up and make it hard to see where the errors actually start.

No, items with value $error were pop-time errors, most likely due to a bad aggregation, e.g. x :- 3. Other errors happen at push-time, such as:

> a += b/0.
> b := 1.

> At one point I was getting errors that said * wasn't defined on Float * Term, but the Term in this case seemed to be nan, which I would think would be a kind of Float.

This was probably due to trying to multiply $error and a number. Divide by zero does not result in nan -- it's an exception.
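
(As a quick sanity check in Python, which appears to be the runtime's implementation language judging from the tracebacks later in this thread: float division by zero raises an exception rather than producing a nan.)

try:
    1.0 / 0.0
except ZeroDivisionError as e:
    print(e)   # prints: float division by zero
# A nan can still arise from IEEE-style operations such as
# float("inf") - float("inf"), but never from dividing by zero.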

> Updating num_iterations or lambda doesn't seem to automatically print all the changes as I'd expect (or the errors). Sometimes it prints some of the changes, and sometimes none at all. I can still type sol to see what I missed.

What does it print and what did you expect it to?

jeisner commented 11 years ago

> The way to fix this is by smoothing...

In fact the program already has add-lambda smoothing, and it is true that the problem is sometimes cured by raising lambda above 0. However, it should work fine for lambda=0 too, at least for this dataset. That is because the program is supposed to be acyclic and is supposed to exactly replicate the HMM spreadsheet that I teach from. Maybe I accidentally introduced a cycle, but that doesn't explain why when I raise num_iterations one step at a time, I do get answers over 10 iterations that precisely match the spreadsheet.
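
(For concreteness, here is a sketch in Python of what add-lambda smoothing does to the reestimated probabilities; the vocabulary and counts are made up and this is not the actual .dyna code. With lambda > 0 the smoothed denominator can never be zero, which is why raising lambda masks the symptom.)

lam = 0.1
words = ["1", "2", "3"]                         # hypothetical emission vocabulary
count = {w: 0.0 for w in words}                 # expected counts, all zero here
denom = sum(count.values()) + lam * len(words)  # about 0.3, strictly positive
p = {w: (count[w] + lam) / denom for w in words}
print(p)   # uniform 1/3 each (up to rounding); with lam = 0 this would be 0/0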

> Are you claiming that the fixed point of the program doesn't have zero-division?

Right. The error is not the usual business of producing a 0/0 value that is never used (or is only multiplied by 0). The error arises when dividing a positive numerator by a denominator that is the sum of all the numerators. That division should be fine, but for some reason the denominator is computed as 0 even though it has positive summands and no negative summands!
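
(Spelling out the claimed invariant with made-up numbers in Python: a denominator defined as the sum of its numerators can only be zero when every numerator is zero, so the division is always well-defined at the fixed point. Yet the trace below shows a positive summand feeding a denominator that the chart reports as 0.)

count_emission = {("C", 3): 0.125, ("C", 1): 0.375}    # hypothetical expected counts
denom = sum(count_emission.values())                   # 0.5: positive summands, positive sum
p = {k: v / denom for k, v in count_emission.items()}  # safe division at the fixed point
print(denom, p)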

So I think this is a bug in Dyna somewhere. Here is the start of a trace from a somewhat cut-down example -- you can see why I am suspicious that the aggregation isn't happening right. I'd be happy to send you the files. I could also try randomly cutting things away in a way that preserves the demonstration of the bug.

> query count_emission_denom(0,Tag)

count_emission_denom(0,"C") = 0.
count_emission_denom(0,"H") = 0.
count_emission_denom(0,stop) = 0.

> trace count_emission_denom(0,"C")

count_emission_denom(0,"C") = 0
|
├─ += 0.12500000000000003
│  
│  count_emission_denom(I=0, Tag="C") += count_emission(I=0, Tag="C", Word=3)=0.12500000000000003.
│  |
│  └─ count_emission(0,"C",3) = 0.12500000000000003
│     |
│     ├─ += 0.12500000000000003
│     │  
nwf commented 11 years ago

OK, so I figured out what's going wrong.

I added a trivial example to the forward-backward program and set num_iterations to 1. (Jason, as an aside, it would be nice if this file had clearly labeled, perhaps in all caps or even ASCII art, "YOU SHOULD BE CHANGING THE FILE HERE" lines. It has them, but they sort of blend in to the other comments.)

diff examples/forward-backward.dyna tmp/test.dyna 
13a14,16
> word(1) := &worda.
> word(2) := &wordb.
> 
25c28
< num_iterations := 10.
---
> num_iterations := 1.
94a98,109
> 
> p_t(&start,&taga) := 0.5.
> p_t(&start,&tagb) := 0.5.
> p_t(&taga,&stop) := 0.1.
> p_t(&taga,&taga) := 0.45.
> p_t(&taga,&tagb) := 0.45.
> p_t(&tagb,&stop) := 0.1.
> p_t(&tagb,&taga) := 0.45.
> p_t(&tagb,&tagb) := 0.45.
> 
> p_e(&taga,&worda) := 1.
> p_e(&tagb,&wordb) := 1.

In any case, hacking up the runtime to show me what's happening on the agenda, I am told that the last actions taken are to pop some count_transition/3 items, all of which behave much like this last one here:

ITEM count_transition(0,tagb,stop) WAS 0.0 AGGR plus_equals({0.0: 1, 1.0: 1})
 DELETING...
  SUCCESSFUL DURING <function _ at 0x25e0e60>
  ENCOUNTERED ERROR ZeroDivisionError('float division by zero',) DURING <function _ at 0x25e1140>
 IS 1.0
 ADDING...
  SUCCESSFUL DURING <function _ at 0x25e0e60>
  ENCOUNTERED ERROR ZeroDivisionError('float division by zero',) DURING <function _ at 0x25e1140>

So: what's all that mean? Well, it means that the update handler that results from

count_transition_denom(I,PrevTag) += count_transition(I,PrevTag,Tag).

is completing successfully but that the one from

p_transition(I+1,PrevTag,Tag) 
    = count_transition(I,PrevTag,Tag) / count_transition_denom(I,PrevTag) 
          for I < num_iterations.

is erroring out. Why is it erroring out? Because the exceptions are self-reinforcing: because the latter rule divided by zero, we rejected the changes even to the first rule, placing the count_transition/3 item into an error state. Therefore, the denominator remains zero and everybody is sad forever.

Not quite sure what to do about it, but that's at least the answer.
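
(A minimal sketch of that failure mode in Python, not the actual runtime code; handlers, chart and errors are invented names. Popping an item runs all of its update handlers as a single batch, and one exception causes the whole batch, including the successful += contribution, to be rejected.)

def propagate(item, value, handlers, chart, errors):
    changes = {}
    try:
        for h in handlers[item]:            # e.g. the += rule and then the / rule
            changes.update(h(value, chart))
    except ZeroDivisionError as e:
        errors[item] = e                    # the popped item gets flagged...
        return                              # ...and even the += update is thrown away
    chart.update(changes)                   # reached only if every handler succeeded

Since the += contribution is never committed, the denominator stays 0 and the division fails again on the next pop; the exception reinforces itself.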

@timvieira, how bad would it be to track which rules (or really, which propagators) raised exceptions and just exclude those, not all of them, from this and the next re-propagation? It would probably require another layer of buffering, in which the update_propagator loop did not have emit write directly into changes but applied them a rule at a time, as well as an additional list inside the error[item] value of which propagators not to call. I think that would solve this problem, but I am not yet sure that it does not open up others.
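
(Roughly what that tweak might look like, revising the sketch above; again a hedged Python sketch rather than the real update_propagator loop. Emit into a per-rule buffer, commit the rules that succeed one at a time, and record only the failing propagators in errors[item] so that they alone can be skipped or retried later.)

def propagate(item, value, handlers, chart, errors):
    failed = []
    for h in handlers[item]:
        buffered = {}                          # per-rule buffer instead of one shared `changes`
        try:
            buffered.update(h(value, chart))   # run just this propagator
        except ZeroDivisionError:
            failed.append(h)                   # remember the offender, keep going
            continue
        chart.update(buffered)                 # commit this rule's changes on their own
    if failed:
        errors[item] = failed                  # the list of propagators not to call next time
    else:
        errors.pop(item, None)

Under this scheme the += contribution reaches the chart even though the / rule raised, the denominator becomes nonzero, and a later re-propagation of the / rule can succeed.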

jeisner commented 11 years ago

> Because the latter rule divided by zero, we rejected the changes even to the first rule, placing the count_transition/3 item into an error state.

Why this kind of backward error propagation? count_transition/3 didn't do anything wrong.

> Jason, as an aside, it would be nice if this file had clearly labeled, perhaps in all caps or even ASCII art, "YOU SHOULD BE CHANGING THE FILE HERE" lines. It has them, but they sort of blend in to the other comments.

Yes, sorry, this was not intended to be the final form for the examples/ directory or the students. I was trying to replicate the spreadsheet myself first, and that's just the level of commenting that I do as I go along. I would like to refactor it into multiple .dyna files (maybe in a subdir on weighted graphs). We want a general inference algorithm on graphs or hypergraphs (actually a choice of algorithms: sum-product + optional posterior decoding, and max-product + optional Viterbi decoding), separate from using that inference algorithm as the E-step of reestimation, and separate from a specification of an HMM trellis as the graph you want. Also, the parameterization (and M-step reestimation) of the weights on the HMM trellis could involve transition and emission probabilities or could be globally normalized, and could use features or not ...

nwf commented 11 years ago

> Why this kind of backward error propagation? count_transition/3 didn't do anything wrong.

As discussed at length before, it's all we have: we're standing at the source of a hyperedge and running code to figure out where it goes, and the code just threw an exception rather than completing the iteration. The runtime is thus flagging the fact that the last time a given count_transition/3 item attempted to propagate, an error occurred. The value of count_transition/3 in the chart is correct, but the values of its children (which are unknown to the runtime) are not.

I had thought that this design was viable for a first pass, especially as it avoids our usual examples of error latchup. The tweak mentioned at the end of my prior comment is sufficient for the case at hand (though it probably remains possible to write a program which will experience error latchup due to cycles induced by coarsening to rules).

jeisner commented 11 years ago

Let me move the general error propagation discussion to a new issue ...