iconara / rubydoop

Write Hadoop jobs in JRuby
220 stars 33 forks source link

Fix reducer/combiner cleanup #20

Closed bjorne closed 10 years ago

bjorne commented 10 years ago

An error in the order of the cleanup in ReducerProxy caused #cleanup to never be called in reducers and combiners. This change adds test cases to the integration spec and fixes the issue.

I welcome more elegant ways to signal that the setup and cleanup methods have been called than using puts. I was thinking maybe ctx.write could be used with some special words, but then it will litter the actual job output instead.

iconara commented 10 years ago

Good to get coverage of this, but puts as a signal that they have run is less than ideal… I looked at the JobContext class and there's another method besides #write that has measurable side-effects: #setStatus. It looks like you're allowed to set the status to any string, so it could be used for these tests just to signal that the setup and/or cleanup has run.

iconara commented 10 years ago

It's still a bit of a hack, but #setup and #cleanup aren't meant to have side-effects, and it's hard to inject things that they can work on. Hadoop makes it really hard to do dependency injection.

bjorne commented 10 years ago

I'll give it a try. Perhaps even better would be to use Reporter#incrCounter, since these counters are summarized at the end of the run (according to my understanding.) I currently do not see how that would be done, though.

bjorne commented 10 years ago

I tried using your suggested approach, @iconara. However, it seems only two of the statuses actually appear in the job output. So I suspect the status is not printed when it's set, but rather probed and printed at an interval, like the progress.

bjorne commented 10 years ago

Counters are now used to signal whether setup/cleanup runs.

iconara commented 10 years ago

Now you haz commit bit.