Closed bjorne closed 10 years ago
Good to get coverage of this, but puts
as a signal that they have run is less than ideal… I looked at the JobContext
class and there's another method besides #write
that has measurable side-effects: #setStatus
. It looks like you're allowed to set the status to any string, so it could be used for these tests just to signal that the setup and/or cleanup has run.
It's still a bit of a hack, but #setup
and #cleanup
aren't meant to have side-effects, and it's hard to inject things that they can work on. Hadoop makes it really hard to do dependency injection.
I'll give it a try. Perhaps even better would be to use Reporter#incrCounter
, since these counters are summarized at the end of the run (according to my understanding.) I currently do not see how that would be done, though.
I tried using your suggested approach, @iconara. However, it seems only two of the statuses actually appear in the job output. So I suspect the status is not printed when it's set, but rather probed and printed at an interval, like the progress.
Counters are now used to signal whether setup/cleanup runs.
Now you haz commit bit.
An error in the order of the cleanup in
ReducerProxy
caused#cleanup
to never be called in reducers and combiners. This change adds test cases to the integration spec and fixes the issue.I welcome more elegant ways to signal that the setup and cleanup methods have been called than using
puts
. I was thinking maybectx.write
could be used with some special words, but then it will litter the actual job output instead.