Closed jowl closed 7 years ago
Nice! Two thoughts: `job.getState()`. I'm a bit too tired right now to see immediately if that's the case, but I can certainly try it myself if you agree about the usefulness.

@iconara commented on the error handling in the callbacks in my previous PR, but I forgot about it. It's fixed now though.
The latter idea might be good, even if I don't know how much useful info you could extract from a `Job` instance. The problem with implementing it is that there's no way of knowing which job failed from the `ConfigurationDefinition`, as it only has a reference to the `Context`, which in turn has a reference to a job stack, which has references to all the `Job` instances. One would have to thread the failing job all the way back out to the `ConfigurationDefinition`, either via `#wait_for_completion` or a new accessor `#current_job` or similar, to get it to work. It would probably be interesting to know whether the job was killed or failed from the state though.
Maybe a separate `after` for each job, and one for the whole application? (I'm not sure I'm for this suggestion, but I'm throwing it out there.)
Yes, I mentioned something like that in the description as well. Then it could make sense to also implement `#before` for individual jobs (or maybe for job sequences).
Another idea would be to simply yield the job stack and let the user find out which job (or jobs if running in parallel) has failed.
> Yes, I mentioned something like that in the description as well

Ah, now I see.
A variant that means we can defer the decision on what to yield to later is to yield a result object with a `#success?` method. If we want to add more things later they can be added to this result object without breaking backwards compatibility.
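The result-object idea could be sketched roughly like this; the class name and its fields are illustrative, not the merged API:

```ruby
# Illustrative sketch (not the actual implementation): callbacks receive
# a single result object, so new accessors can be added later without
# breaking existing callbacks that only call #success?.
class RunResult
  def initialize(success)
    @success = success
  end

  # True if the whole run finished without failures
  def success?
    @success
  end
end

# A callback only depends on the methods it actually uses:
callback = proc { |result| puts(result.success? ? 'done' : 'failed') }
callback.call(RunResult.new(true)) # prints "done"
```

This is the indirection being praised below: callers never see a bare boolean, so the object's interface can grow without breaking anyone.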
Now we're talking. I really like that idea.
All problems can be solved by adding another layer of indirection.
If we should have before/after hooks, I would like for them to behave similarly to RSpec, otherwise I think it will be confusing. Thus, you would have to specify `#after(:all)` if you wanted a final callback, and `#after` would implicitly be `#after(:each)`. An optional argument representing the job (for `:each`) and the whole execution (for `:all`) seems like a reasonable thing.
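If this RSpec-style variant were adopted, registration might look roughly like the following; the `Hooks` class, method names, and scope symbols are assumptions made for illustration:

```ruby
# Hypothetical sketch of RSpec-style hook scopes: #after defaults to
# :each (run once per job), while :all runs once for the whole execution.
class Hooks
  def initialize
    @after = { each: [], all: [] }
  end

  def after(scope = :each, &block)
    @after.fetch(scope) << block
  end

  # What the framework would call when a job (or the run) finishes
  def fire_after(scope, context)
    @after.fetch(scope).each { |hook| hook.call(context) }
  end
end

hooks = Hooks.new
log = []
hooks.after { |job| log << "job #{job} finished" }   # implicit :each
hooks.after(:all) { |_| log << 'all jobs finished' } # explicit :all

%w[first second].each { |job| hooks.fire_after(:each, job) }
hooks.fire_after(:all, nil)
```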
I don't agree with making it work like RSpec. In RSpec you use `before` and `after` to make sure that you get the same setup for every test. I don't see that as the primary (or even as a common) use case for this. You most likely want to do different things before and after different jobs.
I guess my problem is that I can't really see what you would use the before and after hooks in the configuration block for. I guess it is a matter of personal preference, but I prefer to have as little code as possible in the configuration block. For performing task-specific setup, I much prefer the solution in Humboldt, where we have `#setup` and `#cleanup` hooks for individual mappers and reducers.
I guess hooks could be useful for whole jobs (or collections of jobs) as well, but then I'd prefer to name them `#setup` and `#cleanup`, rather than `#before` and `#after`, to avoid confusion.
They wouldn't be setup blocks though, and cleanup only in the sense that you can do things like cleaning up files on HDFS.
The primary use case for this, which might have gotten lost since this moved from another PR, and since the discussion has also happened mostly offline, is to make it possible to, for example, send a message over SNS when the application is done. This only requires `after` at the top level, and that's the only thing we should include from the start, IMO.
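That use case could be sketched as below; the notifier is a stand-in (a real hook would publish via the AWS SDK's SNS client), and the hook registration API is an assumption:

```ruby
# Stand-in for an SNS client: a real top-level `after` hook would
# publish to an SNS topic instead of appending to this array.
notifications = []

# The single top-level hook the comment argues for: run once, after
# the whole application has finished.
after_hooks = []
after_hooks << proc do |success|
  notifications << (success ? 'application finished' : 'application failed')
end

# Simulated framework behavior: invoke hooks once all jobs are done.
all_jobs_succeeded = true
after_hooks.each { |hook| hook.call(all_jobs_succeeded) }
```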
I don't really understand your point @grddev. You say

> I can't really see what you would use the before and after hooks in the configuration block for

and then

> I guess hooks could be useful for whole jobs

which I find a bit contradictory.
And I agree with @iconara about only including a top-level `#after` in this release. Although I'm not hell-bent on it being called `#after`. I don't dislike the name `#on_completion`.
@jowl, I don't really see the problem with my reasoning. I didn't know of a concrete use case, but I guessed that there was one. The second sentence was just to say that my reservation vis-à-vis the name confusion still applied.
@iconara, assuming there is no built-in way to get an SNS message when the job completes, that certainly seems like a reasonable use case. While `#cleanup` isn't the optimal name for the hook, it is the name used by Hadoop, and therefore the name used by Humboldt. But I seem to be in a minority here, so I'll rest my case. :)
Thanks for clarifying, I read something like "I don't think X is useful, but I guess it is", but now it makes sense.
And regarding your second point: setup and cleanup are concepts within mappers and reducers, and I think calling it that might be a bit confusing, since these hooks will run in a completely different context.
I've now wrapped `success` in a `RunResult`, along with all the jobs, so that @bjorne can extract whatever information he wants from them. This is an MVP for @bjorne's use case. I've also updated the PR description accordingly.
This was a nice idea, but given that we haven't merged it in two years I think it's better to close for now.
As promised, here's a PR suggesting to add `ConfigurationDefinition#after`, which would enable users to register callbacks to be run when all jobs have finished. The callbacks are called with a `RunResult` instance with information about the run. Taking this further, you could implement similar callbacks for individual jobs and sequences thereof, but I thought I'd start simple.
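The proposed flow could be sketched like this; the `RunResult` stand-in only mimics the `#success?` method discussed in this thread, and `run_callbacks` is a hypothetical name for whatever the framework calls internally:

```ruby
# Minimal stand-in mimicking the #success? interface from this thread;
# the real RunResult may carry more run information.
RunResult = Struct.new(:success, :jobs) do
  def success?
    success
  end
end

# Minimal stand-in for the proposed ConfigurationDefinition#after.
class ConfigurationDefinition
  def after(&block)
    @after_callbacks ||= []
    @after_callbacks << block
  end

  # Hypothetical internal method: invoked once all jobs have finished.
  def run_callbacks(result)
    (@after_callbacks || []).each { |cb| cb.call(result) }
  end
end

config = ConfigurationDefinition.new
messages = []
config.after do |result|
  messages << (result.success? ? 'all jobs succeeded' : 'a job failed')
end
config.run_callbacks(RunResult.new(true, []))
```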