Closed jowl closed 9 years ago
I kind of like the idea, but I'm a bit torn about whether or not I want this in Rubydoop. I'd like to get rid of the job definition DSL completely, so adding to it might not be right. On the other hand, this feature would probably fit with whatever it's replaced by.
I think the gain over something like the following would be quite small. Given that the DSL should be reworked, I'm not sure the added functionality makes sense.
second_tmp_path = Dir::Tmpname.make_tmpname('second', nil)
job 'second-step' do
input first_path
output second_tmp_path
end
job 'third-step' do
input second_tmp_path
output output_path
end
I thought about the same thing as well, but I like my solution a bit better for two reasons; firstly, it hides away the "unique" path generation from the user, and secondly, it makes it a little bit harder to make mistakes, someone might for instance use another variable for the output path of the second step in your example without changing the definition of the third step.
Having that said, I don't know whether this should be merged or not, given that the DSL may be removed soon. But as @iconara said, this should fit with a non-DSL based interface as well.
What's the consensus of this? I don't mind including this in a v1.2 release. A v2.0 with a new DSL isn't exactly round the corner, so if this helps people I'm not against it.
Merged in 3d3c672ebb21638b1e0548ae429c7e589e5cf654. I also update the tests since #28 upgraded RSpec (see dd90f1a385bba7d5ea74278238e91aadc9b1a5e0).
I've managed to have Hadoop jobs fail due to using the same intermediate path for multiple steps on more than one occasion, and have thought a bit about how to avoid it. I'm not sure how much this particular solution actually helps with the problem, but I thought it might at least serve as a decent start for a discussion.
The idea is that you should be able to configure jobs with multiple steps like:
The argument handling in
JobDefinition#output
is rather hacky, as it can be used as both a getter and a setter, and with a semi-optionaldir
argument. But I figured that it is better to make it complex than to make using it complex. Then again, I wouldn't mind it not being so hacky.