I think the configuration DSL (i.e. the `Rubydoop.configure` thing) was a mistake. Rubydoop is meant to be as little Ruby on top of the MapReduce APIs as possible. The job file should be a script that does something more like you'd do it in Java:
(the job could be configured either by options to the constructor or via setters; either way would work)
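A sketch of what such a plain-script job file could look like; `Job` and its options here are hypothetical stand-ins for illustration, not Rubydoop's actual API:

```ruby
# Toy stand-in for a Job class that accepts either configuration style.
class Job
  attr_accessor :input, :output, :mapper, :reducer

  # options-to-the-constructor style: Job.new(input: ..., mapper: ...)
  def initialize(options = {})
    options.each { |key, value| public_send("#{key}=", value) }
  end

  def run
    # a real implementation would submit to Hadoop and wait;
    # here we just report the configuration
    "#{mapper} -> #{reducer} (#{input} => #{output})"
  end
end

# configured via constructor options:
job = Job.new(input: 'logs/*', output: 'counts',
              mapper: 'WordCountMapper', reducer: 'WordCountReducer')

# ...or via setters:
job2 = Job.new
job2.input  = 'logs/*'
job2.mapper = 'WordCountMapper'
```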
The difficult part would be how to make things work on the slave nodes. Currently we can run the job file on both the master and the slaves to get all of the code loaded right, and just make `Rubydoop.configure` be a no-op on the slaves. If the job file were a straight-up script, that trick wouldn't work as well.
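Heavily simplified sketch of that no-op trick: the same job file is loaded on master and slaves so all the classes get defined, but on a slave the block passed to `configure` is simply never invoked (`slave_mode!` is a made-up toggle for illustration):

```ruby
module Rubydoop
  def self.slave_mode!
    @slave = true
  end

  def self.configure
    return if @slave  # no-op on slaves; loading the file already defined the classes
    yield if block_given?
  end
end

# on a slave node:
Rubydoop.slave_mode!
ran = false
Rubydoop.configure { ran = true }
# ran stays false: the block was skipped
```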
Perhaps a middle ground between the current DSL and the above:
```ruby
Rubydoop.main do |args|
  job = Job.new(...)
  job.run
end
```
In other words: keep the `configure` block, but rename it to `main` and drop the DSL.
The benefit of not having the DSL would be that things like running jobs in parallel, or building your own dependency logic would be much easier.
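For example, with plain Ruby in the block, running two independent jobs in parallel is just threads. `Job` here is a toy stand-in whose `run` simulates a blocking Hadoop submission:

```ruby
Job = Struct.new(:name) do
  def run
    sleep 0.01  # pretend this blocks while Hadoop runs the job
    "#{name} finished"
  end
end

jobs = [Job.new('word_counts'), Job.new('daily_totals')]
# start both at once, then collect the results in order
results = jobs.map { |job| Thread.new { job.run } }.map(&:value)
```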