iconara / rubydoop

Write Hadoop jobs in JRuby

Change the configuration DSL to be less DSLy #13

Open iconara opened 10 years ago

iconara commented 10 years ago

I think the configuration DSL (i.e. the Rubydoop.configure thing) was a mistake. Rubydoop is meant to be as thin a Ruby layer on top of the mapreduce APIs as possible. The job file should be a script that does something closer to what you'd do in Java:

job = Rubydoop::Job.new(mapper: MyMapper, reducer: MyReducer)
job.combiner = MyCombiner
job.input = Rubydoop.arguments.first
job.run

(the job could be configured either through options to the constructor or through setters; either way would work)
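As a rough sketch, a Job class along those lines could accept both forms. This is hypothetical code, none of these names are final; it only illustrates the constructor-options-or-setters point:

module Rubydoop
  class Job
    attr_accessor :mapper, :reducer, :combiner, :input, :output

    def initialize(options = {})
      # constructor options are just sugar for the setters
      options.each { |name, value| public_send("#{name}=", value) }
    end

    def run
      # would translate the configured classes into a Hadoop job and submit it
    end
  end
end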

The difficult part would be making things work on the slave nodes. Currently we can run the job file on both the master and the slaves to get all of the code loaded correctly, and just make Rubydoop.configure a no-op on the slaves. If the job file were a straight-up script, that might not work as well.
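Very roughly, the current trick amounts to something like this (a simplified sketch, not the actual implementation; slave? is a made-up predicate for "this code is loading inside a map or reduce task"):

module Rubydoop
  def self.configure(&block)
    # requiring the job file is enough on the slaves: the mapper/reducer
    # classes get defined as a side effect of loading it; only the master
    # needs to actually build and submit the jobs described by the block
    return if slave?
    # ... evaluate the block and set up the jobs ...
  end
end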

Perhaps a middle ground between the current DSL and the above:

Rubydoop.main do |args|
  job = Job.new(...)
  job.run
end

In other words: keep the configure block, but rename it to main and drop the DSL.
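main could reuse the same no-op trick that configure uses today: do nothing on the slaves, and on the master just hand the command-line arguments to a plain block (same caveats and made-up slave? predicate as the sketch above):

module Rubydoop
  def self.main(&block)
    return if slave?       # loading the job file is all the slaves need
    block.call(arguments)  # on the master, run the script body with the args
  end
end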

The benefit of not having the DSL would be that things like running jobs in parallel or building your own dependency logic would be much easier.
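For example, running two independent jobs in parallel becomes plain Ruby rather than something the DSL has to support (a sketch assuming the main form and the Job API from above; the mapper/reducer names are placeholders):

Rubydoop.main do |args|
  jobs = [
    Job.new(mapper: WordCountMapper, reducer: WordCountReducer, input: args[0]),
    Job.new(mapper: IndexMapper, reducer: IndexReducer, input: args[1])
  ]
  # run both jobs concurrently and wait for them to finish
  jobs.map { |job| Thread.new { job.run } }.each(&:join)
end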

iconara commented 10 years ago

Perhaps even define a top-level method called Rubydoop:

Rubydoop do
  job = Job.new(...)
  job.run
end
deivinsontejeda commented 10 years ago

Makes sense.