iconara / rubydoop

Write Hadoop jobs in JRuby

RFC: Add support for custom input formats #22

Closed: grddev closed this 10 years ago

grddev commented 10 years ago

This adds an InputFormatProxy along the same lines as the other proxies.

Unfortunately, I couldn't figure out how to write a meaningful unit test, so I added a (rather contrived) custom input format to the integration tests, which at least verifies that the code runs.

I needed to make a few changes to the Rubydoop core in order to get this running.
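As a rough illustration of what "a proxy along the same lines as the other proxies" could mean, here is a minimal plain-Ruby sketch. It assumes a proxy that reads a class name from the job configuration, resolves it with a `lookup_class`-style helper, and forwards the input-format calls to an instance of that class. `ProxySketch`, the `rubydoop.input_format` key and `ToyInputFormat` are hypothetical names invented for this sketch. The proxy in this PR is presumably written against Hadoop's Java `InputFormat` API, and its code is not shown in this thread.

```ruby
# Hypothetical sketch of the proxy idea. This is NOT Rubydoop's actual API.
module ProxySketch
  # Resolve a constant path such as "Foo::Bar" to the class it names,
  # rather than creating an instance directly.
  def self.lookup_class(name)
    name.split('::').reduce(Object) { |scope, part| scope.const_get(part) }
  end

  # Configured with a class name; lazily builds the user's input format
  # and forwards every call to it.
  class InputFormatProxy
    # 'rubydoop.input_format' is a hypothetical configuration key.
    def initialize(conf, key = 'rubydoop.input_format')
      @conf = conf
      @key = key
    end

    def get_splits(context)
      delegate.get_splits(context)
    end

    def create_record_reader(split, context)
      delegate.create_record_reader(split, context)
    end

    private

    # Look up the class once, then cache a single instance of it.
    def delegate
      @delegate ||= ProxySketch.lookup_class(@conf.fetch(@key)).new
    end
  end
end

# A toy input format: the "context" is an array of lines, each split
# holds up to two lines, and the reader pairs each line with its index.
class ToyInputFormat
  def get_splits(context)
    context.each_slice(2).to_a
  end

  def create_record_reader(split, _context)
    split.each_with_index.map { |line, i| [i, line] }
  end
end

conf = { 'rubydoop.input_format' => 'ToyInputFormat' }
proxy = ProxySketch::InputFormatProxy.new(conf)
splits = proxy.get_splits(%w[a b c])
records = proxy.create_record_reader(splits.first, nil)
```

Here `splits` is `[["a", "b"], ["c"]]` and `records` is `[[0, "a"], [1, "b"]]`. Keeping the class lookup separate from instantiation lets the proxy decide when, and how often, to build the user's object.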

iconara commented 10 years ago

The FileSplit API seems to have changed in Hadoop 2.2, so the tests break.

Setting the job script earlier doesn't hurt; I don't know why it wasn't done that way before. All of the jobs use the same script anyway, so why not.

I like the change from #create_instance to #lookup_class.

grddev commented 10 years ago

I really don't understand why FileSplit fails under Hadoop 2.2. The only difference I can see is that the current FileSplit (https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileSplit.html), compared to the r1.0.4 version, has a second constructor that takes no arguments.

grddev commented 10 years ago

Closing this pull request as obsolete; it has been replaced by #24.