iconara / rubydoop

Write Hadoop jobs in JRuby

How would I use that for records in HBASE? #5

Closed maxott closed 11 years ago

maxott commented 11 years ago

This most likely highlights my ignorance about Hadoop, but how would I go about using this for MR tasks over a range of HBase records?

Thanks

iconara commented 11 years ago

I don't know either, unfortunately. If I remember correctly, HBase has its own map/reduce framework (some kind of specialization of the standard MR APIs), and if that's the case it would be tricky to use Rubydoop (but it could be done). If I'm wrong, then it's just a matter of setting the right input format for the job, and that's simple.

maxott commented 11 years ago

Wow, that was fast. Thanks. If you have time, could you have a look at http://hbase.apache.org/book/mapreduce.example.html and see if this sounds like an easy or difficult contortion to get working?

Thanks

On 25/02/2013, at 11:35 PM, Theo Hultberg wrote:

I don't know either, unfortunately. If I remember correctly, HBase has its own map/reduce framework (some kind of specialization of the standard MR APIs), and if that's the case it would be tricky to use Rubydoop (but it could be done). If I'm wrong, then it's just a matter of setting the right input format for the job, and that's simple.


iconara commented 11 years ago

Sure, I'll have a look later today.

iconara commented 11 years ago

Looks like it would be pretty complicated, unfortunately. The mapper and reducer need to extend HBase's own classes, which would require changes to Rubydoop's Hadoop integration code, and then you would have to replicate what TableMapReduceUtil.initTableMapperJob does. It can be done, but it would be very, very tricky.
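
For reference, the plain-Java pattern that would have to be reproduced looks roughly like the sketch below. It is modelled on the read-only example in the HBase book linked above, not on anything in Rubydoop. The table name `my_table`, the row keys, the counter names, and the class names are placeholders I made up for illustration, and it assumes an HBase 2.x client on the classpath (older releases use `setStartRow`/`setStopRow` instead of `withStartRow`/`withStopRow`).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical class name; counts the rows in a key range of an HBase table.
public class RowRangeScanJob {

  // The mapper extends HBase's TableMapper rather than Hadoop's Mapper.
  // TableMapper fixes the input types to (ImmutableBytesWritable, Result).
  public static class RowCountMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      // No output records are emitted; each row only bumps a job counter.
      // The counter group and name are placeholders.
      context.getCounter("demo", "rows_seen").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "row-range-scan");
    job.setJarByClass(RowRangeScanJob.class);

    // Restrict the job to a range of row keys (placeholder keys).
    // The start row is inclusive and the stop row is exclusive.
    Scan scan = new Scan()
        .withStartRow(Bytes.toBytes("row-0100"))
        .withStopRow(Bytes.toBytes("row-0200"));
    scan.setCaching(500);       // fetch rows in larger batches for MR scans
    scan.setCacheBlocks(false); // avoid filling the block cache from a full scan

    // This helper sets the table input format, serializes the Scan into the
    // job configuration, and registers the mapper class.
    TableMapReduceUtil.initTableMapperJob(
        "my_table", scan, RowCountMapper.class, null, null, job);

    job.setOutputFormatClass(NullOutputFormat.class); // nothing is written out
    job.setNumReduceTasks(0);                         // map-only job

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The two places where this departs from a standard Hadoop job, the `TableMapper` superclass and the `initTableMapperJob` call, are the parts that Rubydoop would need to support.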

maxott commented 11 years ago

Thanks a lot for checking. Maybe I'll get adventurous and have a look at what you did and see if I can do something :)

Cheers, -max