hanborq / hadoop

A Hanborq-optimized Hadoop distribution, focused especially on high MapReduce performance. It is the core part of HDH (Hanborq Distribution with Hadoop for Big Data Engineering).
Apache License 2.0

few questions #1

Status: closed (achaubal closed this issue 12 years ago)

achaubal commented 12 years ago

Hi,

I was curious about this project and have a few questions:

  1. After the build, are management and administration similar to standard Apache Hadoop, with JobTracker/TaskTrackers and NameNode/DataNodes?
  2. Are Tenzing/Dremel-style analytical functions (rank, lag, lead, ntile, etc.) available in Hanborq?
  3. Can Cloudera Enterprise Manager/SCM be used on top of this?
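Question 2 refers to SQL window (analytic) functions. As a rough illustration of what they compute, here is a plain-Python sketch of their usual SQL semantics over a single partition of rows. It is not Hanborq, Hive, or Tenzing code, and the helper names are hypothetical.

```python
def rank(values):
    """RANK() ordered by value ascending: ties share a rank, and gaps follow ties."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)  # keep the first position of each value
    return [first_position[v] for v in values]


def lag(values, offset=1, default=None):
    """LAG(): value from `offset` rows earlier; assumes rows are already ordered."""
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]


def lead(values, offset=1, default=None):
    """LEAD(): value from `offset` rows later; assumes rows are already ordered."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def ntile(n_buckets, row_count):
    """NTILE(n): split ordered rows into n buckets as evenly as possible;
    the first (row_count % n) buckets each get one extra row."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    base, extra = divmod(row_count, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        buckets.extend([bucket] * (base + (1 if bucket <= extra else 0)))
    return buckets
```

For example, `rank([10, 20, 20, 40])` gives `[1, 2, 2, 4]`, and `ntile(4, 10)` assigns 3, 3, 2 and 2 rows to buckets 1 through 4.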

Thanks,

Ameet

schubertzhang commented 12 years ago

Thank you for your curiosity.

  1. Yes, everything is similar to Apache Hadoop and Cloudera's CDH3. In fact, our release is based on CDH3u2, and the sync to CDH3u3 will be completed soon.
  2. The Tenzing SQL features are partly developed and still under development at Hanborq. We plan to release an open-source implementation based on Hive in the future, but the exact date has not been decided.
  3. I think Cloudera's SCM or its new manager can work with HDH, but we have not tested it at Hanborq yet.

Thank you, Ameet.

achaubal commented 12 years ago

So, at the moment, if I build and deploy Hanborq and use Hive on top, what benefits do I get?

  1. Speed compared to standard MapReduce, thanks to the worker pools?
  2. Any new functions not currently available in Hive?

Thanks,

Ameet

schubertzhang commented 12 years ago

You can use Hive on top; it works fine. I think you can get the following benefits:

  1. The worker pool and fast task scheduling: job launch/setup is very fast, whereas Apache Hadoop and CDH take 15~20 s to set up even a small job.
  2. Efficient shuffle.
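To illustrate the general idea behind point 1 (start workers once and reuse them across tasks and jobs, rather than launching a fresh process for each task), here is a minimal thread-based sketch. It is an illustration under stated assumptions, not HDH's actual implementation, which runs inside the Java MapReduce framework; `WorkerPool` and `run_job` are hypothetical names.

```python
import queue
import threading


class WorkerPool:
    """Hypothetical pool of long-lived worker threads, started once and reused by
    every job, so submitting a task does not pay a per-task startup cost."""

    def __init__(self, size):
        self.tasks = queue.Queue()
        self.workers = [
            threading.Thread(target=self._run, name=f"worker-{i}", daemon=True)
            for i in range(size)
        ]
        for worker in self.workers:
            worker.start()

    def _run(self):
        # Each worker keeps pulling tasks until it receives the shutdown marker (None).
        while True:
            item = self.tasks.get()
            if item is None:
                self.tasks.task_done()
                return
            func, arg, results, index = item
            try:
                value = func(arg)
            except Exception as exc:  # keep the worker alive; report the error as the result
                value = exc
            # Record which worker ran the task together with its result.
            results[index] = (threading.current_thread().name, value)
            self.tasks.task_done()

    def run_job(self, func, splits):
        """Run one 'job' by applying func to every input split on the existing workers."""
        results = {}
        for index, split in enumerate(splits):
            self.tasks.put((func, split, results, index))
        self.tasks.join()  # block until every task of this job has finished
        return [results[i] for i in range(len(splits))]

    def shutdown(self):
        for _ in self.workers:
            self.tasks.put(None)
        for worker in self.workers:
            worker.join()


# Two jobs run back to back on the same two workers; no new workers are started.
pool = WorkerPool(size=2)
job1 = pool.run_job(sum, [[1, 2], [3, 4]])
job2 = pool.run_job(len, [[1, 2, 3]])
pool.shutdown()
```

The design choice being illustrated is that the startup cost is paid once in the constructor, so each later job only pays for queueing its tasks.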

Because Apache Hive does not support "sort avoidance", many aggregations and joins that use hashing cannot benefit from HDH yet.
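Some context on "sort avoidance": in stock Hadoop MapReduce, map output is always sorted by key during the shuffle so that each reducer sees equal keys grouped together. An aggregation that uses a hash table does not need that order, so the sort is extra work. The sketch below contrasts the two strategies on in-memory data. The function names are hypothetical, and it ignores partitioning, spilling to disk, and memory limits that a real shuffle must handle.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_based_sum(pairs):
    """Classic MapReduce style: sort by key first, then sum each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {key: sum(v for _, v in group) for key, group in groupby(ordered, key=itemgetter(0))}


def hash_based_sum(pairs):
    """Hash aggregation: one pass over unsorted input, accumulating into a hash table."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Unsorted (key, value) pairs, as a reducer might receive them without a sort phase.
pairs = [("b", 2), ("a", 1), ("b", 3), ("c", 7), ("a", 10)]
```

Both functions produce `{"a": 11, "b": 5, "c": 7}` for this input, but only the first one needs the input sorted.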

Once we release our own Hive, this may be resolved.

schubertzhang commented 12 years ago

We recommend writing MapReduce jobs yourself to get the benefits of HDH, for example for hash aggregation, joins, etc.
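As an example of the kind of logic one might hand-write in such a job, here is a minimal in-memory hash join sketch: build a hash table on the smaller input, then probe it with the larger one. In a real MapReduce job this would typically run per reducer partition (with both inputs partitioned by the join key), or in the mapper when the small side fits in memory. `hash_join` and the sample records are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: build a hash table on the (smaller) build side, then probe it."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    joined = []
    for row in probe_rows:
        # .get() does not insert missing keys, so unmatched probe rows are simply skipped.
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "amount": 30},
    {"user_id": 1, "amount": 5},
    {"user_id": 3, "amount": 9},  # no matching user, so it is dropped by the inner join
]
result = hash_join(users, orders, "user_id", "user_id")
```

Here `result` holds the two orders of user 1, each merged with that user's name; like hash aggregation, it does not rely on sorted input.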

achaubal commented 12 years ago

Thanks