tuplejump / cash

The CASsandra Hive handler
Questions on performance #7

Open sagpid opened 10 years ago

sagpid commented 10 years ago

Hi,

We downloaded the code and were able to make it work on both a localhost deployment of Cassandra and a remote deployment. Thanks a lot for the great piece of work that you have shared; it has saved us a lot of time and effort.

Please find my questions on performance below.

  1. About 275 map tasks are started in Hadoop when a simple select count(*) is issued in Hive. This slows the query down enormously when it runs against an external table located on Cassandra (about 30 minutes for 150 records).
  2. Creating a Hive table from an external Cassandra table is also very slow (about 30 minutes).

Is there a workaround, or is this to be expected from the Hive side?

Thanks,

Sagar