qubole / spark-acid

ACID Data Source for Apache Spark based on Hive ACID
Apache License 2.0

Difference between qubole/spark-acid and Hive-warehouse-connector #14

Closed davidmanukian closed 5 years ago

davidmanukian commented 5 years ago

Hello everyone! Thanks for the great library. I'm currently researching how to achieve ACID on Hive tables using Spark. All I have found so far are qubole/spark-acid (your library) and hive-warehouse-connector (https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html), so I'm confused about which one to use and how to decide what I need.

Could you please describe what your library can and cannot do compared to hive-warehouse-connector? What is the difference overall? Thanks!

somani commented 5 years ago

The major difference is that this library is a lightweight way to read and write (coming soon) Hive ACID tables, whereas the Hive Warehouse Connector is quite heavy in its prerequisites. As the page you linked describes, to use the Hive Warehouse Connector: "You must use low-latency analytical processing (LLAP) in HiveServer Interactive to read ACID, or other Hive-managed tables, from Spark." This means a Hive LLAP cluster must be available for the connector to connect to before it can read and write ACID tables.

This library, on the other hand, has no such requirement and can be used natively in Spark itself to read/write Hive ACID tables.
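
To make "natively in Spark" concrete, here is a minimal PySpark sketch of a read through this library. It assumes the spark-acid package is already on the Spark classpath, that the data source is registered under the name "HiveAcid" and takes a "table" option (as I recall from the project README, so please check it against the version you use), and that `default.acidtbl` is a hypothetical table name. The helper names `acid_read_options` and `read_acid_table` are made up for illustration.

```python
def acid_read_options(database, table):
    """Build the option dict for the reader.

    Assumption: the data source expects a single "table" option holding
    the fully qualified name "<database>.<table>".
    """
    return {"table": f"{database}.{table}"}


def read_acid_table(spark, database, table):
    """Return a DataFrame over a Hive ACID table.

    `spark` is an existing SparkSession. "HiveAcid" is the data source
    name assumed here; no HiveServer Interactive / LLAP endpoint is
    configured anywhere in this call.
    """
    return (
        spark.read
        .format("HiveAcid")
        .options(**acid_read_options(database, table))
        .load()
    )
```

With a running session this would be called as `read_acid_table(spark, "default", "acidtbl")`. The point of the sketch is that the read goes through the ordinary DataFrame reader, with nothing pointing at an LLAP cluster.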

davidmanukian commented 5 years ago

@somani Great, thanks!