timveil / hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Apache License 2.0
apache hadoop hive java jdbc thrift

Hive JDBC Driver

This project is an alternative to the JDBC driver that is bundled with the Apache Hive project. The desire to build it grew out of my experience maintaining the Hive JDBC "uber jar" project (here), which attempted to produce a smaller, more complete standalone driver jar by crafting an alternative Maven pom file. While that effort mostly succeeded in creating a slightly smaller jar, I felt that more could be done to improve the Hive JDBC experience.

As I started building out this project, I realized that I wanted to deviate significantly from the existing Apache implementation. As a result, this project does not attempt to be URL-compatible or even feature-compatible with the existing Apache driver. One obvious consequence is that existing JDBC connection strings/URLs that work with the Apache driver WILL NOT WORK with this driver without modification. I've provided a mapping for existing URL properties here as well as plenty of examples.
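Because URLs are not interchangeable between the two drivers, it can help to check which driver on the classpath actually claims a given connection string before trying to connect. The following is a minimal sketch that uses only the standard `java.sql` API (`DriverManager.getDrivers` and `Driver.acceptsURL`). The class `UrlAcceptanceCheck` and its method `acceptingDrivers` are hypothetical helpers written for illustration; they are not part of this project, and the sketch makes no assumption about this driver's own URL format.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for illustration only; not part of this project.
final class UrlAcceptanceCheck {

    private UrlAcceptanceCheck() {
    }

    // Returns the class names of all registered JDBC drivers that report
    // they can handle the given URL. An empty list means no driver
    // currently registered with DriverManager recognizes the URL.
    static List<String> acceptingDrivers(String url) {
        List<String> accepted = new ArrayList<>();
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            try {
                if (driver.acceptsURL(url)) {
                    accepted.add(driver.getClass().getName());
                }
            } catch (SQLException e) {
                // A driver that fails while inspecting the URL is treated
                // as not accepting it.
            }
        }
        return accepted;
    }
}
```

Calling `UrlAcceptanceCheck.acceptingDrivers(url)` with a connection string written for the Apache driver (those begin with `jdbc:hive2://`) shows whether any registered driver accepts it unchanged. An empty result suggests the URL needs to be rewritten using the property mapping mentioned above.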

Another significant deviation from the Apache implementation is the absence of Hadoop or Hive dependencies and their transitive dependency graphs. The only bridge to Hive in this driver is the Thrift Interface Description Language (IDL) file and the Java bindings it generates. All necessary code was rewritten from the ground up with an emphasis on eliminating external dependencies. This has the clear benefit of significantly reducing jar sizes and reducing opportunities for class conflicts! See size comparison below:

Note: the standalone jar for Hive 1.2.x does not contain all necessary dependencies, so its listed size is not an accurate representation of the real size.

Areas of Focus

The following are broad areas where I have attempted to expand or improve on the existing Hive driver:

Current State

This project is pre-alpha and should be considered experimental at this point. Currently it is built against the Hortonworks repositories, but it will soon be switched to follow the Apache released versions more closely.