hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Apache License 2.0

No support for Google BigTable #324

Closed: vim89 closed this issue 4 years ago

vim89 commented 5 years ago

Guys,

Here's what I did:

  1. Forked the package and adjusted HBaseRelation.scala so that, when it is used with Google BigTable, the class does not call the namespace methods (getNamespaceDescriptor, createNamespace, etc.). Check this.
  2. Compiled my version of the package without building a fat JAR; the result is barely 600 KB.
  3. Bundled this package with my code, which uses SHC to write DataFrames in the HBase format.
  4. When I run the job, I get errors like:
     java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor
       at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:61)
     Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.TableDescriptor

I have compiled SHC locally and want to use it in my Spark job without errors like the one above. What should I do? Where am I going wrong?

vim89 commented 4 years ago

Apologies for the late reply. I had already found the solution but forgot to check the fix into my fork.

The NoClassDefFoundError was a dependency compatibility issue. When packaging, all sources and dependencies must be compatible with each other and compiled against identical versions of Scala, Java, etc.
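A quick way to confirm a mismatch like this is to check, from the same classpath the Spark job runs with, whether the class named in the stack trace can be loaded at all. The sketch below is only an illustration and not part of SHC: ClasspathCheck and isOnClasspath are hypothetical names. As far as I know, org.apache.hadoop.hbase.client.TableDescriptor belongs to the HBase 2.x client API, so a missing class may mean that an HBase 1.x client (or no HBase client at all, given the thin JAR) is on the runtime classpath.

```scala
// Hypothetical helper, not part of SHC: reports whether classes are loadable.
object ClasspathCheck {

  /** Returns true if the named class can be located by this class loader.
    * The class is looked up without being initialized.
    */
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    val required = Seq(
      // The class from the stack trace in this thread.
      "org.apache.hadoop.hbase.client.TableDescriptor",
      // Spark SQL entry point, as a sanity check that Spark itself is visible.
      "org.apache.spark.sql.SparkSession"
    )
    required.foreach { name =>
      val status = if (isOnClasspath(name)) "found" else "MISSING"
      println(s"$status: $name")
    }
  }
}
```

Running this with the same JARs that are passed to spark-submit shows which dependency is absent before the job fails inside createRelation.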

There was also a bug that prevented this API from working with Google BigTable: HBase has namespaces and Google BigTable does not.

I forked the latest version and fixed this bug, adding Google BigTable support to the API.

To do that, I added an if/else branch based on the "tabletype" argument passed to the HBaseTableCatalog class. Set "tabletype" to "bigtable" for Google BigTable; if it is not specified explicitly, it defaults to "hbase".
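The branch described above can be sketched roughly as follows. This is not the code from the fork: TableTypeBranch, parseTableType and setupSteps are hypothetical names, and only the "tabletype" key and its "hbase" / "bigtable" values come from this comment. The sketch assumes the option arrives as a plain string map and simply lists the setup steps each backend would need.

```scala
// Hypothetical sketch of a "tabletype" branch; names are illustrative only.
object TableTypeBranch {
  sealed trait TableType
  case object HBase extends TableType
  case object BigTable extends TableType

  /** Reads "tabletype" from the options; falls back to HBase when absent. */
  def parseTableType(params: Map[String, String]): TableType =
    params.get("tabletype").map(_.trim.toLowerCase) match {
      case None | Some("hbase") => HBase
      case Some("bigtable")     => BigTable
      case Some(other) =>
        throw new IllegalArgumentException(s"Unsupported tabletype: $other")
    }

  /** Lists the setup steps for a backend; namespace handling is HBase-only. */
  def setupSteps(tableType: TableType, namespace: String): Seq[String] =
    tableType match {
      case HBase =>
        Seq(s"ensure namespace '$namespace' exists", "create table if missing")
      case BigTable =>
        // BigTable has no namespaces, so the namespace step is skipped.
        Seq("create table if missing")
    }

  def main(args: Array[String]): Unit = {
    val tableType = parseTableType(Map("tabletype" -> "bigtable"))
    setupSteps(tableType, "default").foreach(println)
  }
}
```

Keeping the decision in one place means the namespace calls (getNamespaceDescriptor and friends) are only reached on the HBase path.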

Link to the fork with the fix: https://github.com/vim89/shc

The README is updated and illustrates how to use HBaseTableCatalog with Google BigTable.

Bazger commented 4 years ago

Thanks a lot, it worked perfectly for me!