hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache License 2.0

Added an API enhancement to support Google BigTable #337

Open vim89 opened 4 years ago


Background

Google BigTable does not support namespaces or namespace descriptors (check this). Hence, during the createRelation task we have to skip calling the namespace getter/setter methods, viz. getNamespaceDescriptor() and createNamespace(). There were two related issues:

  1. I created an issue.
  2. A similar issue was opened back in 2017.

What changes were proposed in this pull request?

  1. Create a new class variable, tableType, in HBaseTableCatalog. It is initialized to "hbase" by default.
  2. Add getter and setter methods to override the tableType variable.
  3. Add an if/else branch in createTableIfNotExist() in the HBaseRelation class, based on the tableType set in the catalog, so that the namespace getter methods are skipped when the API is used to write into Google BigTable.
  4. Illustrate the usage for writing into Google BigTable in README.md.
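The changes listed above can be sketched roughly as follows. This is a minimal, self-contained Scala model under stated assumptions, not the PR's actual diff. `AdminLike`, `InMemoryAdmin`, `CatalogSketch`, `CreateTableSketch` and the `"bigtable"` marker value are all hypothetical stand-ins for HBase's `Admin`, shc's `HBaseTableCatalog` and `HBaseRelation`. Using the bare table name on the BigTable path is also an assumption. Only the `"hbase"` default and the idea of branching on `tableType` to skip the namespace calls come from the description.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the few admin operations the branch touches;
// NOT the real org.apache.hadoop.hbase.client.Admin interface.
trait AdminLike {
  def namespaceExists(ns: String): Boolean
  def createNamespace(ns: String): Unit
  def tableExists(table: String): Boolean
  def createTable(table: String): Unit
}

// In-memory admin that records every call; used only to demo the sketch.
class InMemoryAdmin extends AdminLike {
  val calls = ListBuffer.empty[String]
  private val namespaces = ListBuffer("default")
  private val tables = ListBuffer.empty[String]

  def namespaceExists(ns: String): Boolean = { calls += s"namespaceExists($ns)"; namespaces.contains(ns) }
  def createNamespace(ns: String): Unit = { calls += s"createNamespace($ns)"; namespaces += ns }
  def tableExists(t: String): Boolean = { calls += s"tableExists($t)"; tables.contains(t) }
  def createTable(t: String): Unit = { calls += s"createTable($t)"; tables += t }
}

// Hypothetical catalog: tableType defaults to "hbase" and can be overridden
// through a getter/setter pair, as the PR description says.
class CatalogSketch(val namespace: String, val name: String) {
  private var tableType: String = "hbase"
  def getTableType: String = tableType
  def setTableType(value: String): Unit = { tableType = value }
}

object CreateTableSketch {
  // Assumed marker value for BigTable; the PR text does not name it.
  val BigtableType = "bigtable"

  def createTableIfNotExist(catalog: CatalogSketch, admin: AdminLike): Unit = {
    val tableName =
      if (catalog.getTableType.equalsIgnoreCase(BigtableType)) {
        // BigTable: no namespaces, so skip the namespace lookup/creation
        // entirely and use the bare table name (assumption of this sketch).
        catalog.name
      } else {
        // HBase: make sure the namespace exists before creating the table.
        if (!admin.namespaceExists(catalog.namespace)) admin.createNamespace(catalog.namespace)
        s"${catalog.namespace}:${catalog.name}"
      }
    // Common path: create the table only if it is not already there.
    if (!admin.tableExists(tableName)) admin.createTable(tableName)
  }
}
```

With the default `"hbase"` type, the namespace is checked and created if missing before the table. After `setTableType("bigtable")`, no namespace method is ever called, which is the behaviour the branch needs in order to run against BigTable.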

How was this patch tested?

  1. A unit test is added in HBaseTableCatalogSuite.scala.
  2. Manual testing has been performed thoroughly. I have been using this change in one of my projects, where it has been running in production for 7 months, so I believe it is now stable and the right time to open a pull request to merge it into the master branch.

Regards, Vitthal