hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache License 2.0

The purpose of defining data type in catalog #256

Open sudododo opened 6 years ago

sudododo commented 6 years ago

I'm wondering what the purpose of defining data types in the catalog is. It looks like when the data type in the catalog doesn't match the data type of the corresponding DataFrame column, the DataFrame column's type is the one applied when writing to HBase. For example, if a DataFrame has an integer column named 'col1' and the catalog declares that column as String, the integer is still what gets put into HBase. From my understanding, SHC doesn't do any data type conversion for you. If so, what's the point of defining the data type in the catalog?
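To make the scenario concrete, here is a minimal Python sketch of the mismatch being described. It is not SHC code and does not call Spark or HBase. The table name `table1`, the column family `cf1`, and the helpers `encode_as_int` and `encode_as_string` are hypothetical names used only for illustration. The sketch assumes the usual HBase byte layouts (a 4-byte big-endian integer and a UTF-8 string) to show why it matters which type actually decides the bytes stored in a cell.

```python
import json
import struct

# Hypothetical catalog mirroring the question: col1 is declared as a string,
# while the DataFrame column holding the data is an integer.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "table1"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        "col1": {"cf": "cf1", "col": "col1", "type": "string"},
    },
})


def encode_as_int(value: int) -> bytes:
    """4-byte big-endian layout, as HBase's Bytes.toBytes(int) produces."""
    return struct.pack(">i", value)


def encode_as_string(value) -> bytes:
    """UTF-8 bytes of the textual form, as Bytes.toBytes(String) produces."""
    return str(value).encode("utf-8")


declared_type = json.loads(catalog)["columns"]["col1"]["type"]
value = 42  # the value sitting in the integer DataFrame column

int_cell = encode_as_int(value)        # what is stored if the DataFrame type wins
string_cell = encode_as_string(value)  # what the catalog type would suggest

print("declared type:", declared_type)
print("int encoding:   ", int_cell)
print("string encoding:", string_cell)
```

The two encodings differ, so a reader that trusts the catalog and decodes the cell as a string would not get the text "42" back if the integer bytes were written.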