hortonworks-spark / shc

The Apache Spark - Apache HBase Connector is a library that lets Spark access HBase tables as an external data source or sink.
Apache License 2.0
552 stars 281 forks

Column qualifier for multiple columns in a column family. #340

Open piyushkhetan opened 3 years ago

piyushkhetan commented 3 years ago

I have an HBase table with multiple columns in a single column family, let's say "c". To save disk space, I would like this column family to be written under a single column qualifier, so that there is only one row for a given rowkey. How can I achieve this using this library? I tried the following catalog, in which I am trying to make "q_data" my column qualifier.

{
"table":{"namespace":"default", "name":"spark_dense_string", "tableCoder":"PrimitiveType"},
"rowkey":"id:hist_timestamp",
"columns":{
"id":{"cf":"rowkey", "col":"id", "type":"string","length":"36"},
"hist_timestamp":{"cf":"rowkey", "col":"hist_timestamp", "type":"string"},
"q_data":{"cf":"c","col":"value", "type":"double"},
"q_data":{"cf":"c", "col":"est_val", "type":"double"},
"q_data":{"cf":"c","col":"replaced", "type":"smallint"}
}
}

Thanks in advance!

jonashartwig commented 3 years ago

I think it should work the way you want if you change the catalog to this (according to the documentation):

{
"table":{"namespace":"default", "name":"spark_dense_string", "tableCoder":"PrimitiveType"},
"rowkey":"hist_timestamp",
"columns":{
"id":{"cf":"rowkey", "col":"hist_timestamp", "type":"string","length":"36"},
"hist_timestamp":{"cf":"rowkey", "col":"hist_timestamp", "type":"string"},
"q_data":{"cf":"c","col":"value", "type":"double"},
"q_data":{"cf":"c", "col":"est_val", "type":"double"},
"q_data":{"cf":"c","col":"replaced", "type":"smallint"}
}
}
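
One thing that may be worth checking in the catalogs in this thread: "q_data" appears three times as a key inside "columns". JSON object names are expected to be unique, and many parsers silently keep only one of the repeated entries. The sketch below uses Python's standard json module purely as an illustration; how the JSON parser used by shc treats repeated keys is not verified here, and the helper name find_duplicate_keys is hypothetical.

```python
import json

# Catalog from the question, reproduced so the example is self-contained.
CATALOG = """
{
  "table": {"namespace": "default", "name": "spark_dense_string", "tableCoder": "PrimitiveType"},
  "rowkey": "id:hist_timestamp",
  "columns": {
    "id": {"cf": "rowkey", "col": "id", "type": "string", "length": "36"},
    "hist_timestamp": {"cf": "rowkey", "col": "hist_timestamp", "type": "string"},
    "q_data": {"cf": "c", "col": "value", "type": "double"},
    "q_data": {"cf": "c", "col": "est_val", "type": "double"},
    "q_data": {"cf": "c", "col": "replaced", "type": "smallint"}
  }
}
"""


def find_duplicate_keys(text):
    """Return the keys that appear more than once within the same JSON object."""
    duplicates = []

    def check_pairs(pairs):
        # json calls this hook once per JSON object with its (key, value) pairs.
        seen = set()
        for key, _ in pairs:
            if key in seen and key not in duplicates:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check_pairs)
    return duplicates


# Plain parsing: Python's json module keeps only the last value for a repeated key.
parsed = json.loads(CATALOG)
surviving = parsed["columns"]["q_data"]
duplicate_keys = find_duplicate_keys(CATALOG)

print(duplicate_keys)             # keys repeated within one object
print(sorted(parsed["columns"]))  # column names left after parsing
print(surviving["col"])           # qualifier of the entry that survived
```

With Python's parser, only one "q_data" entry remains after parsing, so two of the three qualifiers never reach the consumer of the catalog. If shc's parser behaves similarly (an assumption), giving each entry in "columns" its own name while keeping "cf":"c" would avoid the collision.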