Open jyothirmai2309 opened 6 years ago
Is there an option to keep integer values readable as integers when writing a DataFrame to HBase through PySpark? By default, integer values are converted to raw bytes in the HBase table when the DataFrame is written.
Below is the code:

    import json

    catalog2 = {
        "table": {"namespace": "default", "name": "trip_test1"},
        "rowkey": "key1",
        "columns": {
            "serial_no": {"cf": "rowkey", "col": "key1", "type": "string"},
            "payment_type": {"cf": "sales", "col": "payment_type", "type": "string"},
            "fare_amount": {"cf": "sales", "col": "fare_amount", "type": "string"},
            "surcharge": {"cf": "sales", "col": "surcharge", "type": "string"},
            "mta_tax": {"cf": "sales", "col": "mta_tax", "type": "string"},
            "tip_amount": {"cf": "sales", "col": "tip_amount", "type": "string"},
            "tolls_amount": {"cf": "sales", "col": "tolls_amount", "type": "string"},
            "total_amount": {"cf": "sales", "col": "total_amount", "type": "string"}
        }
    }

    cat2 = json.dumps(catalog2)

    df.write.option("catalog", cat2).option("newtable", "5").format("org.apache.spark.sql.execution.datasources.hbase").save()
output:

    \x00\x00\x03\xE7 column=sales:payment_type, timestamp=1529495930994, value=CSH
    \x00\x00\x03\xE7 column=sales:surcharge, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x03\xE7 column=sales:tip_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x03\xE7 column=sales:tolls_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x03\xE7 column=sales:total_amount, timestamp=1529495930994, value=@!\x00\x00\x00\x00\x00\x00
    \x00\x00\x03\xE8 column=sales:fare_amount, timestamp=1529495930994, value=@\x18\x00\x00\x00\x00\x00\x00
    \x00\x00\x03\xE8 column=sales:mta_tax, timestamp=1529495930994, value=?\xE0\x00\x00\x00\x00\x00\x00
expected output:

    999 column=sales:fare_amount, timestamp=1529392479358, value=8.0
    999 column=sales:mta_tax, timestamp=1529392479358, value=0.5
    999 column=sales:payment_type, timestamp=1529392479358, value=CSH
    999 column=sales:surcharge, timestamp=1529392479358, value=0.0
    999 column=sales:tip_amount, timestamp=1529392479358, value=0.0
    999 column=sales:tolls_amount, timestamp=1529392479358, value=0.0
    999 column=sales:total_amount, timestamp=1529392479358, value=8.5
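For anyone reading the shell output above: the escaped bytes look like big-endian binary encodings of the numbers rather than corrupted data. The following is a small sketch using only Python's standard `struct` module to decode a few of the values copied from that output. The helper names `decode_int` and `decode_double` are made up for this illustration, and the 4-byte integer and 8-byte double layout is an assumption inferred from the byte lengths shown.

```python
import struct

# Byte strings copied from the HBase shell output in this issue.
row_key = b"\x00\x00\x03\xE7"
total_amount = b"@!\x00\x00\x00\x00\x00\x00"
mta_tax = b"?\xE0\x00\x00\x00\x00\x00\x00"


def decode_int(raw):
    """Interpret 4 bytes as a big-endian signed 32-bit integer."""
    return struct.unpack(">i", raw)[0]


def decode_double(raw):
    """Interpret 8 bytes as a big-endian IEEE 754 double."""
    return struct.unpack(">d", raw)[0]


print(decode_int(row_key))          # 999
print(decode_double(total_amount))  # 8.5
print(decode_double(mta_tax))       # 0.5
```

So the numbers themselves appear to be intact. They are stored in binary form, which the HBase shell prints as escaped bytes instead of readable text.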
I think you can use a lambda function to convert your data to integers before writing it to HBase.
When writing from PySpark to HBase, the data is stored in hexadecimal (raw byte) format. What should I change in the PySpark write command so that the data is stored as integers?
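One possible approach, not confirmed anywhere in this thread, is to cast every DataFrame column to a string before the write, so that the cells hold readable text such as `999` and `8.5`, as in the expected output. This is a different technique from the lambda suggestion above. It rests on the assumption that the connector encodes each value according to the DataFrame column's type, which is what the binary output suggests. The sketch below uses only `DataFrame.columns` and `DataFrame.selectExpr`. The helper names `string_cast_exprs` and `cast_all_to_string` are hypothetical.

```python
def string_cast_exprs(columns):
    """Build one Spark SQL expression per column that casts it to STRING
    while keeping the original column name."""
    return ["CAST(`{0}` AS STRING) AS `{0}`".format(name) for name in columns]


def cast_all_to_string(df):
    """Return a new DataFrame in which every column is a string column.

    `df` is expected to be a pyspark.sql.DataFrame; only its `columns`
    attribute and `selectExpr` method are used.
    """
    return df.selectExpr(*string_cast_exprs(df.columns))


# Usage with the names from this issue (assumes `df` and `cat2` exist):
# df_str = cast_all_to_string(df)
# df_str.write.option("catalog", cat2).option("newtable", "5") \
#     .format("org.apache.spark.sql.execution.datasources.hbase").save()
```

Note that with this approach the row key would also be written as text, and any reader of the table would need to parse the strings back into numbers.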