hortonworks-spark / spark-llap

Apache License 2.0

HiveStreamingDataSource format writes string data in binary form in Hive table, whereas the CLI writes it as a string #270

Open sinhashreesh opened 5 years ago

sinhashreesh commented 5 years ago

I am trying to write from a Kafka source to a Hive target. I am using the "com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource" format to write data into a Hive table. The table is stored as ORC and is fully transactional. I am using an HDP 3.1 cluster.

The column data type is binary. When I insert from the CLI, the data reads back as a string. When I do the same using the "com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource" format, the data is inserted in binary form in the Hive table.

+----------------------+
| hive_binary.co       |
+----------------------+
| ShreeshData1         |  --> when inserted from CLI
| [B@1ebd3260          |  --> when written from Spark using the "com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource" format
+----------------------+
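A plausible explanation, not confirmed against the HiveStreamingDataSource source: "[B@1ebd3260" is exactly the text Java produces when a byte[] is converted with Object.toString(), which prints the class name ("[B" for byte[]), an "@", and the array's identity hash in hex. That suggests the binary value is being stringified somewhere in the write path instead of being passed through as raw bytes. The plain-Java sketch below (no Spark involved; the class and method names are hypothetical) shows the two conversions side by side.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayToStringDemo {
    // Intended conversion: interpret the bytes as UTF-8 text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Suspected conversion: Object.toString() on an array returns
    // getClass().getName() + "@" + Integer.toHexString(hashCode()),
    // and the class name of byte[] is "[B".
    static String objectToString(byte[] bytes) {
        return String.valueOf((Object) bytes);
    }

    public static void main(String[] args) {
        byte[] payload = "ShreeshData1".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(payload));     // prints ShreeshData1
        System.out.println(objectToString(payload)); // prints something like [B@1ebd3260; the hash varies per run
    }
}
```

The "[B@..." text carries only the array's type and identity hash, not its contents, so rows already written this way cannot be decoded back to the original payload.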

Is this a bug, or is it expected behavior?

Attaching a standalone Spark program: standAlone.txt

Create table command: create table hive_binary (co binary);

Table properties: (attached as a screenshot in the original issue)
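If the streaming job has already written rows to hive_binary (co binary), a hypothetical helper like the one below could flag values whose bytes decode to the "[B@<hex>" pattern rather than real data. It assumes the column values have already been fetched as byte[] (for example through JDBC, not shown); the class and method names are illustrative and are not part of spark-llap.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ByteArrayReferenceCheck {
    // Matches the full text Object.toString() produces for a byte[]: "[B@" plus a lowercase hex hash.
    private static final Pattern BYTE_ARRAY_REF = Pattern.compile("\\[B@[0-9a-f]+");

    // True if the stored bytes decode to a byte-array reference instead of a real payload.
    static boolean looksLikeByteArrayReference(byte[] stored) {
        String text = new String(stored, StandardCharsets.UTF_8);
        return BYTE_ARRAY_REF.matcher(text).matches();
    }

    // Returns the indices of values that appear to have been written as references.
    static List<Integer> suspiciousRows(List<byte[]> values) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (looksLikeByteArrayReference(values.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample data: the two values from the query output in this issue.
        List<byte[]> column = Arrays.asList(
                "ShreeshData1".getBytes(StandardCharsets.UTF_8),
                "[B@1ebd3260".getBytes(StandardCharsets.UTF_8));
        System.out.println(suspiciousRows(column)); // prints [1]
    }
}
```

This is a heuristic: a genuine payload that happens to match the pattern would be flagged too.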