rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other
732 stars 391 forks source link

attribute name when using "insert directory ... row format serde" #151

Open tkang007 opened 8 years ago

tkang007 commented 8 years ago

Hello, Thanks so much to the contributes of this project and any feedback for my question.

On the below use case, the top attribute names are _colN instead of column name. Same result with explicit select listing(same for select * from/select a,b,c from) Is there any options to use selected column name for the top level attribute name at the result file.

test.hql: add jar /home/myuser/lib/Hive-JSON-Serde/json-serde/target/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;

insert overwrite directory /user/myuser/outuput
row format serde 'org.openx.data.jsonserde.JsonSerDe' select * from mytable;

-- mytable is an external table using same json serde.

result file: {"_col4":"ccc","_col3":"ddd","_col2":"eee","_col1":"ffff"," _col0":{"business":{"state":{"vehicle":[ {"addeddate":"12/1/2011","vin":"xxx","deleteddate":"12/1/2015"}, {"addeddate":"12/1/2011","vin":"yyy","deleteddate":"12/1/2015"}]}},"principal":{"businessinfo":{"cd":"123"},"partyinfo":{"address":{"addr1":"111 xxx Street","county":"ttt","addrtype":"MailingAddress","city":"zzz"},"nameinfo":{"name":"xxx"}}}}}

tkang007 commented 8 years ago

I found a workaround as below.

extract.hql: add jar /home/myuser/lib/json-udf-1.3.8-SNAPSHOT-jar-with-dependencies.jar; create temporary function tjson as 'org.openx.data.udf.JsonUDF';

insert overwrite local directory '/home/myuser/output' select tjson(named_struct("cola1", vt.cola1, "cola2", vt.cola2, ...)) from myview vt ;