rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other
732 stars 391 forks source link

Incorrect data count in selection of data using XPATH #170

Closed 2k8kanchan closed 7 years ago

2k8kanchan commented 7 years ago

Below is the create table script to refer to the xml file/folder CREATE EXTERNAL TABLE kans.xml_ExpPmtSchPubLoan_dt(Identifier INT,Payment map<STRING,STRING>) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.Identifier"="/Loan/Identifier/text()", "column.xpath.Payment"="/Loan/Payments/Payment/*" ) STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/sxukws/xml/hive_ext_tb/ExpPmtSchPubLoan' TBLPROPERTIES ( "xmlinput.start"="<Loan schemaVersion", "xmlinput.end"="</Loan>" ); Sample XML file is as follows ``

1707391519 2015-09-22T18:39:57.685469001 2015-09-22T18:39:57.685504001 `` Below select query returns only the last value and ignores the first one. So the count is only 1 ``select Identifier,Payment["BusinessEventDatetime"] from xml_ExpPmtSchPubLoan_dt where Identifier=1707391519 limit 10;`` Is there a pattern that can put BusinessEventDatetime as an individual column and not as a map function? Thanks!
rcongiu commented 7 years ago

I am confused... this project is for the JSON SerDe, not the XML SerDe

rcongiu commented 7 years ago

Closing as not relevant, not even using the json serde.