Below is the create table script to refer to the xml file/folder
CREATE EXTERNAL TABLE kans.xml_ExpPmtSchPubLoan_dt(Identifier INT,Payment map<STRING,STRING>) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.Identifier"="/Loan/Identifier/text()", "column.xpath.Payment"="/Loan/Payments/Payment/*" ) STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/sxukws/xml/hive_ext_tb/ExpPmtSchPubLoan' TBLPROPERTIES ( "xmlinput.start"="<Loan schemaVersion", "xmlinput.end"="</Loan>" );
Sample XML file is as follows
``
1707391519
2015-09-22T18:39:57.6854690012015-09-22T18:39:57.685504001
``
Below select query returns only the last value and ignores the first one. So the count is only 1
``select Identifier,Payment["BusinessEventDatetime"] from xml_ExpPmtSchPubLoan_dt where Identifier=1707391519 limit 10;``
Is there a pattern that can put BusinessEventDatetime as an individual column and not as a map function?
Thanks!
Below is the create table script to refer to the xml file/folder
CREATE EXTERNAL TABLE kans.xml_ExpPmtSchPubLoan_dt(Identifier INT,Payment map<STRING,STRING>) ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' WITH SERDEPROPERTIES ( "column.xpath.Identifier"="/Loan/Identifier/text()", "column.xpath.Payment"="/Loan/Payments/Payment/*" ) STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/sxukws/xml/hive_ext_tb/ExpPmtSchPubLoan' TBLPROPERTIES ( "xmlinput.start"="<Loan schemaVersion", "xmlinput.end"="</Loan>" );
Sample XML file is as follows ``