rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other
732 stars 391 forks source link

Serde Issues with array at the top level #150

Open RaniRaven opened 8 years ago

RaniRaven commented 8 years ago

Hi,

Given a dynamic-sized array of the form [{"aa" : "bb"}, {"aa" : "cc"}, {"aa":"dd"}] There seem to be no solution with the serde other than defining 3 struct. an implementation of cretae external table x (a array<struct>) will no work. The question is why, and if this can be fixed. Since other than that it seems like the perfect solution for json handling.

rcongiu commented 8 years ago

create table x ( a array <struct > ) will work fine with your example

rcongiu commented 8 years ago

'Create table x (a array <struct >)'

rcongiu commented 8 years ago

Sorry on cell phone.. can't find bacticks on this keyboard

RaniRaven commented 8 years ago

I probably do not understand how.Try using this file " [{xxx:{"name":"sergay"}},{xxx:{"name":"zzz"}}] [{xxx:{"name":"sergay1"}},{xxx:{"name":"zzzb"}}] "

then use the following external tables (assuming the file is in path /user/cloudera/json drop table json_test1; create external table json_test1 ( a array<struct<xxx:struct>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as TEXTFILE LOCATION '/user/cloudera/json'; drop table json_test2; create external table json_test2 ( a structxxx:struct, b structxxx:struct ) row format serde 'org.openx.data.jsonserde.JsonSerDe' stored as TEXTFILE LOCATION '/user/cloudera/json';

see what you get as you select from both table (while the 2nd works fine, the 1st will result in error) Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.openx.data.jsonserde.json.JSONObject cannot be cast to org.openx.data.jsonserde.json.JSONArray

RaniRaven commented 8 years ago

So it does not seem to work, unfortunately

rcongiu commented 8 years ago

Oh I see. So, I coded a while ago a feature to support an array as top level object instead of a struct. You can change the data to a struct { col: [...] } to make it work

boje commented 7 years ago

I am facing same error. My JSON look like this.

[{
    "test": "a",
    "world": "b",
    "time": [{
        "start": "b",
        "end": "99"
    }]
}]
rcongiu commented 7 years ago

This one should work, declared as array<struct<test:string,world:string,time:<array<struct<start:string,end:string>>>>>


"Good judgment comes from experience. Experience comes from bad judgment"


On Thursday, January 5, 2017 8:13 AM, Anders Boje Larsen <notifications@github.com> wrote:

I am facing same error. My JSON look like this.[{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }] — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

eastTowest commented 7 years ago

I have the same issue different error

I reproduced the issue even for the json @boje provided with @rcongiu declaration; [{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }]

table I try to create is; create external table message( msg array<struct<test:string,world:string>> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'

I get this error;

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]

Any help would be truly appreciated

rcongiu commented 7 years ago

It should support top level arrays. Have you tried declaring the table as  create external table message( msg struct<test:string,world:string> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'

? ------------------------------------------------------- "Good judgment comes from experience. Experience comes from bad judgment"


On Tuesday, January 17, 2017 2:03 PM, Bilal Nemutlu <notifications@github.com> wrote:

I have the same issue different errorI reproduced the issue even for the json @boje provided with @rcongiu declaration; [{ "test": "a", "world": "b", "time": [{ "start": "b", "end": "99" }] }]table I try to create is; create external table message( msg array<struct<test:string,world:string>> ) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe'I get this error;Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Missing value at 1 [character 2 line 1]Any help would be truly appreciated— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

eastTowest commented 7 years ago

I got the problem that you helped resolved here

However I see another issue, when i query the table, it returns the first object in that array

the file has the following one line json array [{"event":{"id":"1","type":"c"}},{"event":{"id":"2","type":"d"}}]

table declaration msg array<struct<event:struct<id:string, type:string>>> it returns only this; [{"event":{"id":"1","type":"c"}}]

i checked the count(*), it shows 1 row only

Thank you bunch for the help

eastTowest commented 7 years ago

As far as i see, it considers the input json as just one row and it returns the first object within that! I also tried struct msg struct<event:struct<id:string, type:string>> but no luck.

changed the json structure as [{"id":"1","type":"c"},{"id":"2","type":"d"}]and table declaration msg struct<id:string, type:string> no luck again and just returns the first one {"id":"1","type":"c"}

I think i am very close to finish it up with your serde and I'd really appreciate your support!

sushantkhanayat commented 6 years ago

Hi eastTowest ,

Were you able to solve this, i am facing the same issue.

sindhujadhamodaran commented 6 years ago

Did anyone solve this issue. I am facing similar issue with top level array. can anyone suggest me a solution.

[{"user_id": "a"," previous_user_id": "b"},{"user_id": "c"," previous_user_id": "d"},{"user_id": "e"," previous_user_id": "f"}] create external table array_tmp (User array<struct<user_id: String, previous_user_id:String>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

on select user.user_id from array_tmp it gives exception as Row is not a valid JSON Object