rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.

Mapping nested properties, but make them top-level columns #191

Open casidiablo opened 7 years ago

casidiablo commented 7 years ago

I have a use case that I'm not sure I can achieve with this library without forking it. Basically, I have JSON objects that look like this:

{
  "foo": "bar",
  "baz": {
    "qux": "fus"
  }
}

What I would like to do is something like this:

CREATE EXTERNAL TABLE xxx (
  foo STRING,
  qux STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
 'mapping.qux'='baz.qux'
)

Basically, I want to access the inner qux field from a top-level column name.

I have previously achieved this with a second table that declares the nested object the usual way (colName struct<field:type>) and then selecting the nested properties into the final table. However, I would like to avoid this extra step: the table will read from a gargantuan amount of data, and the extra pass would make things very slow.
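
For readers unfamiliar with it, the two-step workaround described above might look roughly like the sketch below. The table names xxx_raw and xxx_flat are hypothetical, and the sketch assumes the SerDe jar is already on Hive's classpath and that the raw table points at the JSON data.

```sql
-- Step 1 (hypothetical staging table): expose the nested object as a struct column.
CREATE EXTERNAL TABLE xxx_raw (
  foo STRING,
  baz STRUCT<qux:STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

-- Step 2: select the nested field into a flat table with top-level columns.
-- This rewrites the data, which is the extra pass the issue wants to avoid.
CREATE TABLE xxx_flat AS
SELECT
  foo,
  baz.qux AS qux
FROM xxx_raw;
```

The second statement is the costly part: it reads every row of xxx_raw and writes a copy, which is why a mapping applied directly at read time would be preferable.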

rcongiu commented 7 years ago

I see... yeah, it is not currently possible to do that, but it's an idea...

adam-rudd-myob commented 6 years ago

+1

gabrywu commented 5 years ago

+1

JadePerle commented 4 years ago

+1

luong-komorebi commented 3 years ago

This feature, if it is feasible, would be a nice addition to the library.