rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other
733 stars 391 forks source link

Accessing a sibling's value from a JSON Array in JSONserde #195

Open saidatta opened 7 years ago

saidatta commented 7 years ago

Hey,

Given This is my JSON

...
    "actions": [
      {
        "action_type": "link_click",
        "value": "1"
      },
      {
        "action_type": "page_engagement",
        "value": "1"
      },
      {
        "action_type": "post_engagement",
        "value": "1"
      },
      .....
    ]
  }
...

I would like to map each action_type's value into a specific column of my external Table

CREATE EXTERNAL TABLE IF NOT EXISTS  raw_marketing_other.facebook_ad_cost_dtl_hrly (
   ...
   linkClick : int
   pageEngagement : int
...
)

For example, linkClick column will have the value of 1 derived from that json object.

Question Is this possible in the mapping of serdeProperties? something like this.

WITH serdeproperties(
    "mapping.linkClick" = "actions[action_type="link_click].value",
...

Note: where actions[action_type="link_click]" is meant to return the first occurence of json object with that condition?

If it is not possible through custom mappings, then is there any alternative solution for it?

Thanks

rcongiu commented 7 years ago

I see... I was about to tell you to try with views, but it's not that easy since you want to pick something in the array.

There are 2 ways you can do something like that, one is to conver 'action'to string and use https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object

or to write a generic UDF that takes an array of structs, and outputs the metric you want.

The SerDe does not do complex mapping, that is better done upstream. The mapping is meant for those occasions when the naming in json is not compatible with hive (different rules for case sensitivity, reserved words, etc).

R.

2017-08-23 11:10 GMT-07:00 Venkata Munnangi notifications@github.com:

Hey,

Given This is my JSON

... "actions": [ { "action_type": "link_click", "value": "1" }, { "action_type": "page_engagement", "value": "1" }, { "action_type": "post_engagement", "value": "1" }, ..... ] } ...

I would like to map each action_type's value into a specific column of my external Table

CREATE EXTERNAL TABLE IF NOT EXISTS raw_marketing_other.facebook_ad_cost_dtl_hrly ( ... linkClick : int pageEngagement : int ... )

Question Is this possible in the mapping of serdeProperties? something like this.

WITH serdeproperties( "mapping.linkClick" = "actions[action_type="link_click].value", ...

Note: where actions[action_type="link_click]" is meant to return the first occurence of json object with that condition?

If it is not possible through custom mappings, then is there any alternative solution for it?

Thanks

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rcongiu/Hive-JSON-Serde/issues/195, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3mtfdEUolKBPTeDs2ltGTg9Sb0Wdwvks5sbGsSgaJpZM4PAYpA .

saidatta commented 7 years ago

Bummer.

I can do mapping with nested map attributes, right? For example

    "actions": {
        "link_click": "1",
        "page_like": "2"
      },
      .....

and do

WITH serdeproperties(
    "mapping.linkClick" = "actions.link_click",
    "mapping.page_like" = "actions.page_like",
...

for mapping linkClick and page_like columns to the nested attributes link_click and page_like respectively.

Do I need to provide any additional flags?