rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other
733 stars 391 forks source link

Add configuration for explicit null value in the serialized output JSON #198

Closed lyair1 closed 6 years ago

lyair1 commented 6 years ago

In order to be complaint with some object oriented systems an explicit 'null' json value is required in the serialized string. As default, Hive-JSON-Serde will not produce null values in the output serialized JSON string and just drop the key, if you do want to have explicit 'null' values in your output JSON string, use the following:

DROP TABLE tableWithNull;
CREATE EXTERNAL TABLE tableWithNull
(
    `stringCol` STRING,
    'stringNullCol' STRING,
    'stringMissingCol' STRING,
    'structCol' STRUCT<name : STRING>,
    'structNullCol' STRUCT<name : STRING>,
    'structMissingCol' STRUCT<name : STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("explicit.null.value" = "true");

-- JSON string: {\"stringCol\":"blabla",\"stringNullCol\":null,\"structCol\":{\"name\":\"myName\"},\"structNullCol\":{\"name\":null}}
LOAD DATA LOCAL INPATH 'pathToJsonFile.json' OVERWRITE INTO TABLE tableWithNull;

-- The output when ("explicit.null.value" = "true"):
-- {\"stringCol\":"blabla",\"stringNullCol\":null,\"stringMissingCol\":null,\"structCol\":{\"name\":\"myName\"},\"structNullCol\":{\"name\":null},\"structMissingCol\":null}

-- The default output or when ("explicit.null.value" = "false"):
-- {\"stringCol\":"blabla",\"structCol\":{\"name\":\"myName\"},\"structNullCol\":{}}
rcongiu commented 6 years ago

Thanks for the pull request, will review and merge over the weekend

lyair1 commented 6 years ago

@rcongiu, please review the change and leave any comments.

Sorry for closing the request, it was by mistake.

rcongiu commented 6 years ago

I apologize for the delay. Looks good, it's an useful feature, unit tests are there... thank you for the high quality contribution!