nazgul33 / impala-get-json-object-udf

A UDF for Cloudera Impala ( hive get_json_object equivalent )

Nested JSON breaks connection to impala #7

Open scratch28 opened 6 years ago

scratch28 commented 6 years ago

Impala version 2.8. The UDF breaks the connection when it tries to handle nested arrays. Example JSON:

    {"customer_info":[{"field_name":"family_names","field_value":"Gonzalez"},{"field_name":"given_names","field_value":"Pablo"}],"phone":null}

This works:

    select json_get_object('{"customer_info":[{"field_name":"family_names","field_value":"Gonzalez"},{"field_name":"given_names","field_value":"Pablo"}],"phone":null}','$.customer_info') ;

But this breaks Impala:

    select json_get_object('{"customer_info":[{"field_name":"family_names","field_value":"Gonzalez"},{"field_name":"given_names","field_value":"Pablo"}],"phone":null}','$.customer_info.field_name') ;

ajfg93 commented 3 years ago

@scratch28

Replace this line

https://github.com/nazgul33/impala-get-json-object-udf/blob/49e151f7cff9686a1197ca9283bcd847e4470812/jsonUdf.cc#L71

with

    if (va.IsObject() && va.HasMember(key)) { \

and recompile.

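IsObject() needs to be checked before calling HasMember(). In the failing query, $.customer_info resolves to an array, so HasMember() ends up being called on a value that is not an object.

From rapidjson/document.h: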

    #if RAPIDJSON_HAS_STDSTRING
    //! Check whether a member exists in the object with string object.
    /*!
        \param name Member name to be searched.
        \pre IsObject() == true
        \return Whether a member with that name exists.
        \note It is better to use FindMember() directly if you need the obtain the value as well.
        \note Linear time complexity.
    */
    bool HasMember(const std::basic_string<Ch>& name) const { return FindMember(name) != MemberEnd(); }
    #endif

\pre IsObject() == true: I think \pre marks a precondition, so the caller has to guarantee the value is an object before calling HasMember().
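
To illustrate why the order of the two checks matters, here is a minimal sketch that does not use rapidjson at all. Node, find_member, walk and make_sample are hypothetical names made up for this example, and the toy Node type only imitates the object/array distinction. The point is that the kind check runs before any member lookup, so a path step that lands on an array returns "not found" instead of violating a precondition.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a parsed JSON value. This is NOT rapidjson's Value class.
struct Node {
    enum class Kind { Null, String, Array, Object };
    Kind kind = Kind::Null;
    std::string text;               // payload when kind == String
    std::vector<Node> items;        // elements when kind == Array
    std::vector<std::string> keys;  // member names when kind == Object
    std::vector<Node> values;       // member values, parallel to keys
};

// Returns the member named `key`, or nullptr when `node` is not an object
// or has no such member. The kind check comes first, in the same spirit as
// writing va.IsObject() && va.HasMember(key).
const Node* find_member(const Node& node, const std::string& key) {
    if (node.kind != Node::Kind::Object) return nullptr;  // guard first
    for (std::size_t i = 0; i < node.keys.size(); ++i) {  // linear search
        if (node.keys[i] == key) return &node.values[i];
    }
    return nullptr;
}

// Follows a list of keys from the root; stops with nullptr at the first
// step that cannot be resolved.
const Node* walk(const Node& root, const std::vector<std::string>& path) {
    const Node* cur = &root;
    for (const std::string& key : path) {
        cur = find_member(*cur, key);
        if (cur == nullptr) return nullptr;
    }
    return cur;
}

// Builds a cut-down version of the JSON from the report:
// {"customer_info":[{"field_name":"family_names"}],"phone":null}
Node make_sample() {
    Node name;
    name.kind = Node::Kind::String;
    name.text = "family_names";

    Node entry;
    entry.kind = Node::Kind::Object;
    entry.keys.push_back("field_name");
    entry.values.push_back(name);

    Node list;
    list.kind = Node::Kind::Array;
    list.items.push_back(entry);

    Node phone;  // stays Kind::Null

    Node root;
    root.kind = Node::Kind::Object;
    root.keys.push_back("customer_info");
    root.values.push_back(list);
    root.keys.push_back("phone");
    root.values.push_back(phone);
    return root;
}
```

With this sketch, walking the path customer_info finds the array, while walking customer_info then field_name returns nullptr because the second step hits an array rather than an object. That corresponds to the query from the report returning no match instead of taking down the connection.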