rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.
Other
732 stars 391 forks source link

Following on Scalar to Array cast issue #164

Open Cord-Thomas opened 7 years ago

Cord-Thomas commented 7 years ago

Following on #12, I am able to successfully query my Hive schema using the 1.3.8 snapshot build from CLI just fine. Good stuff - this is less of an issue than a plea for insight.

How, I am trying to access this same table schema from Tableau. I am getting the old error:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.openx.data.jsonserde.json.JSONArray at org.openx.data.jsonserde.objectinspector.JsonListObjectInspector.getList(JsonListObjectInspector.java:38) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:297)

I understand you may not have experience with Tableau, but is there something about this query (they transform the user query by adding the 'select 1 fas 'number of records' bit.

Schema: ADD JAR /usr/hdp/2.2.6.0-2800/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar CREATE EXTERNAL TABLE pubs ( document_number STRING, document_title STRING, series STRING, inv_status STRING, product_formats STRING, product_date SMALLINT, long_abstract STRING, short_abstract STRING, categories ARRAY, research_areas ARRAY, primary_research_area STRING, asset_links_path ARRAY, authors ARRAY, author_types ARRAY, publisher STRING, fileReference STRING, comments STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") LOCATION '/data/pubs';

SQL Statement generated by Tableau (that works from CLI):

SELECT 1 AS number_of_records, pubs.asset_links_path AS asset_links_path, pubs.author_types AS author_types, pubs.authors AS authors, pubs.categories AS categories, pubs.comments AS comments, pubs.document_number AS document_number, pubs.document_title AS document_title, pubs.filereference AS filereference, pubs.inv_status AS inv_status, pubs.long_abstract AS long_abstract, pubs.primary_research_area AS primary_research_area, pubs.product_date AS product_date, pubs.product_formats AS product_formats, pubs.publisher AS publisher, pubs.research_areas AS research_areas, pubs.series AS series, pubs.short_abstract AS short_abstract FROM publications.pubs pubs LIMIT 10;

C.json.zip

rcongiu commented 7 years ago

Maybe Tableau is running on a different machine/different hive server that has the old serde or caching it ?Hard to tell which one is using sine the old serde and the new one have the same name.Try renaming it (like giving it a version of 1.3.8.1-SNAPSHOT) to make sure it's using the new one.  R.------------------------------------------------------- "Good judgment comes from experience.

Experience comes from bad judgment"

On Friday, September 30, 2016 9:24 AM, Cord-Thomas <notifications@github.com> wrote:

Following on #12, I am able to successfully query my Hive schema using the 1.3.8 snapshot build from CLI just fine. Good stuff - this is less of an issue than a plea for insight.How, I am trying to access this same table schema from Tableau. I am getting the old error:Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.openx.data.jsonserde.json.JSONArray at org.openx.data.jsonserde.objectinspector.JsonListObjectInspector.getList(JsonListObjectInspector.java:38) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:297)I understand you may not have experience with Tableau, but is there something about this query (they transform the user query by adding the 'select 1 fas 'number of records' bit.Schema: ADD JAR /usr/hdp/2.2.6.0-2800/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar CREATE EXTERNAL TABLE pubs ( document_number STRING, document_title STRING, series STRING, inv_status STRING, product_formats STRING, product_date SMALLINT, long_abstract STRING, short_abstract STRING, categories ARRAY, research_areas ARRAY, primary_research_area STRING, asset_links_path ARRAY, authors ARRAY, author_types ARRAY, publisher STRING, fileReference STRING, comments STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") LOCATION '/data/pubs';SQL Statement generated by Tableau (that works from CLI):SELECT 1 AS number_of_records, pubs.asset_links_path AS asset_links_path, pubs.author_types AS author_types, pubs.authors AS authors, pubs.categories AS categories, pubs.comments AS comments, pubs.document_number AS document_number, pubs.document_title AS document_title, pubs.filereference AS filereference, pubs.inv_status AS inv_status, pubs.long_abstract AS long_abstract, pubs.primary_research_area AS primary_research_area, pubs.product_date AS product_date, pubs.product_formats AS product_formats, pubs.publisher AS publisher, pubs.research_areas AS research_areas, pubs.series AS series, pubs.short_abstract AS short_abstract FROM publications.pubs pubs LIMIT 10;C.json.zip— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

Cord-Thomas commented 7 years ago

Thank you. I am suspecting caching as I am fairly certain it's pointing to the server with the proper SerDe. Appreciate the input

Cord Thomas Director, Emerging Technology RAND Corporation

On Sep 30, 2016, at 10:17 AM, Roberto Congiu notifications@github.com<mailto:notifications@github.com> wrote:

Maybe Tableau is running on a different machine/different hive server that has the old serde or caching it ?Hard to tell which one is using sine the old serde and the new one have the same name.Try renaming it (like giving it a version of 1.3.8.1-SNAPSHOT) to make sure it's using the new one. R.------------------------------------------------------- "Good judgment comes from experience.

Experience comes from bad judgment"

On Friday, September 30, 2016 9:24 AM, Cord-Thomas notifications@github.com<mailto:notifications@github.com> wrote:

Following on #12, I am able to successfully query my Hive schema using the 1.3.8 snapshot build from CLI just fine. Good stuff - this is less of an issue than a plea for insight.How, I am trying to access this same table schema from Tableau. I am getting the old error:Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.openx.data.jsonserde.json.JSONArray at org.openx.data.jsonserde.objectinspector.JsonListObjectInspector.getList(JsonListObjectInspector.java:38) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:297)I understand you may not have experience with Tableau, but is there something about this query (they transform the user query by adding the 'select 1 fas 'number of records' bit.Schema: ADD JAR /usr/hdp/2.2.6.0-2800/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar CREATE EXTERNAL TABLE pubs ( document_number STRING, document_title STRING, series STRING, inv_status STRING, product_formats STRING, product_date SMALLINT, long_abstract STRING, short_abstract STRING, categories ARRAY, research_areas ARRAY, primary_research_area STRING, asset_links_path ARRAY, authors ARRAY, author_types ARRAY, publisher STRING, fileReference STRING, comments STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") LOCATION '/data/pubs';SQL Statement generated by Tableau (that works from CLI):SELECT 1 AS number_of_records, pubs.asset_links_path AS asset_links_path, pubs.author_types AS author_types, pubs.authors AS authors, pubs.categories AS categories, pubs.comments AS comments, pubs.document_number AS document_number, pubs.document_title AS document_title, pubs.filereference AS filereference, pubs.inv_status AS inv_status, pubs.long_abstract AS long_abstract, pubs.primary_research_area AS primary_research_area, pubs.product_date AS product_date, pubs.product_formats AS product_formats, pubs.publisher AS publisher, pubs.research_areas AS research_areas, pubs.series AS series, pubs.short_abstract AS short_abstract FROM publications.pubs pubs LIMIT 10;C.json.zip— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/rcongiu/Hive-JSON-Serde/issues/164#issuecomment-250801305, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ATlvMJfYgkLm1hid7iNEhcBsgdDiao-sks5qvUP9gaJpZM4KLRJs.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

rcongiu commented 7 years ago

The SerDe is definitely cached by the hive server, so also restarting hive server may fix it. R. ------------------------------------------------------- "Good judgment comes from experience.

Experience comes from bad judgment"

On Friday, September 30, 2016 3:49 PM, Cord-Thomas <notifications@github.com> wrote:

Thank you. I am suspecting caching as I am fairly certain it's pointing to the server with the proper SerDe. Appreciate the input

Cord Thomas Director, Emerging Technology RAND Corporation

On Sep 30, 2016, at 10:17 AM, Roberto Congiu notifications@github.com<mailto:notifications@github.com> wrote:

Maybe Tableau is running on a different machine/different hive server that has the old serde or caching it ?Hard to tell which one is using sine the old serde and the new one have the same name.Try renaming it (like giving it a version of 1.3.8.1-SNAPSHOT) to make sure it's using the new one. R.------------------------------------------------------- "Good judgment comes from experience.

Experience comes from bad judgment"

On Friday, September 30, 2016 9:24 AM, Cord-Thomas notifications@github.com<mailto:notifications@github.com> wrote:

Following on #12, I am able to successfully query my Hive schema using the 1.3.8 snapshot build from CLI just fine. Good stuff - this is less of an issue than a plea for insight.How, I am trying to access this same table schema from Tableau. I am getting the old error:Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.openx.data.jsonserde.json.JSONArray at org.openx.data.jsonserde.objectinspector.JsonListObjectInspector.getList(JsonListObjectInspector.java:38) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:297)I understand you may not have experience with Tableau, but is there something about this query (they transform the user query by adding the 'select 1 fas 'number of records' bit.Schema: ADD JAR /usr/hdp/2.2.6.0-2800/hive/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar CREATE EXTERNAL TABLE pubs ( document_number STRING, document_title STRING, series STRING, inv_status STRING, product_formats STRING, product_date SMALLINT, long_abstract STRING, short_abstract STRING, categories ARRAY, research_areas ARRAY, primary_research_area STRING, asset_links_path ARRAY, authors ARRAY, author_types ARRAY, publisher STRING, fileReference STRING, comments STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") LOCATION '/data/pubs';SQL Statement generated by Tableau (that works from CLI):SELECT 1 AS number_of_records, pubs.asset_links_path AS asset_links_path, pubs.author_types AS author_types, pubs.authors AS authors, pubs.categories AS categories, pubs.comments AS comments, pubs.document_number AS document_number, pubs.document_title AS document_title, pubs.filereference AS filereference, pubs.inv_status AS inv_status, pubs.long_abstract AS long_abstract, pubs.primary_research_area AS primary_research_area, pubs.product_date AS product_date, pubs.product_formats AS product_formats, pubs.publisher AS publisher, pubs.research_areas AS research_areas, pubs.series AS series, pubs.short_abstract AS short_abstract FROM publications.pubs pubs LIMIT 10;C.json.zip— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/rcongiu/Hive-JSON-Serde/issues/164#issuecomment-250801305, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ATlvMJfYgkLm1hid7iNEhcBsgdDiao-sks5qvUP9gaJpZM4KLRJs.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.