prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.06k stars 5.38k forks source link

json_extract_scalar(), json_extract() and json_size() returns NULL in some cases when it could return a value. #23806

Open spershin opened 1 month ago

spershin commented 1 month ago

Consider the 3 queries below, which all should have the same result:

SELECT json_extract_scalar(c0, '$["hostname"]') from (values ('{"failure_reason": None, "hostname": "twshared38330.04.ftw6.facebook.com"}')) t(c0);
C++:  twshared38330.04.ftw6.facebook.com
Java: NULL

SELECT json_extract_scalar(c0, '$["hostname"]') from (values ('{"hostname": "twshared38330.04.ftw6.facebook.com", "failure_reason": None}')) t(c0);
C++:  twshared38330.04.ftw6.facebook.com
Java: twshared38330.04.ftw6.facebook.com

SELECT json_extract_scalar(c0, '$["hostname"]') from (values ('{"failure_reason": "None", "hostname": "twshared38330.04.ftw6.facebook.com"}')) t(c0);
C++:  twshared38330.04.ftw6.facebook.com
Java: twshared38330.04.ftw6.facebook.com

In the 1st case Java returns NULL, which seem incorrect. See the difference in the input string of the Q1 and Q3 - in the Q1 we have "failure_reason": None, where None is not in quotes.

json_extract() behaves the same.

For the query

SELECT json_extract_scalar(c0, '$["failure_reason"]') from (values ('{"failure_reason": None, "hostname": "twshared38330.04.ftw6.facebook.com"}')) t(c0);
C++: NULL
Java: NULL

Meaning both C++ and Java treat the particular element is invalid.

It looks like Java abandons search when hitting None, because it is 'invalid' json?

Similarly, json_size() behaves:

SELECT json_size(c0, '$') x from (values ('{"key":value}')) t(c0);
Java: NULL
C++: 1

SELECT json_size(c0, '$') x from (values ('{"key":"value"}')) t(c0);
Java: 1
C++: 1

SELECT json_size(c0, '$') x from (values ('{"key":200}')) t(c0);
Java: 1
C++: 1

It might be that the implicit conversion from varchar to json type causes functions to return NULL.

tdcmeehan commented 1 month ago

@spershin to word this bug differently, you believe Presto should make a better best effort attempt at parsing the first example, even though it is not valid JSON?

spershin commented 1 month ago

@spershin to word this bug differently, you believe Presto should make a better best effort attempt at parsing the first example, even though it is not valid JSON?

@tdcmeehan That's one way to go. Of course, leaving it "as is" is valid too.