prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

json_parse parses bad JSON #24090

Open kgpai opened 1 hour ago

kgpai commented 1 hour ago

Currently, json_parse parses JSON that is verifiably bad or incomplete (see the example below). In the Prestissimo project, we are wondering whether we should emulate this behavior and whether customers expect it. In addition, the behavior on bad JSON is inconsistent: compare row 2 with row 3 in the example below.

Expected Behavior

It would be simpler and more consistent if Presto threw an error on bad or incomplete JSON rather than returning partial results.

Current Behavior

Below is a snippet of current behavior:

presto:default> SELECT try(json_parse(x)), TRY_CAST(TRY("json_parse"(x)) AS array(varchar)), ARRAY[X] from (values '[2] [3]', '[3] bad json]', '[4] [bad json') as t(x);
 _col0 | _col1 |      _col2       
-------+-------+------------------
 [2]   | [2]   | [[2] [3]]        
 NULL  | NULL  | [[3] bad json]]  
 [4]   | [4]   | [[4] [bad json] 
(3 rows)

Possible Solution

Throw an error on bad JSON.
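
To illustrate the strict behavior proposed here, below is a minimal Python sketch. It is not Presto or Prestissimo code, and the helper names `strict_json_parse` and `try_strict_json_parse` are hypothetical. It relies on Python's standard `json.loads`, which rejects any input that is not exactly one complete JSON value, and wraps it in a TRY-like helper that maps the error to None (standing in for SQL NULL).

```python
import json


def strict_json_parse(text):
    """Parse text as exactly one JSON value.

    json.loads raises json.JSONDecodeError (a ValueError subclass) when the
    input has trailing content or is otherwise malformed.
    """
    return json.loads(text)


def try_strict_json_parse(text):
    """Rough analogue of TRY(json_parse(x)): return None instead of raising."""
    try:
        return strict_json_parse(text)
    except ValueError:
        return None


# The three inputs from the snippet above.
inputs = ['[2] [3]', '[3] bad json]', '[4] [bad json']
results = [try_strict_json_parse(x) for x in inputs]
print(results)
```

Under this strict interpretation all three inputs are rejected, so rows 1 and 3 would become NULL like row 2 instead of returning the leading array.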

Context

We are trying to verify that Prestissimo's behavior matches Presto's, and we are running into verification errors because of this.

kgpai commented 1 hour ago

cc: @rschlussel @amitkdutta