tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.74k stars 703 forks source link

`json_extract` conversion not working from Presto to Daatbricks and json_size #4395

Closed ganeshdogiparthi-db closed 6 hours ago

ganeshdogiparthi-db commented 7 hours ago

Read Dialect: Presto Write Dialect: Databricks

We observe that "json_extract"(params, '$.dependencies') -> params:dependencies is not happening when the expression is inside the case when clause. Please refer to the below queries for column5

Also, can we add logic for solving presto json_size to Databricks?!

Source query:

SELECT
        id
      , name
      , "json_extract"(params, '$.target') target_table
      , "json_extract"(params, '$.frequency') frequency
      , (CASE WHEN ("json_extract"(params, '$.dependencies') IS NOT NULL) THEN CAST("json_extract"(params, '$.dependencies') AS array(map(varchar,varchar))) ELSE null END) dependencies
      , "json_size"(params, '$.dependencies') total_no_of_upstream_dep
      , "json_extract"(params, '$.query') query
      FROM
           dm

Transpiled query:

SELECT
  id,
  name,
  params:target AS target_table,
  params:frequency AS frequency,
  (
    CASE
      WHEN (
        JSON_EXTRACT(params, '$.dependencies') IS NOT NULL
      )
      THEN FROM_JSON(params:dependencies, 'ARRAY<MAP<STRING, STRING>>')
      ELSE NULL
    END
  ) AS dependencies,
  `JSON_SIZE`(params, '$.dependencies') AS total_no_of_upstream_dep,
  params:query AS query
FROM dm

Expected transpiled query:

SELECT
  id,
  name,
  params:target AS target_table,
  params:frequency AS frequency,
  (
    CASE
      WHEN (
        params:dependencies IS NOT NULL
      )
      THEN FROM_JSON(params:dependencies, 'ARRAY<MAP<STRING, STRING>>')
      ELSE NULL
    END
  ) AS dependencies,
  `JSON_SIZE`(params, '$.dependencies') AS total_no_of_upstream_dep,
  params:query AS query
FROM dm
georgesittas commented 6 hours ago

Hi @ganeshdogiparthi-db, I'm not observing this behavior:

>>> import sqlglot
>>> sqlglot.transpile("""
... SELECT
...         id
...       , name
...       , "json_extract"(params, '$.target') target_table
...       , "json_extract"(params, '$.frequency') frequency
...       , (CASE WHEN ("json_extract"(params, '$.dependencies') IS NOT NULL) THEN CAST("json_extract"(params, '$.dependencies') AS array(map(varchar,varchar))) ELSE null END) dependencies
...       , "json_size"(params, '$.dependencies') total_no_of_upstream_dep
...       , "json_extract"(params, '$.query') query
...       FROM
...            dm
... """, "presto", "databricks")
["SELECT id, name, params:target AS target_table, params:frequency AS frequency, (CASE WHEN (NOT params:dependencies IS NULL) THEN FROM_JSON(params:dependencies, 'ARRAY<MAP<STRING, STRING>>') ELSE NULL END) AS dependencies, `JSON_SIZE`(params, '$.dependencies') AS total_no_of_upstream_dep, params:query AS query FROM dm"]

Make sure you're using the latest version.

Also, can we add logic for solving presto json_size to Databricks?!

This isn't high priority for us right now, but we'll be happy to accept a well-crafted & tested PR.