klahnakoski / mo-sql-parsing

Let's make a SQL parser so we can provide a familiar interface to non-sql datastores!
Mozilla Public License 2.0
255 stars 58 forks source link

The tree of where clause does not seem to be creating correctly #238

Closed debedb closed 3 months ago

debedb commented 3 months ago

Consider the following code:

from mo_sql_parsing import parse, format
import json

where = {
 "or": [
      {
        "and": [
          {
            "gt": [
              "salary",
              1000
            ]
          },
          {
            "between": [
              "e.hired_date",
              {
                "literal": "2020-01-01"
              },
              {
                "literal": "2020-12-31"
              }
            ]
          }
        ]
      },
      {
        "and": [
          {
            "lt": [
              "salary",
              1000
            ]
          },
          {
            "between": [
              "e.hired_date",
              {
                "literal": "2021-01-01"
              },
              {
                "literal": "2021-12-31"
              }
            ]
          }
        ]
      }
    ]
  }

ast = {
    "select" : [
        {
            "name": "total_cnt",
            "value": {
            "count": "*"
           }
        }
    ],
    "from": [
        {
            "value": "foo"
        }
    ],
    "where" : where
}
print(json.dumps(ast, indent=2))
query = format(ast)
print(query)

The AST provided is

{
  "select": [
    {
      "name": "total_cnt",
      "value": {
        "count": "*"
      }
    }
  ],
  "from": [
    {
      "value": "foo"
    }
  ],
  "where": {
    "or": [
      {
        "and": [
          {
            "gt": [
              "salary",
              1000
            ]
          },
          {
            "between": [
              "e.hired_date",
              {
                "literal": "2020-01-01"
              },
              {
                "literal": "2020-12-31"
              }
            ]
          }
        ]
      },
      {
        "and": [
          {
            "lt": [
              "salary",
              1000
            ]
          },
          {
            "between": [
              "e.hired_date",
              {
                "literal": "2021-01-01"
              },
              {
                "literal": "2021-12-31"
              }
            ]
          }
        ]
      }
    ]
  }
}

The output is

SELECT COUNT(*) AS total_cnt 
FROM foo 
WHERE salary > 1000 AND e.hired_date BETWEEN '2020-01-01' AND '2020-12-31' 
OR 
(salary < 1000 AND e.hired_date BETWEEN '2021-01-01' AND '2021-12-31')

Yet I expect the output to be

SELECT COUNT(*) AS total_cnt 
FROM foo 
WHERE (salary > 1000 AND e.hired_date BETWEEN '2020-01-01' AND '2020-12-31' )
OR 
(salary < 1000 AND e.hired_date BETWEEN '2021-01-01' AND '2021-12-31')
debedb commented 3 months ago
  1. While AND does have higher precedence, the problem would be still there if the operators are reversed
  2. A simple but inelegant solution is to wrap every binary op in parentheses of course.
klahnakoski commented 3 months ago

I am not sure if there is a problem here. The formatting functions track the operator precedence to minimized the number of parentheses generated. If you reverse the operators, parenthesis will be included.

I ran your ast through the formatter to get a single line with no parenthesis, and it appears correct.

SELECT COUNT(*) AS total_cnt FROM foo WHERE salary > 1000 AND e.hired_date BETWEEN '2020-01-01' AND '2020-12-31' OR salary < 1000 AND e.hired_date BETWEEN '2021-01-01' AND '2021-12-31'
debedb commented 3 months ago

@klahnakoski yes, you are correct; I have tried reversing AND and OR and the proper paretheses are there. My mistake. Closing.