jakartaee / data

Data-API
Apache License 2.0
87 stars 27 forks source link

[clarification]: Clarification on Operator Precedence for SQL and NoSQL Databases #795

Open otaviojava opened 2 months ago

otaviojava commented 2 months ago

Specification

https://github.com/jakartaee/data/blob/main/spec/src/main/asciidoc/query-language.asciidoc

I need clarification on ...

Please clarify or update documentation regarding the following statement:

Syntactically, logical operators are parsed with lower precedence than equality and inequality operators and other conditional expressions listed above. The NOT operator has higher precedence than AND and OR. The AND operator has higher precedence than OR.

Issue:

Request:

Thank you for your attention to this matter.

Additional information

No response

njr-11 commented 2 months ago

JDQL also defines parentheses (, ) for grouping. If NoSQL databases are able to support those, I would expect the same mechanism to be usable to support the specified operator precedence, which is really just implicit grouping.

otaviojava commented 1 month ago

Sorry by delay, I was checking this option:

MongoDB does not support parentheses for grouping. Instead, MongoDB uses explicit JSON-like syntax with $and and $or operators to define the structure and precedence of queries.

Redis and Cassandra do not have support for it as well.

At Graph, Neo4J follows the same idea of SQL on grouping and the "AND" precedence.

njr-11 commented 1 month ago

I poorly worded my reply. It should say, if you able to implement the JDQL requirements of allowing parenthesis for grouping by some underlying mechanism of the NoSQL database, then you should be able to use that same mechanism to implement precedence.

So when you say:

MongoDB does not support parentheses for grouping. Instead, MongoDB uses explicit JSON-like syntax with $and and $or operators to define the structure and precedence of queries.

that should be fine. MongoDB doesn't need to have syntax that matches the JDQL with parenthesis. The mechanism you said MongoDB has seems good enough to support parentheses in JDQL.

For example, with this JDQL,

@Query("WHERE (yearProduced > 2020 AND price < 100) OR (condition = 'REFURBISHED' AND price < 80)")

you can send the following to MongoDB:

$or: [ { $and: [ { "yearProduced": { $gt: 2020 } },
                 { "price": { $lt: 100 } }
               ]
       },
       { $and: [ { "condition": "REFURBISHED" },
                 { "price": { $lt: 80 } },
               ]
       }
     ]

Note that you could send the exact same thing to MongoDB if the query omitted the parentheses and relied upon precedence rules:

@Query("WHERE yearProduced > 2020 AND price < 100 OR condition = 'REFURBISHED' AND price < 80")
otaviojava commented 1 month ago

My point is the AND as precedence.

MongoDB does not prioritize AND over OR or vice versa; the precedence is determined by the structure of the query:

Let's use this query:

WHERE numBitsRequired = :bits OR numType = :type AND id < :xmax
{
   "$or":[
      {
         "numBitsRequired":4
      },
      {
         "numType":"COMPOSITE"
      },
      {
         "$and":[
            {
               "_id":{
                  "$lt":20
               }
            }
         ]
      }
   ]
}

⚠️ As I said in the meeting, I am using MongoDB as a reference, the point is, unfortunately, I cannot have this precedence guarantee as we have in SQL.

njr-11 commented 1 month ago

My point is the AND as precedence.

MongoDB does not prioritize AND over OR or vice versa; the precedence is determined by the structure of the query:

Let's use this query:

WHERE numBitsRequired = :bits OR numType = :type AND id < :xmax
{
   "$or":[
      {
         "numBitsRequired":4
      },
      {
         "numType":"COMPOSITE"
      },
      {
         "$and":[
            {
               "_id":{
                  "$lt":20
               }
            }
         ]
      }
   ]
}

⚠️ As I said in the meeting, I am using MongoDB as a reference, the point is, unfortunately, I cannot have this precedence guarantee as we have in SQL.

No, your example above indicates a bug in your implementation. Your implementation is responsible for interpreting the precedence according to the rules of the Jakarta Data specification and translating that to the mechanism of the underlying database, which for the example above, WHERE numBitsRequired = :bits OR numType = :type AND id < :xmax, you should be doing:

{
  "$and":[
    {
      "$or":[
        {
          "numBitsRequired":4
        },
        {
           "numType":"COMPOSITE"
        }
      ]
    }
    {
      "_id":{
        "$lt":20
      }
    }
  ]
}
otaviojava commented 1 month ago

Thanks, but I am curious about the scope of this language, how much it should impact the database procedure or the native behavior of a database engine, and how much this SQL simulation on NoSQL could impact performance.

IMHO, if the database has precedence under the order explicitly defined by the query's nesting and logical operator usage, it should not change it by language.

We should embrace SQL behavior as with NoSQL and keep the boundaries of those databases.

It's worth noting that Cassandra, for instance, does not support certain operations like parentheses and grouping. Understanding these limitations is crucial for effective database management. Otherwise, we need to do multiple queries and handle them by memory.

While it's true that some NoSQL databases, like OrientDB and Oracle NoSQL, implement SQL language, it's important to remember that behavior may not be consistent across all NoSQL databases. This caution is necessary to avoid making assumptions that could lead to unexpected results in database management.

My research is ongoing, and I'm currently considering two potential options. I value your input and look forward to discussing these options further.

1) Update the spec precedence and explain that it might change on NoSQL databases. Also, make the parentheses optional and throw unsupported operation exceptions. 2) Explain that Jakarta Data Query language works only for SQL databases; once, Cassandra, Redis, and some document databases cannot implement it.

:warning: I am still researching it and am super open to suggestions.

njr-11 commented 1 month ago

Thanks, but I am curious about the scope of this language, how much it should impact the database procedure or the native behavior of a database engine, and how much this SQL simulation on NoSQL could impact performance.

Any impact to performance would only occur when the particular keyword or operator that you are implementing is used. So the user gets to opt in by choosing to use it if they value the function it provides. And if they choose not to use it, there should be no impact to them at all. I don't see a problem here. A vendor is free to document that certain operations might be less efficient on NoSQL if that is so, and recommend for and against whatever patterns they see fit.

  1. Update the spec precedence and explain that it might change on NoSQL databases. Also, make the parentheses optional and throw unsupported operation exceptions.

Parentheses (in JDQL) and default order of precedence when parentheses are not used should be handled the same as the other differences between relational/graph/document/column/key-value. The spec should document it in terms of what the category of database is capable of, and the TCK should arrange its assertions accordingly, just as we did for the other capabilities that differ.