orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0
4.73k stars 871 forks source link

Suggestion: Change in SQL behavior from 2.2 to 3.0 for IN syntax #8044

Closed ShakAttacked closed 3 years ago

ShakAttacked commented 6 years ago

OrientDB Version: 3.0

Java Version: 1.8

OS: Centos

In 2.2, this query would return result. SELECT * FROM table WHERE list in "15169770729" AND password = "booo"

where in 3.0, this array syntax is required. SELECT * FROM table WHERE list in ["15169770729"] AND password = "booo"

I see that there is now contains and containsall, which I believe are new to 3.0. is there a performance difference between IN and Contains?

Safwan

luigidellaquila commented 6 years ago

Hi @ShakAttacked

Actually CONTAINS and CONTAINSALL were there also in v 2.2

In v 3.0 we made the evaluation of IN and CONTAINS more strict, in the sense that list IN "15169770729" does not make much sense (a list is not contained in a string, perhaps a list CONTAINS a string), so that expression evaluates to FALSE.

In my opinion, this behavior had to be fixed much earlier, we just kept the same until v 2.2 for backward compatibility; now we took the chance of a major release to finally fix it

Thanks

Luigi

ShakAttacked commented 6 years ago

Hi @luigidellaquila

I think it's a good change. I think I would suggest going further. If you are going to break functionality, disable IN on embedded list completely. I can still do list in ["a", "b"] and that's frankly confusing and maybe because it worked in 2.2, I never noticed Contains exists.

If you're going to break it, break it all the way. Leave IN to work only for arrays ["a", "b"] and sub-selects.

Thanks, Safwan

luigidellaquila commented 6 years ago

Hi @ShakAttacked

Unfortunately this cannot be done at parse time, because the expression is syntactically valid and because it is also semantically valid if the property is a string, and you cannot know it in general, especially when you are running schemaless.

Even if you can predict that the expression is not valid, it just means that it evaluates to false (also 1 = 'foo' will always return false, but I cannot block the user from doing it, I can just evaluate it to false).

As a conclusion, I'm afraid it will remain as it is

Thanks

Luigi