stevenvachon / sql-match

Match a string using an SQL pattern.
MIT License
2 stars 0 forks source link

Use regex for simpler tokenization #1

Closed charmander closed 6 years ago

charmander commented 6 years ago

Has performance benefits, treating consecutive normal characters as a single token.

I haven’t updated browser.js.

stevenvachon commented 6 years ago

Thanks! Please rebase with the current changes.

What should we do with the trailing escape character as it may only be a Postgres feature? How does MySQL handle it and what does the SQL spec say?

charmander commented 6 years ago

What should we do with the trailing escape character as it may only be a Postgres feature? How does MySQL handle it and what does the SQL spec say?

I’m not sure, but this is compatible with 1.0.1!

coveralls commented 6 years ago

Coverage Status

Coverage remained the same at 100.0% when pulling e9bfe51862eede5f75cb93d5b58bf1c3744ec0ed on charmander:regexp-tokenize into 1d11feee96df0702571ee304d5fe4e702e341e93 on stevenvachon:master.

stevenvachon commented 6 years ago

Thank you for rebasing. This performs consistently two times faster!

Would you show me what the equivalent SQL SELECT would look like? What I'm envisioning is an escaped closing quote, which would make the entire statement invalid.

charmander commented 6 years ago

SQL:

SELECT 'ax' LIKE 'b\';

MySQL without NO_BACKSLASH_ESCAPES:

SELECT 'ax' LIKE 'b\\';
stevenvachon commented 6 years ago
SELECT 'ax' LIKE 'ax\\';

PostgreSQL:

?column? 
----------
 f
(1 row)

MySQL:

+------------------+
| 'ax' LIKE 'ax\\' |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)
charmander commented 6 years ago

The correct test for PostgreSQL is 'ax\', not 'ax\\', and the left side needs to have the potential to match (in this case, be at least three characters long) or it won’t produce an error.

stevenvachon commented 6 years ago

It would appear that PostgreSQL behaves like MySQL with NO_BACKSLASH_ESCAPES while this library (being written in JavaScript) behaves like MySQL without NO_BACKSLASH_ESCAPES. I'm not sure if this needs to be documented or tested, though. PostgreSQL never throws an error.


SELECT 'ax\' LIKE 'ax\\';

PostgreSQL:

 ?column? 
----------
 t
(1 row)

MySQL with NO_BACKSLASH_ESCAPES:

Query OK, 0 rows affected, 1 warning (0.00 sec)

+-------------------+
| 'ax\' LIKE 'ax\\' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

MySQL without NO_BACKSLASH_ESCAPES:

ERROR: 
Unknown command '\\'.

SELECT 'ax\\' LIKE 'ax\\';

PostgreSQL:

 ?column? 
----------
 f
(1 row)

MySQL with NO_BACKSLASH_ESCAPES:

Query OK, 0 rows affected, 1 warning (0.00 sec)

+--------------------+
| 'ax\\' LIKE 'ax\\' |
+--------------------+
|                  0 |
+--------------------+
1 row in set (0.00 sec)

MySQL without NO_BACKSLASH_ESCAPES:

+--------------------+
| 'ax\\' LIKE 'ax\\' |
+--------------------+
|                  1 |
+--------------------+
1 row in set (0.00 sec)
charmander commented 6 years ago

As I mentioned before, this is what you should be testing in PostgreSQL:

SELECT 'abc' LIKE 'ab\';

It will produce an error. The equivalent MySQL query is SELECT 'abc' LIKE 'ab\\';. Whether backslash escapes are enabled makes no difference to this test case – backslash escapes affect string literals, and the pattern is just a string value.

So, the question as it pertains only to this library: should a backslash at the end of a pattern:

stevenvachon commented 6 years ago

That query is escaping the closing quote, which is not [always] a valid query.


SELECT 'abc' LIKE 'ab\';

PostgreSQL

ERROR:  LIKE pattern must not end with escape character

MySQL without NO_BACKSLASH_ESCAPES

Sees it as an incomplete query:

mysql> SELECT 'abc' LIKE 'ab\';
    '>

MySQL with NO_BACKSLASH_ESCAPES

Query OK, 0 rows affected, 1 warning (0.00 sec)

+------------------+
| 'abc' LIKE 'ab\' |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

SELECT 'abc' LIKE 'ab\\';

PostgreSQL

 ?column? 
----------
 f
(1 row)

MySQL without NO_BACKSLASH_ESCAPES

+-------------------+
| 'abc' LIKE 'ab\\' |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

MySQL with NO_BACKSLASH_ESCAPES

Query OK, 0 rows affected, 1 warning (0.00 sec)

+-------------------+
| 'abc' LIKE 'ab\\' |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)
charmander commented 6 years ago

That query is escaping the closing quote

It isn’t. Standard SQL single-quoted strings don’t have escape characters. The only character that needs to be escaped is the single quote itself, and that’s done as ''.

=> SELECT '\';
 ?column? 
----------
 \
(1 row)

=> SELECT 'a single quote -> '' <- that is escaped';
                ?column?                
----------------------------------------
 a single quote -> ' <- that is escaped
(1 row)

The string literal 'abc\' represents the pattern abc\, and this is an invalid pattern in PostgreSQL because the escape character (for patterns, not string literals) doesn’t escape anything.

stevenvachon commented 6 years ago

Is MySQL not spec-compliant, then?:

SELECT 'abc' LIKE 'ab\';';
+--------------------+
| 'abc' LIKE 'ab\';' |
+--------------------+
|                  0 |
+--------------------+
1 row in set (0.00 sec)
charmander commented 6 years ago

@stevenvachon Right – MySQL is not spec-compliant in that respect unless you enable NO_BACKSLASH_ESCAPES.

stevenvachon commented 6 years ago

Okay, let's go with spec compliance then, with a "Pattern must not end with escape character" error thrown.

You'll need to commit an updated browser.js file for CI to pass, please.

charmander commented 6 years ago

@stevenvachon All the spec compliance discussion so far has been about the string literals. String literals are irrelevant to this problem. I don’t know what the SQL standard has to say about escape characters at the end of LIKE patterns because I’d have to buy it to find out. The current behaviour is probably fine – not throwing on any pattern when only one rare type of pattern could be considered incorrect is helpful, and follows MySQL.

I’m unable to update browser.js at this time. Could you?

stevenvachon commented 6 years ago

If no update is made to the trailing escape character, then browser.js doesn't need to be built in this PR. If you're fine with it as is, then I'm fine with it as well. I thought that I was the only one who'd run into the specification purchase issue -- what a silly situation.