Closed epidemian closed 11 years ago
What's the specific purpose of Riddle::Query.escape? In what ways does it differ from Riddle.escape?
Riddle::Query.escape
is for SphinxQL/mysql41 queries, whereas Riddle.escape
is for Sphinx's older binary protocol.
Why does Riddle::Query.escape adds double backslashes for special SphinsQL characters? Shouldn't string.gsub(/[()|-!@~"&\/^\$=]/) { |m| "#{m}" } be enough (based on PHP's API)?
The PHP API is for the binary protocol.
Why does Riddle::Query.escape remove all backslashes from its input?
To avoid partially/badly escaped input. I'm open to discussion on this, perhaps it's not the right approach...
Why is there a separate Riddle::Client#excerpts function with its own separate specs? What's the difference between that and Riddle::Query.snippets and why does ThinkingSphinx use the latter?
Again, the different protocols - Riddle::Client#excerpts
is for generating excerpts over the binary protocol, Riddle::Query.snippets
generates a SphinxQL query to essentially do the same thing over the SphinxQL/mysql41 protocol.
With all that said, your patch is fine and I'll merge it in. Perhaps everything except MATCH strings should be escaped by default... I'm certainly open to that idea, though would be keen to see where/how it impacts Riddle/TS code.
Nice, thanks for the pointers (and the merge!), @pat; there was a whole forest i was not aware of (Sphinx' binary protocol).
Perhaps everything except MATCH strings should be escaped by default...
Maybe the MATCH strings could be SQL escaped at the Riddle level (so syntax errors and possible SQL injections are prevented) and let the user decide whether to SphinxQL-escape the query or not.
Something that is not very clear to me from reading the docs: is the expected usage of ThinkingSphinx to always use Riddle.escape
to sanitize user input, or should that be used only if one doesn't want to support extended matching syntax? I'd guess it's the later, but i'm not sure.
SomeModel.search params[:query] # Search with extended matching syntax (i.e. "", @, etc).
SomeModel.search Riddle.escape(params[:query]) # Prevent extended matching syntax magic.
In the case of our application, having things like exact phrase match with ""
is quite convenient, so we don't use Riddle.escape
on the user's input, but i'm not sure if we're letting ourselves open for injection attacks or things of that sort :stuck_out_tongue_closed_eyes:
I guess injection is now an option with the SphinxQL protocol - it certainly isn't with the old binary protocol (which I should mention is what TS v1/v2 use), because you can't update records in that manner. Nor can you with SQL-backed indexes, only real-time indices, but I do think you could be onto something with MySQL-escaping all query values.
And just to be clear: you've used Riddle.escape
in your comment, but that's for the binary protocol. You meant Riddle::Query.escape
, right? :)
The expected behaviour is developers should escape any queries where they feel users may enter in invalid query strings - but yes, then we come down to whether extended matching syntax is 'invalid' or not. To some extent that's contextual - hence the possible value in using Mysql2::Client.escape
always to be sure that injection isn't occurring.
And just to be clear: you've used Riddle.escape in your comment, but that's for the binary protocol. You meant Riddle::Query.escape, right? :)
Yes, probably (now i'm checking the implementations once again he he).
Yeah, Riddle::Query.escape
is what should be used for Sphinx > v3. But assuming that the underlying implementation of ThinkingSphinx.search
changes to do SQL-escaping on its input then the SphinxQL-escaping method provided to the user should do something more similar to Riddle.escape
, i.e. escaping things like "
or @
with a single backslash instead of two, right?
I'm not following... are you suggesting that when queries are always escaped it should use Riddle.escape
? I was thinking just Mysql2::Client.escape
.
Beyond that, I think Riddle::Query.escape
handles the rest - as SphinxQL expects @
and other such characters to be escaped with two backslashes, not one.
Hmm... i wasn't very clear it seems :sweat_smile: (consider though that i'm not very familiarized with these things... so this is more of an exploratory question than a suggestion)
are you suggesting that when queries are always escaped it should use Riddle.escape? I was thinking just Mysql2::Client.escape.
What i meant was to always use Mysql2::Client.escape
when building SQL strings from external input (in the Riddle code) and providing a function that does one-backslash escaping of SphinxQL characters (i.e. what Riddle.escape
does) to let the user decide if they want to SqphinxQL-escape their queries first (before going through the Mysql2::Client.escape
'ing in the Riddle code).
Beyond that, I think Riddle::Query.escape handles the rest - as SphinxQL expects @ and other such characters to be escaped with two backslashes, not one.
I think it's not so much that SphinxQL requires a doubly-backslashed @
. It's that when two backslashes are used in a MySQL string then MySQL will interpret that as a single backslash. So we need two backslashes in the SQL query to make SphinxQL see only one:
mysql> SELECT 'hello\@world';
+-------------+
| hello@world |
+-------------+
| hello@world |
+-------------+
1 row in set (0.00 sec)
mysql> SELECT 'hello\\@world';
+--------------+
| hello\@world |
+--------------+
| hello\@world |
+--------------+
1 row in set (0.00 sec)
The things is, if we always use Mysql2::Client.escape
when building SQL strings (e.g. somewhere under ThinkingSphinx.search
hood) then that will take care of converting single backslashes into two. '@' -> Riddle.escape -> '\@' -> Mysql2::Client.escape -> '\\@'
.
I think the confusing thing with this escaping stuff is the nesting level of escapes. In Ruby code, four backslashes in a string mean two "real" backslashes in the string itself; and when that string is interpreted as a MySQL string, MySQL will interpret it as a single backslash, which is what Sphinx will end up with (i don't know if i said it correctly xD)
Okay, I'm understanding this now - I think in my head I had Riddle::Query.escape
and Mysql2::Client.escape
around the wrong way when passing through input.
Try this on for size: d0da3fa9a0c2d1eb73e18653f329e340b6c2c053.
Wow, that commit changes things around! I think it looks good (i'll try to give it a spin at work after the weekend and see if something breaks).
Thanks!
This is more or less what i had in mind for issue #63 (and #70).
I think trying to make
Riddle::Query.escape
do both MySQL escaping and SphinxQL escaping is a bit confusing, so i used another function for escaping the snippets text, which doesn't need to be SphinxQL escaped.Now, i'm quite confused about Riddle's code at many points. Some things i'd like to understand better:
Riddle::Query.escape
? In what ways does it differ fromRiddle.escape
?Riddle::Query.escape
adds double backslashes for special SphinsQL characters? Shouldn'tstring.gsub(/[\\\(\)\|\-!@~"&\/\^\$=]/) { |m| "\\#{m}" }
be enough (based on PHP's API)?Riddle::Query.escape
remove all backslashes from its input?Riddle::Client#excerpts
function with its own separate specs? What's the difference between that andRiddle::Query.snippets
and why does ThinkingSphinx use the later?So, i hope it's clear that i don't have a good grasp of Riddle's inner workings. That being said, shouldn't Riddle escape all strings that come "from the outside" before using them as string is SQL queries? (both to avoid SQL syntax erros and potential SQL injections) And also apply SphinxQL escaping only to the strings that will be used in
MATCH
statements before the SQL escaping, or maybe leave that to the user so they can decide if they want extended matching or not (note that if youMysql2::Client.escape(Riddle.escape('@'))
then you get that double backslash that allows Sphinx to read the@
as a literal).