pat / riddle

Ruby Client API for Sphinx
MIT License
135 stars 67 forks source link

Riddle::Query.escape misses some characters caught by Sphinx's EscapeString #86

Closed benweint closed 10 years ago

benweint commented 10 years ago

It looks like this has been the case for a long time, so maybe it's intentional, but it appears that Riddle's escaping method misses some characters that have special meaning within the context of a Sphinx query.

The PHP API client that ships with Sphinx implements EscapeString as follows:

function EscapeString ( $string )
{
  $from = array ( '\\', '(',')','|','-','!','@','~','"','&', '/', '^', '$', '=', '<' );
  $to   = array ( '\\\\', '\(','\)','\|','\-','\!','\@','\~','\"', '\&', '\/', '\^', '\$', '\=', '\<' );
  return str_replace ( $from, $to, $string );
}

Riddle::Query.escape looks like this:

def self.escape(string)
  string.gsub(/[\(\)\|\-!@~\/"\/\^\$\\]/) { |match| "\\#{match}" }
end

Note that &, =, and < are all escaped by the official PHP version, but not by Riddle's version. The extended query syntax documentation for Sphinx indicates that = may be used as an exact form modifier, and < is used as part of the strict order operator. I'm not sure how & is used, since it doesn't seem to be referenced on that page.

Is the omission of these characters from the escaping logic intentional, or does it represent a bug?

pat commented 10 years ago

Hi Ben

The difference is certainly intentional - the PHP API client is for the binary protocol, just like Riddle.escape. Riddle::Query.escape is meant for the SphinxQL protocol. They were separated in 17806941e74a8f73e901731f6eaa30f47f72320d (and the latter has morphed through a few iterations over the last several months) though I can't remember the exact reasoning behind why they're different.

benweint commented 10 years ago

Thanks for the explanation, Pat!

While that does explain the single vs. double-escaping difference (which I hadn't even noticed in my initial read), that's actually not the difference I was asking about.

The specific difference I'm interested in is why neither Riddle.escape nor Riddle::Query.escape seem to escape the <, or = characters, since they both seem to have special meaning within SphinxQL according to the documentation, and both are escaped by the official API client's EscapeString method.

pat commented 10 years ago

I guess they weren't in the PHP implementation when I borrowed its list of escaped characters. A pull request is definitely welcome to ensure they're covered in both escape methods - otherwise I'll get to it soon :)

pat commented 10 years ago

Added & and = to the escaped characters list (< was already in it). See 55295348c419ff844bd3219e75b7fda556dbc84d.

benweint commented 10 years ago

Awesome, thanks!