sunlightlabs / scout

Government-wide search and notification website.
https://scout.sunlightfoundation.com
Other
50 stars 30 forks source link

Fix regular expression manipulation #464

Closed jpmckinney closed 10 years ago

jpmckinney commented 10 years ago

It's because Regexp.escape escapes spaces, and we were inserting [\s\-] after the \ before the space - escaping the opening [ as a result

konklone commented 10 years ago

Thanks for IDing the problem. But the fix is weird, it's replacing all backslashes with the space/hyphen pattern, instead of replacing spaces? Even if it works, it's difficult to understand what it dos by reading it.

Could this also be fixed by putting the replacement step before the escape step?

jpmckinney commented 10 years ago

It replaces the literal sequence "\ " (there's a space after the backslash) with [\s\-], no? You can't substitute in [\s\-] before escaping, because it will escape the [\s\-]. If you want, you can substitute spaces with "REPLACE_ME_MARKER", escape the string, and then substitute "REPLACE_ME_MARKER" for [\s\-]. But I think it works as-is with this patch:

keyword = '"5 U.S.C. 552"'
keyword = keyword.gsub '"', ''
keyword = Regexp.escape(keyword)
keyword = keyword.gsub '\\ ', '[\s\-]'

returns 5[\s\-]U\.S\.C\.[\s\-]552

Without this patch:

returns 5\[\s\-]U\.S\.C\.\[\s\-]552

konklone commented 10 years ago

Whoops, I read the commit too fast, thought it was \\ rather than \\. Merging!