ibis-project / ibis-bigquery

BigQuery backend for Ibis
Apache License 2.0
19 stars 18 forks source link

fix: escape regex literals in `regexp_contains` #112

Closed jayceslesar closed 2 years ago

jayceslesar commented 2 years ago

@cpcloud @tswast Thoughts?

Fixes issue where regexes passed into re_search among other re functions in the ibis library were being converted to strings prefixed with r'my_original_string' which seems less than useful if 'my_original_string' is a regex. The string I am trying to filter on looks like some_message_((&some_signal)) Before generated BigQuery would look like:

SELECT *
FROM `my_project.my_dataset.my_table`
WHERE REGEXP_CONTAINS(`field`, r'(some_message_\\(\\(&some_signal\\)\\))')

whereas now it looks like:

SELECT *
FROM `my_project.my_dataset.my_table`
WHERE REGEXP_CONTAINS(`field`, '(some_message_\\(\\(&some_signal\\)\\))')

Will close #110

jayceslesar commented 2 years ago

Sorry for double PR (closed the first one) trying to conform to the conventionalcommits...testing my beginner/medium git

jayceslesar commented 2 years ago

@tswast let me know how that test looks, little challenging to conduct a test without a real table with values (probably possible just unsure of syntax). Would be open to contributing docs to this entire package really as I am going to be writing them up for my org (more specific examples but can de proprietize and submit)

The fix submitted does work for me, as in when I pass strings into the re_search I get back what I am expecting, where as before I was getting empty tables as the string was being treated as a raw string.

What was weird though (before I submitted this fix), is even if I passed in the complete match of a string I was looking for in some BQ table to re_search, it still didn't work.

Edit: Maybe this is better to use a config value for? Let the user explicitly decide?

Yeah on a second glance seems like having some boolean in the config would be the best way to change this as to not break existing functionality and allow the user to explicitly decide whether they want the compiled BQ query strings to look like r"my_string" or "my_string"

jayceslesar commented 2 years ago

Was able to get around this by building uglier regexes so not needed!