ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.22k stars 592 forks source link

feat: remove sqlglot `REGEXP_REPLACE` warnings #7386

Closed NickCrews closed 1 year ago

NickCrews commented 1 year ago

Is your feature request related to a problem?

from #7346

repro:

import ibis
e = ibis.literal("aaa").re_replace("a", "b")
ibis.show_sql(e)

gives:

REGEXP_REPLACE does not support the following arg(s): ['position']
SELECT
  REGEXP_REPLACE('aaa', 'a', 'b') AS "RegexReplace('aaa', 'a', 'b')"

Describe the solution you'd like

no warnings are generated

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb

Code of Conduct

cpcloud commented 1 year ago

What ibis APIs are you using that generate these warnings?

NickCrews commented 1 year ago

sorry I was too slow I just added a repro above

cpcloud commented 1 year ago

Weird, this isn't even a proper warning, it looks like it could be a stray print 👀 👀

NickCrews commented 1 year ago

yeah I grepped for that in sqlglot and I had troulbe finding the actual line that prints/causes it

cpcloud commented 1 year ago

I think we're actually generating an incorrect extra argument, though lemme do some git spelunking because I swear we had a bug report about this.

cpcloud commented 1 year ago

Oh, actually, the g is necessary, and it's not documented in the pattern matching section of the duckdb docs (just the text functions) 😞

cpcloud commented 1 year ago

Gah, it looks like the argument is swallowed by sqlglot

NickCrews commented 1 year ago

🕵️ 🐰 🕳️

cpcloud commented 1 year ago

Fixed by upgrading your version of sqlglot to at least 18.16.1 (fixed in https://github.com/tobymao/sqlglot/commit/87efe41839a4f8e14e9b5dc0810156d06ae053a7 specifically):

In [1]: import ibis

In [2]: e = ibis.literal("aaa").re_replace("a", "b")

In [3]: ibis.show_sql(e)
SELECT
  REGEXP_REPLACE('aaa', 'a', 'b', 'g') AS "RegexReplace('aaa', 'a', 'b')"

In [4]: import sqlglot as sg

In [5]: sg.__version__
Out[5]: '18.16.1'
NickCrews commented 1 year ago

@cpcloud thanks for investigating this!

I read that sqlglot issue and am a little confused though. That made it sound like duckdb was never seeing the 'g' flag, which would make me think that previously we shouldn't have been getting a global replace. But it looks like in our tests we were getting global replaces? Am I misunderstanding something here? Are our tests incomplete?

cpcloud commented 1 year ago

We're only generating code with sqlglot in the to_sql/show_sql functions, the code that gets sent to the backend is a sqlalchemy object, and we define our re_replace operation with g in the duckdb registry.