sdf-labs / sql-functions


regexp_replace() is broken #50

Closed: wizardxz closed this issue 3 months ago

wizardxz commented 5 months ago

Describe the bug

regexp_replace() replaces only the first match of the pattern, whereas Trino replaces every match.

To Reproduce

The query at https://github.com/sdf-labs/sdf/blob/main/crates/sdf-cli/workspaces/presto_functions/src/main.sql#L475 replaces only the first match:

SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 ')
----
3ca 2b 14m

Expected behavior

Trino replaces both matches ("1a " and "2b "):

trino> SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 ');
    _col0    
-------------
 3ca 3cb 14m 
(1 row)
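To make the difference concrete, here is a minimal Python sketch using the standard re module. It is not sdf's or Trino's code. The helper names (regexp_replace, _expand, _GROUP_REF) are hypothetical, and the "$n" handling is simplified: escaped dollars and named groups are not supported. Replacing every match reproduces Trino's expected result, while stopping after the first match reproduces the output reported above.

```python
import re

# Matches a "$n" group reference in a replacement string.
# Simplified: escaped dollars and named groups are not handled.
_GROUP_REF = re.compile(r"\$(\d+)")


def _expand(replacement: str, match: re.Match) -> str:
    """Substitute each $n in `replacement` with capture group n of `match`.

    A group that did not participate in the match expands to an empty string.
    """
    return _GROUP_REF.sub(lambda ref: match.group(int(ref.group(1))) or "", replacement)


def regexp_replace(s: str, pattern: str, replacement: str, max_count: int = 0) -> str:
    """Hypothetical helper: replace matches of `pattern` in `s`.

    max_count=0 replaces every match (Trino's behavior);
    max_count=1 replaces only the first match (the reported output).
    """
    return re.sub(pattern, lambda m: _expand(replacement, m), s, count=max_count)


args = ("1a 2b 14m", r"(\d+)([ab]) ", "3c$2 ")
print(regexp_replace(*args))               # 3ca 3cb 14m
print(regexp_replace(*args, max_count=1))  # 3ca 2b 14m
```

The function form of the replacement is used so that any backslashes in the substituted group text are inserted literally, not reinterpreted as Python escape sequences.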

Additional context

DataFusion's regexp_replace() also behaves differently from Trino: it too replaces only the first match.

> SELECT regexp_replace('1a 2b 14m', '(\\d+)([ab]) ', '3c$2 ');
+----------------------------------------------------------------------+
| regexp_replace(Utf8("1a 2b 14m"),Utf8("(\d+)([ab]) "),Utf8("3c$2 ")) |
+----------------------------------------------------------------------+
| 3ca 2b 14m                                                           |
+----------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.027 seconds.
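As far as I know, this DataFusion behavior is intentional. Its regexp_replace follows PostgreSQL, replacing only the first match unless the 'g' (global) flag is passed as an optional fourth argument, for example regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 ', 'g'). This should be confirmed against the DataFusion documentation. If sdf's implementation relies on DataFusion (an assumption, not verified here), a Trino-compatible regexp_replace would need to always apply global replacement. The Python sketch below models these assumed PostgreSQL-style semantics. The function name is hypothetical, and only the 'g' flag is modeled.

```python
import re


def pg_style_regexp_replace(s: str, pattern: str, replacement: str, flags: str = "") -> str:
    """Hypothetical model of PostgreSQL-style regexp_replace (assumed for DataFusion).

    Only the first match is replaced unless `flags` contains 'g'.
    No other flags are modeled.
    """
    # Translate $n references into Python's \g<n> template syntax.
    # Simplified: backslashes already present in `replacement` are not escaped.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    # count=0 tells re.sub to replace every match; count=1 stops after the first.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, py_replacement, s, count=count)


args = ("1a 2b 14m", r"(\d+)([ab]) ", "3c$2 ")
print(pg_style_regexp_replace(*args))       # 3ca 2b 14m  (same as the DataFusion output above)
print(pg_style_regexp_replace(*args, "g"))  # 3ca 3cb 14m (same as Trino)
```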