sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

REGEX use on a collated VARCHAR column in Snowflake causes error #201

Open scott-fought opened 2 years ago

scott-fought commented 2 years ago

Describe the bug If a Snowflake VARCHAR column is defined with a collation, applying REGEX functions to it causes an error.

To Reproduce Steps to reproduce the behavior:

  1. Create a VARCHAR column with a collation specification in a Snowflake table
  2. Run soda analyze ...
  3. Or write a scan file with a valid_regex entry for the column
  4. Run soda scan ... using that scan file

Context Snowflake does not support REGEX functions on collated columns. Collation can be removed from a column by wrapping the expression in COLLATE({expr}, '').
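The workaround above can be sketched as a small SQL-building helper. This is only an illustration, not the actual Soda SQL code: the function names `strip_collation` and `regex_match_condition` are hypothetical, and the use of Snowflake's `REGEXP_LIKE` as the regex function is an assumption about which function the generated query calls.

```python
def strip_collation(expr: str) -> str:
    """Wrap a SQL expression in COLLATE(expr, '') so it carries no collation."""
    return f"COLLATE({expr}, '')"


def regex_match_condition(column: str, pattern: str) -> str:
    """Build a regex condition on a column with its collation removed.

    Hypothetical helper: only single quotes in the pattern are escaped here;
    backslash handling is deliberately left out of this sketch.
    """
    escaped = pattern.replace("'", "''")
    return f"REGEXP_LIKE({strip_collation(column)}, '{escaped}')"


if __name__ == "__main__":
    print(regex_match_condition("name", "^[A-Z]+$"))
```

The resulting condition applies the regex to `COLLATE(name, '')` rather than to the bare column, which is the expression form described above.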

OS: Mac OS Big Sur version 11.6
Python Version: Python 3.9.10
Soda SQL Version: 2.1.2
Warehouse Type: Snowflake

scott-fought commented 2 years ago

The fix is in the branch 665-snowflake-collation. However, it removes collation from all VARCHAR columns, whether they are collated or not. While this is benign, it is unnecessary. Would it be more appropriate to use a conditional, or some other way to enable this behavior selectively?
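One possible shape for such a conditional, as a rough sketch only: the function `regex_subject` and its `collation` argument are hypothetical, and how the column's collation would be discovered (for example from column metadata) is assumed rather than taken from the branch.

```python
from typing import Optional


def regex_subject(column: str, collation: Optional[str]) -> str:
    """Return the expression to pass to a REGEX function.

    Hypothetical helper: `collation` is the column's collation specification,
    or None / empty string when the column is not collated.
    """
    if collation:
        # Collated column: strip the collation so REGEX functions accept it.
        return f"COLLATE({column}, '')"
    # Non-collated column: leave the expression untouched.
    return column


if __name__ == "__main__":
    print(regex_subject("name", "en-ci"))  # collated column
    print(regex_subject("name", None))     # non-collated column
```

With this approach, only columns known to be collated get the extra wrapper, at the cost of having to look up each column's collation first.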