sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0
59 stars 16 forks source link

soda scan with "valid format rule" fails on mssql-databases having objects containing white-space characters #181

Open geertvanzoest opened 2 years ago

geertvanzoest commented 2 years ago

Describe the bug When targeting MS SQL Server and executing soda scan with a valid format rule on database column(s) containing white-space characters (like column name instead of column_name), Soda SQL throws an error about incorrect syntax: pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]An expression of non-boolean type specified in a context where a condition is expected, near 'No_'. (4145) (SQLExecDirectW)"). This is because the valid format rule script does not use identifiers when calling objects (comparable to issue sodadata/soda-sql#171).

Log

  | Executing SQL query: 
SELECT
  COUNT(*),
  COUNT(CASE WHEN NOT ([Phone No_] IS NULL) THEN 1 END),
  COUNT(CASE WHEN Phone No_ like '[+!0-9]%' and Phone No_ not like '%[a-z!A-z]' THEN 1 END),
  COUNT(CASE WHEN NOT ([Fax No_] IS NULL) THEN 1 END),
  COUNT(CASE WHEN Fax No_ like '[+!0-9]%' and Fax No_ not like '%[a-z!A-z]' THEN 1 END),
  COUNT(CASE WHEN NOT ([E-Mail] IS NULL) THEN 1 END),
  COUNT(CASE WHEN E-Mail LIKE '%[a-zA-Z0-9]_@[a-zA-Z0-9]%.[a-zA-Z0-9]%' THEN 1 END),
FROM [staging].[nav_r_Ship_To_Address_u_dbo]
  | Exception during aggregation query
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\scan.py", line 319, in _query_aggregations
    query_result_tuple = self.warehouse.sql_fetchone(sql)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\warehouse.py", line 32, in sql_fetchone
    return sql_fetchone(self.connection, sql)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\db.py", line 22, in sql_fetchone
    return sql_fetchone_description(connection, sql)[0]
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\db.py", line 35, in sql_fetchone_description
    cursor.execute(sql)
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]An expression of non-boolean type specified in a context where a condition is expected, near 'No_'. (4145) (SQLExecDirectW)")

To Reproduce

  1. Have a Microsoft SQL Server instance with a table with a column containing a white-space character in it's name.
  2. Configure example_scan.yml with a valid format rule (like phone_number or email) on a column with white-space character(s)
  3. Run soda scan example_warehouse.yml example_scan.yml

OS: Windows Server 2022, Windows 10 Enterprise 21H2 Python Version: 3.10.x Soda SQL Version: 2.1.1 Warehouse Type: Microsoft SQL Server 2019 (15.0.2080.9)