sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0
59 stars 16 forks source link

pyodbc.DataError: Arithmetic overflow error converting expression to data type int. #173

Open geertvanzoest opened 2 years ago

geertvanzoest commented 2 years ago

Describe the bug Not sure if this is to be considered a bug in Soda SQL or simply the way SQL Server handles arithmetic operations, resulting in an error passed thru by soda scan. But whenever the result of a metric queried by Soda SQL is bigger than the data type of the column it queries, an arithmetic overflow is thrown by SQL Server. In the log snippet below, you'll find expressions AVG([contract_s]) and SUM([contract_s]) in the SELECT-clause, related to the avg- and sum-metrics as defined in the (default) Scan YAML-file. In this example, the data type of column [contract_s] is of data type integer (ranging from -2,147,483,648 to 2,147,483,647) while the actual sum of all values in [contract_s] is -184,756,112,432,485. When handling this result, SQL Server tries to convert that number into an integer data type, which doesn't fit thus resulting in an overflow error.

Log

soda scan warehouse.yml tables/h_contract.yml
  | 2.1.1
  | Scanning tables/h_contract.yml ...
  | No Soda Cloud account configured
  | Executing SQL query:
SELECT column_name, data_type, is_nullable
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'h_contract'
  | SQL took 0:00:00
  |   contract_h (binary) -> unsupported, skipped!
  |   contract_bk (nvarchar) not null
  |   contract_lt (datetime2) not null
  |   contract_s (int) not null
  | 4 columns:
  | Query measurement: schema = [{'name': 'contract_h', 'type': 'binary', 'dataType': 'binary', 'nullable': False, 'logicalType': None, 'semanticType': None}, {'name': 'contract_bk', 'type': 'nvarchar', 'dataType': 'nvarchar', 'nullable': False, 'logicalType': 'text', 'semanticType': 'text'}, {'name': 'contract_lt', 'type': 'datetime2', 'dataType': 'datetime2', 'nullable': False, 'logicalType': 'time', 'semanticType': 'time'}, {'name': 'contract_s', 'type': 'int', 'dataType': 'int', 'nullable': False, 'logicalType': 'number', 'semanticType': 'number'}]
  | Executing SQL query:
SELECT
  COUNT(*),
  COUNT(CASE WHEN NOT ([contract_bk] IS NULL) THEN 1 END),
  COUNT(CASE WHEN NOT ([contract_bk] IS NULL) THEN 1 END),
  AVG(CASE WHEN NOT ([contract_bk] IS NULL) THEN LEN([contract_bk]) END),
  MIN(CASE WHEN NOT ([contract_bk] IS NULL) THEN LEN([contract_bk]) END),
  MAX(CASE WHEN NOT ([contract_bk] IS NULL) THEN LEN([contract_bk]) END),
  COUNT(CASE WHEN NOT ([contract_lt] IS NULL) THEN 1 END),
  COUNT(CASE WHEN NOT ([contract_lt] IS NULL) THEN 1 END),
  COUNT(CASE WHEN NOT ([contract_s] IS NULL) THEN 1 END),
  COUNT(CASE WHEN NOT ([contract_s] IS NULL) THEN 1 END),
  MIN([contract_s]),
  MAX([contract_s]),
  AVG([contract_s]),
  SUM([contract_s]),
  VAR([contract_s]),
  STDEV([contract_s])
FROM [datavault].[h_contract]
  | Exception during aggregation query
Traceback (most recent call last):
  File "C:\Users\RickAstley\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\scan.py", line 319, in _query_aggregations
    query_result_tuple = self.warehouse.sql_fetchone(sql)
  File "C:\Users\RickAstley\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\warehouse.py", line 32, in sql_fetchone
    return sql_fetchone(self.connection, sql)
  File "C:\Users\RickAstley\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\db.py", line 22, in sql_fetchone
    return sql_fetchone_description(connection, sql)[0]
  File "C:\Users\RickAstley\AppData\Local\Programs\Python\Python310\lib\site-packages\sodasql\scan\db.py", line 35, in sql_fetchone_description
    cursor.execute(sql)
pyodbc.DataError: ('22003', '[22003] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Arithmetic overflow error converting expression to data type int. (8115) (SQLExecDirectW)')
  | Test error for "row_count > 0": name 'row_count' is not defined
  | Executed 1 queries in 0:00:00.008002
  | Scan summary ------
  | 1 measurements computed
  | 1 tests executed
  | 1 of 1 tests failed:
  |   Test test(row_count > 0) failed with measurements null
  | Errors occurred!
  |   [error] Exception during aggregation query
  |   [test_execution_error] Test "row_count > 0" failed
  | Exiting with code 1
  | Starting new HTTPS connection (1): collect.soda.io:443
  | https://collect.soda.io:443 "POST /v1/traces HTTP/1.1" 200 0

To Reproduce

  1. Have a Microsoft SQL Server instance containing a table with a column of data type integer. Include at least 2 records where the sum of values exceeds the integer range (-2,147,483,648 to 2,147,483,647).
  2. Configure warehouse.yml accordingly.
  3. Run soda analyze to create a default Scan YAML-file or create one that includes a sum- and/or avg-metric.
  4. Run soda scan warehouse.yml table/example_scan_file.yml.

OS: Windows Server 2022, Windows 10 Enterprise 21H2 Python Version: 3.10.x Soda SQL Version: 2.1.1 Warehouse Type: Microsoft SQL Server 2019 (15.0.2080.9)