tidyverse / dbplyr

Database (DBI) backend for dplyr
https://dbplyr.tidyverse.org
Other
469 stars 169 forks source link

overflow error when counting very large tables in sql server #1498

Open edward-burn opened 1 month ago

edward-burn commented 1 month ago

In sql server with a very large table with 6 billion rows the below error occurs because COUNT is being used rather than COUNT_BIG. I was able to fix this by changing the translation to COUNT_BIG https://github.com/tidyverse/dbplyr/compare/main...oxford-pharmacoepi:dbplyr:sql_server_count_big, but I'm not sure @hadley @mgirlich what you would think about switching to using this in dbplyr for sql server? The nice thing with using COUNT_BIG is that it should always work, but it does return a bigint data type rather than int as now https://learn.microsoft.com/en-us/sql/t-sql/functions/count-big-transact-sql?view=sql-server-ver16

Error in dplyr::collect():
! Failed to collect lazy table.
Caused by error:
! nanodbc/nanodbc.cpp:1771: 22003
[Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Arithmetic overflow error converting expression to data type int.
'SELECT
COUNT() AS "number_records-count",
COUNT(DISTINCT "person_id") AS "number_subjects-count"
FROM "CDM"."visit_occurrence"'
Run rlang::last_trace() to see where the error occurred.
rlang::last_trace()
<error/rlang_error>
Error in dplyr::collect():
! Failed to collect lazy table.
Caused by error:
! nanodbc/nanodbc.cpp:1771: 22003
[Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Arithmetic overflow error converting expression to data type int.
'SELECT
COUNT() AS "number_records-count",
COUNT(DISTINCT "person_id") AS "number_subjects-count"
FROM "CDM"."visit_occurrence"'