slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0
398 stars 27 forks source link

SQL Server replication fails if table names are not quoted when case sensitivity is enabled #260

Closed MiConnell closed 5 months ago

MiConnell commented 5 months ago

Issue Description

Our SQL Server source database has case sensitivity enabled, meaning table names are case-sensitive. Running the below errors out (logs in the log output section below)

sling run -d --src-conn MSSQL --src-stream 'dbo.TABLE' --tgt-conn REDSHIFT --tgt-object 'schema.table' --mode full-refresh

I will note this does work if I quote the table '"dbo"."TABLE"'

confirmed with

SELECT count(*)
  FROM dbo.table
-- Msg 208, Level 16, State 1, Line 1 Invalid object name 'dbo.table'.

SELECT count(*)
  FROM dbo.TABLE
-- 5000
source: MSSQL
target: REDSHIFT

defaults:
  object: 'schema.{stream_table}'
  mode: full-refresh

streams:
  '"dbo"."TABLE"':  # works
  dbo.TABLE:  # does not work
2024-04-12 23:03:55 DBG Sling version: 1.2.3 (linux amd64)
2024-04-12 23:03:55 DBG type is db-db
2024-04-12 23:03:55 DBG using source options: {"empty_as_null":true,"null_if":"NULL","datetime_format":"AUTO","max_decimals":-1}
2024-04-12 23:03:55 DBG using target options: {"datetime_format":"auto","max_decimals":-1,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-04-12 23:03:55 INF connecting to source database (sqlserver)
2024-04-12 23:03:55 DBG opened "sqlserver" connection (conn-sqlserver)
2024-04-12 23:03:55 INF connecting to target database (redshift)
2024-04-12 23:03:55 DBG opened "redshift" connection (conn-redshift)
2024-04-12 23:03:55 INF reading from source database
2024-04-12 23:03:55 DBG closed "sqlserver" connection (conn-sqlserver)
2024-04-12 23:03:55 DBG closed "redshift" connection (conn-redshift)
2024-04-12 23:03:55 INF execution failed
fatal:
--- task_run.go:99 func1 ---
~ failure running task (see docs @ https://docs.slingdata.io/sling-cli)
--- task_run.go:504 runDbToDb ---
--- task_run_read.go:61 ReadFromDB ---
~ execution failed
--- database.go:1405 GetSQLColumns ---
~ Could not ReadFromDB
--- database.go:1539 GetColumns ---
~ Could not get source columns
--- database.go:1470 GetTableColumns ---
did not find any columns for "dbo"."table". Perhaps it does not exists, or user does not have read permission.

It looks like this stems from ParseTableName - is there a reason that the names are converted to lowercase? I understand only mixed-case or special-character names are not transformed, and Oracle and Snowflake are uppercased (which makes sense) but why not query the objects as they're configured?

flarco commented 5 months ago

@MiConnell yes, you've hit the nail on the head. It's a good question, it was done this way because of queries most often containing lowercase table name (unqualified), and the native parser auto-matches to the correct object. Sling has to do that in order to correctly pull column info from INFORMATION_SCHEMA.

But you're not the first to raise this issue for SQL Server. So I think it's probably best to make that change, that if the table name is mixed-case, special-character or upper, then leave as it.

Will change to below:

defCaseAsIs := hasUpper || hasSpecial

Thanks for raising.

flarco commented 5 months ago

Done: https://github.com/slingdata-io/sling-cli/pull/253/commits/b7082dab015a7e94ac22470fd2ff03ddf30731b7 Watch for 1.2.4. Closing.