slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

[bug][clickhouse] column name conflict after casting #417

Open yokofly opened 3 hours ago

yokofly commented 3 hours ago

Issue Description

Query id: 2fcd3457-fecc-4243-9dfd-93e78b4af3e8

   ┌─statement──────────────────────────────────────────────────────┐
1. │ CREATE TABLE default.vv ( id Int32, Id Int32 ) ENGINE = Memory │
   └────────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.002 sec.

I want to use Sling to sync this table into another table, but I hit a column naming conflict: the source has two columns, `id` and `Id`, that differ only in case.

./sling run --src-conn CLICKHOUSE --src-stream 'vv' --tgt-conn CLICKHOUSE --tgt-object "vv2" --mode full-refresh -d


- Sling version (`sling --version`): 
wget https://github.com/slingdata-io/sling-cli/releases/download/v1.2.21/sling_linux_amd64.tar.gz

- Operating System (`linux`, `mac`, `windows`): 
linux

(base) ➜ ck export CLICKHOUSE='clickhouse://default@localhost:9000/default'
(base) ➜ ck ./sling run --src-conn CLICKHOUSE --src-stream 'vv' --tgt-conn CLICKHOUSE --tgt-object "vv2" --mode full-refresh -d
2024-10-23 07:18:21 DBG Sling version: 1.2.21 (linux amd64)
2024-10-23 07:18:21 DBG type is db-db
2024-10-23 07:18:21 DBG using: {"columns":null,"mode":"full-refresh","transforms":null}
2024-10-23 07:18:21 DBG using source options: {"empty_as_null":false,"null_if":"NULL","datetime_format":"AUTO","max_decimals":11}
2024-10-23 07:18:21 DBG using target options: {"batch_limit":100000,"datetime_format":"auto","file_max_rows":0,"max_decimals":11,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-10-23 07:18:21 DBG opened "clickhouse" connection (conn-clickhouse-mlD)
2024-10-23 07:18:21 DBG opened "clickhouse" connection (conn-clickhouse-dFr)
2024-10-23 07:18:21 INF connecting to source database (clickhouse)
2024-10-23 07:18:21 INF connecting to target database (clickhouse)
2024-10-23 07:18:21 INF reading from source database
2024-10-23 07:18:21 DBG select * from default.vv
2024-10-23 07:18:21 INF writing to target database [mode: full-refresh]
2024-10-23 07:18:21 DBG drop table if exists default.vv2_tmp
2024-10-23 07:18:21 DBG table default.vv2_tmp dropped
2024-10-23 07:18:21 DBG create table default.vv2_tmp (id Nullable(Int64), Id Nullable(Int64)) engine=MergeTree ORDER BY tuple()
2024-10-23 07:18:21 INF created table default.vv2_tmp
2024-10-23 07:18:21 INF streaming data
2024-10-23 07:18:21 DBG use default
2024-10-23 07:18:21 DBG drop table if exists default.vv2_tmp
2024-10-23 07:18:21 DBG table default.vv2_tmp dropped
2024-10-23 07:18:21 DBG closed "clickhouse" connection (conn-clickhouse-dFr)
2024-10-23 07:18:21 DBG closed "clickhouse" connection (conn-clickhouse-mlD)
2024-10-23 07:18:21 INF execution failed
fatal:
--- proc.go:271 main ---
--- sling_cli.go:458 main ---
--- sling_cli.go:494 cliInit ---
--- cli.go:286 CliProcess ---
--- sling_run.go:225 processRun ---
~ failure running task (see docs @ https://docs.slingdata.io/sling-cli)
--- sling_run.go:396 runTask ---
--- task_run.go:155 Execute ---

--- task_run.go:116 func2 ---
--- task_run.go:559 runDbToDb ---
--- task_run_write.go:234 WriteToDb ---
--- database.go:2313 BulkImportFlow ---
~ could not bulk import
--- database.go:2300 func1 ---
~ could not copy data
--- database_clickhouse.go:261 BulkImportStream ---
~ could not prepare statement
--- database_clickhouse.go:171 func2 ---
~ could not prepare statement
--- database.go:1062 Prepare ---
~ could not prepare Tx: insert into default.vv2_tmp (Id, Id) values ($1, $2)
--- transaction.go:95 Prepare ---
code: 15, message: Column Id in table default.vv2_tmp (44a2af7b-757a-4ec1-a543-ff3bd745ce0a) specified more than once

--- task_run.go:116 func2 ---
~ Could not WriteToDb
--- task_run.go:559 runDbToDb ---
~ could not insert into default.vv2_tmp
--- task_run_write.go:240 WriteToDb ---
(base) ➜ ck ./clickhouse client -q "show create vv"
CREATE TABLE default.vv\n(\n id Int32,\n Id Int32\n)\nENGINE = Memory
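
Note that the temp table is created with both `id` and `Id`, yet the failing insert lists `Id` twice. One way this could happen (purely an assumption for illustration, not taken from Sling's source) is that column names are resolved through a case-insensitive lookup, so both names map to whichever entry was stored last. A minimal Python sketch of that failure mode, with hypothetical function names:

```python
def resolve_case_insensitive(columns):
    """Resolve each name through a lowercase-keyed lookup (hypothetical failure mode)."""
    lookup = {}
    for name in columns:
        # Names differing only in case share one key, so later ones overwrite earlier ones.
        lookup[name.lower()] = name
    return [lookup[name.lower()] for name in columns]


def build_insert(table, columns):
    """Build an insert statement shaped like the one in the error message."""
    placeholders = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    return f"insert into {table} ({', '.join(columns)}) values ({placeholders})"


source_columns = ["id", "Id"]
resolved = resolve_case_insensitive(source_columns)
statement = build_insert("default.vv2_tmp", resolved)
print(statement)
```

With the two source columns from `default.vv`, the sketch yields the same duplicated column list that ClickHouse rejects with code 15.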

yokofly commented 2 hours ago

@flarco any comments? I don't know the context for why column names need to be cast to lower or upper case for some databases.

flarco commented 1 hour ago

@yokofly I need to look into it. Column names have caused issues in the past, especially when sourcing from files. But slinging from db to db should keep the original names, agreed.
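
As a rough illustration of the point above, a tool could check up front whether any source column names differ only in case, and keep the original names whenever they do. The following Python sketch assumes nothing about Sling's internals; `find_case_collisions` is a hypothetical helper:

```python
from collections import defaultdict


def find_case_collisions(columns):
    """Group column names that are equal when compared case-insensitively."""
    groups = defaultdict(list)
    for name in columns:
        groups[name.casefold()].append(name)
    # Keep only the groups holding more than one distinct spelling.
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = find_case_collisions(["id", "Id", "name"])
if collisions:
    # Changing the casing here would merge distinct columns, so the names must stay as-is.
    print(f"case-only collisions found: {collisions}")
```

For the table in this issue the check reports `id` and `Id` as colliding, which is the case where any casing normalization would break the insert.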