Closed rishabhkaushal07 closed 2 months ago
sling doesn't automatically consider backslashed quotes \" that are inside quotes when loading a csv file. For instance, consider this example: column1_data|"column2 data with \"space\""|column3_data.
column1_data|"column2 data with \"space\""|column3_data
Version: 1.2.2:
Operating System: linux
linux
Replication Configuration:
export STARROCKS='{ type: starrocks, url: "starrocks://root@<host_ip>:9030/db", fe_url: "http://<host_ip>:8030"}' ./sling run \ --src-stream file:///SFU_Fact_Screening_2019_quotes_preprocessed4.csv \ --src-options '{"format": "csv", "options": {"delimiter": "|", "header": true}}' \ --tgt-conn STARROCKS \ --tgt-object db.SFU_Fact_Screening_2019 \ --mode full-refresh \ --debug
39m12s 50,243,563 21133 r/s 32 GB | 24% MEM | 35% CPU 2024-04-01 23:32:44 DBG stream-load completed for /tmp/starrocks/db/SFU_Fact_Screening_2016_tmp/2024-04-01T225331.105/part.01.0201.csv => { "TxnId": 2067, "Label": "06486c9f-eb7f-4b32-9abb-60a928fafdab", "Status": "Success", "Message": "OK", "NumberTotalRows": 250000, "NumberLoadedRows": 250000, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 161816798, "LoadTimeMs": 1286, "BeginTxnTimeMs": 0, "StreamLoadPlanTimeMs": 0, "ReadDataTimeMs": 820, "WriteDataTimeMs": 1264, "CommitAndPublishTimeMs": 20 } 39m13s 50,264,679 21132 r/s 32 GB | 24% MEM | 34%39m14s 50,285,881 21137 r/s 32 GB | 24% MEM | 30%39m15s 50,307,326 21157 r/s 32 GB | 23% MEM | 30%39m16s 50,328,840 21180 r/s 32 GB | 23% MEM | 29%39m17s 50,350,427 21206 r/s 32 GB | 22% MEM | 31%39m18s 50,371,967 21228 r/s 32 GB | 22% MEM | 29%39m19s 50,393,599 21254 r/s 32 GB | 22% MEM | 28%39m20s 50,415,289 21282 r/s 32 GB | 22% MEM | 28%39m21s 50,436,768 21294 r/s 33 GB | 23% MEM | 42%39m22s 50,457,489 21257 r/s 33 GB | 24% MEM | 63%39m23s 50,476,640 21121 r/s 33 GB | 24% MEM | 38% CPU 2024-04-01 23:32:54 DBG loading /tmp/starrocks/db/SFU_Fact_Screening_2016_tmp/2024-04-01T225331.105/part.01.0202.csv [165 MB] ds.1712037210845.LfE-0 2024-04-01 23:32:55 DBG loading /tmp/starrocks/db/SFU_Fact_Screening_2016_tmp/2024-04-01T225331.105/part.01.0203.csv [5.3 MB] ds.1712037210845.LfE-0 39m24s 50,497,742 21374 r/s 33 GB | 24% MEM | 34% CPU 2024-04-01 23:32:55 DBG drop table if exists `db`.`SFU_Fact_Screening_2016_tmp` 2024-04-01 23:32:55 DBG table `db`.`SFU_Fact_Screening_2016_tmp` dropped 2024-04-01 23:32:55 DBG closed "starrocks" connection (conn-starrocks-nw3) 2024-04-01 23:32:55 INF execution failed Perhaps setting the delimiter (source_options.delimiter) would help? See https://docs.slingdata.io/sling-cli/run/configuration#source 2024-04-01 23:32:55 DBG stream-load completed for /tmp/starrocks/db/SFU_Fact_Screening_2016_tmp/2024-04-01T225331.105/part.01.0203.csv => { "TxnId": -1, "Label": "9215d99c-c373-4552-8676-4e5877a60de1", "Status": "Fail", "Message": "unknown table \"db.SFU_Fact_Screening_2016_tmp\"", "NumberTotalRows": 0, "NumberLoadedRows": 0, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 0, "LoadTimeMs": 0, "BeginTxnTimeMs": 0, "StreamLoadPlanTimeMs": 0, "ReadDataTimeMs": 0, "WriteDataTimeMs": 0, "CommitAndPublishTimeMs": 0 } fatal: --- sling_cli.go:418 func1 --- --- sling_cli.go:474 cliInit --- --- cli.go:284 CliProcess --- ~ failure running task (see docs @ https://docs.slingdata.io/sling-cli) --- sling_logic.go:224 processRun --- --- sling_logic.go:371 runTask --- ~ execution failed --- task_run.go:138 Execute --- --- fs.go:756 func1 --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled --- datastream.go:787 func9 --- ~ error during iteration --- datastream.go:700 2 --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled context canceled --- database_starrocks.go:430 func2 --- ~ error writing dataflow to local storage: /tmp/starrocks/db/SFU_Fact_Screening_2016_tmp/2024-04-01T225331.105 --- fs.go:795 WriteDataflowReady --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled --- datastream.go:787 func9 --- ~ error during iteration --- datastream.go:700 2 --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled --- fs.go:756 func1 --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled --- datastream.go:787 func9 --- ~ error during iteration --- datastream.go:700 2 --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled --- task_run.go:97 func1 --- ~ could not write to database --- task_run.go:387 runFileToDB --- ~ could not insert into `db`.`SFU_Fact_Screening_2016_tmp`. --- task_run_write.go:307 WriteToDb --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields context canceled --- datastream.go:787 func9 --- ~ error during iteration --- datastream.go:700 2 --- --- datastream.go:705 func9 --- --- datastream.go:2040 next --- ~ Error reading file --- datastream.go:910 func1 --- record on line 50507895: wrong number of fields
Thanks. Will take a look.
Done, please use 1.2.4
1.2.4
FYI You have to specify source_options.escape: "\"
source_options.escape: "\"
Issue Description
sling doesn't automatically consider backslashed quotes \" that are inside quotes when loading a csv file. For instance, consider this example:
column1_data|"column2 data with \"space\""|column3_data
.Version: 1.2.2:
Operating System:
linux
Replication Configuration: