slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Encountering issues while transferring data into StarRocks from a CSV file. The data transfer type is stream_load. #328

Open pawan-chauhan-9560 opened 1 week ago

pawan-chauhan-9560 commented 1 week ago

Issue Description

Note: `privateIp` below stands for the FE URL.

Description of the Issue:

We created a Parquet/CSV file from SQL Server. While inserting the data into StarRocks, we encounter an error.

SQL table structure:

| id | productid | productName |
|----|-----------|-------------|
| 1 | 42 | Cookie |
| 2 | 43 | Ice cream, Frozen Desert |

CSV file structure:

    1,42,Cookie
    2,43,"Ice cream, Frozen Desert"

Sling version 1.2.11

Operating System linux

Replication Configuration:

Command I am using to insert data into StarRocks:

```bash
#!/bin/bash

# Define the log file path
LOGFILE="/opt/sling/slingDataTranfserlog_19_06_24.log"

# Iterate over each .csv file in the directory
for file in /opt/sling/csvnew/part.01.0460.csv; do
  echo "Uploading $file..." >> "$LOGFILE" 2>&1

  # Use curl to upload the file and capture the response
  response=$(curl --location-trusted -u 'user:password' \
    -H "Expect: 100-continue" \
    -H "column_separator: ," \
    -H "columns: id,product,productName" \
    -H "skip_header: 1" \
    -T "$file" \
    -X PUT \
    http://privateIP:8030/api/DatabaseName/TableName/_stream_load 2>&1)

  # Log the response
  echo "$response" >> "$LOGFILE" 2>&1
done
```
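The script above sends `column_separator` but no header that tells Stream Load how quoted fields are wrapped. As far as the StarRocks Stream Load documentation describes it, versions 3.0 and later accept an `enclose` header for this; whether the cluster in this report supports it is an assumption. The sketch below only prints the curl arguments (a dry run) so the header set can be reviewed before anything is sent. The function name `stream_load_args` is hypothetical, and the credentials, host, database, and table are the same placeholders used in the script.

```shell
# Hypothetical helper: prints one curl argument per line instead of running curl.
# Assumption: the target StarRocks version supports the `enclose` header (3.0+).
stream_load_args() {
  # $1 = path of the CSV file to upload
  printf '%s\n' \
    --location-trusted \
    -u 'user:password' \
    -H 'Expect: 100-continue' \
    -H 'column_separator: ,' \
    -H 'enclose: "' \
    -H 'columns: id,product,productName' \
    -H 'skip_header: 1' \
    -T "$1" \
    -X PUT \
    'http://privateIP:8030/api/DatabaseName/TableName/_stream_load'
}

# Dry run: review the arguments that would be passed to curl.
stream_load_args /opt/sling/csvnew/part.01.0460.csv
```

In bash, the printed lines can be read into an array with `mapfile -t args < <(stream_load_args "$file")` and then passed on as `curl "${args[@]}"`.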

streams: Stream

source: CSV file (created from SQL Server)
target: StarRocks
streams:
  ...
{
    "TxnId": 1386825,
    "Label": "37776833-c498-41e9-aa4e-2c81dec9eb33",
    "Status": "Fail",
    "Message": "too many filtered rows",
    "NumberTotalRows": 100000,
    "NumberLoadedRows": 99754,
    "NumberFilteredRows": 246,
    "NumberUnselectedRows": 0,
    "LoadBytes": 5018672,
    "LoadTimeMs": 243,
    "BeginTxnTimeMs": 1,
    "StreamLoadPlanTimeMs": 2,
    "ReadDataTimeMs": 1,
    "WriteDataTimeMs": 239,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://privateIp:8040/api/_load_error_log?file=error_log_c244f8c539780e6f_8a3e7085f85dc593"
}
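The counts in this response are consistent (99754 loaded + 246 filtered = 100000 total), so the job failed on a small share of rows. To my understanding, Stream Load compares the filtered share against the `max_filter_ratio` header, which defaults to 0, so a single filtered row is enough to fail the load. A minimal sketch of the arithmetic, with a hypothetical helper name:

```shell
# Hypothetical helper: share of rows that Stream Load filtered out.
filtered_ratio() {
  # $1 = NumberFilteredRows, $2 = NumberTotalRows
  awk -v f="$1" -v t="$2" 'BEGIN { printf "%.5f\n", f / t }'
}

# Values taken from the response above.
filtered_ratio 246 100000
```

Raising `max_filter_ratio` above this value would let the job succeed, but only by dropping the 246 rows, so it does not address the parsing problem itself.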

Error: Value count does not match column count: expected = 3, actual = 4. Column separator: ',', Row delimiter: '\n'. Row: 2,43,"Ice cream, Frozen Desert"
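The "expected = 3, actual = 4" count can be reproduced locally: splitting the reported row on every comma yields four values, while treating the double-quoted part as one field yields three. The sketch below is only an illustration with hypothetical helper names; it handles the simple quoting shown in this issue and is not a full CSV parser.

```shell
# Counts fields by splitting on every comma, ignoring quotes.
naive_field_count() {
  printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Collapses each double-quoted section to a placeholder first,
# so commas inside quotes no longer act as separators.
quoted_field_count() {
  printf '%s\n' "$1" | sed 's/"[^"]*"/Q/g' | awk -F',' '{ print NF }'
}

row='2,43,"Ice cream, Frozen Desert"'
naive_field_count "$row"    # comma-only split
quoted_field_count "$row"   # quote-aware split
```

Running `awk -F',' 'NF != 3' file.csv` on the data file lists every line that a comma-only split would see as having the wrong column count.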

flarco commented 1 week ago

Hi, without a file, I cannot test. Can you produce a sample file that errors for you and share it, so I can reproduce the error? You can email it to support@slingdata.io if you prefer.

pawan-chauhan-9560 commented 1 week ago

We have shared a sample dataset with support@slingdata.io. Please check it.