pingcap-inc / tidb2dw

Replicate data changes from TiDB to Data Warehouses in real-time
MIT License

Bigquery - Error while reading data, error message: Bad character (ASCII 0) encountered. #92

Closed: eugen-korentsov closed this issue 7 months ago

eugen-korentsov commented 9 months ago

[2024/01/19 13:57:13.866 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=database.table] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Error while reading data, error message: CSV processing encountered too many errors, giving up.

Error in the BigQuery logs: Error while reading data, error message: Bad character (ASCII 0) encountered.; line_number: 59829 byte_offset_to_start_of_line: 8544220 column_index: 1 column_name: "text" column_type: STRING value: "Parser exit code ..."

Worked around by stripping the NUL bytes (ASCII 0) from the CSV file and replacing the original object:

gsutil cp gs://test-tidb2dw/$table_name-prod/snapshot/database.$table_name.00000000$range.csv - | tr -d '\000' | gsutil cp - gs://test-tidb2dw/$table_name-prod/snapshot/database.$table_name.00000000$range.csv_
gsutil mv gs://test-tidb2dw/$table_name-prod/snapshot/database.$table_name.00000000$range.csv_ gs://test-tidb2dw/$table_name-prod/snapshot/database.$table_name.00000000$range.csv

Perhaps it would be beneficial to include a feature in the tool that validates and removes any invalid characters?

eugen-korentsov commented 8 months ago

Setting the following in pkg/bigquerysql/operation.go helped with this issue:

gcsRef.PreserveASCIIControlCharacters = true

Lloyd-Pottiger commented 7 months ago

Fixed by commit a7616033618f8365cc24e1a165eaf3eae592e309.