pingcap-inc / tidb2dw

Replicate data changes from TiDB to Data Warehouses in real-time
MIT License

tidb2dw > BigQuery error #84

Closed. eugen-korentsov closed this issue 9 months ago.

eugen-korentsov commented 9 months ago

Hi, I'm trying to use it with BigQuery, but I get this error:

[2024/01/10 17:15:40.167 +00:00] [ERROR] [snapshot.go:150] ["Failed to load snapshot"] [table=test.test] [error="Failed to create table: googleapi: Error 400: Syntax error: Expected \")\" or \",\" but got keyword DEFAULT at [3:30], invalidQuery"] [errorVerbose="googleapi: Error 400: Syntax error: Expected \")\" or \",\" but got keyword DEFAULT at [3:30], invalidQuery\ngithub.com/pingcap/errors.AddStack\n\t/home/runner/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/runner/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/juju_adaptor.go:15\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.runQuery\n\t/home/runner/work/tidb2dw/tidb2dw/pkg/bigquerysql/operation.go:17\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).CopyTableSchema\n\t/home/runner/work/tidb2dw/tidb2dw/pkg/bigquerysql/connector.go:99\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run\n\t/home/runner/work/tidb2dw/tidb2dw/replicate/snapshot.go:83\ngithub.com/pingcap-inc/tidb2dw/replicate.StartReplicateSnapshot\n\t/home/runner/work/tidb2dw/tidb2dw/replicate/snapshot.go:149\ngithub.com/pingcap-inc/tidb2dw/cmd.Replicate.func1\n\t/home/runner/work/tidb2dw/tidb2dw/cmd/core.go:243\nruntime.goexit\n\t/opt/hostedtoolcache/go/1.21.1/x64/src/runtime/asm_amd64.s:1650\nFailed to create table"] [tableFQN=test.test]

TiDB version is 7.5.0. I've checked TiCDC; the changefeed is created:

./cdc cli changefeed list --server=http://127.0.0.1:8301
[
  {
    "id": "1d11d811-57dd-441d-8c22-143fd70b7f54",
    "namespace": "default",
    "summary": {
      "state": "normal",
      "tso": 446931174826442767,
      "checkpoint": "2024-01-10 17:18:50.533",
      "error": null
    }
  },

I can see data in the Google Cloud Storage bucket, and the dataset exists. Any ideas?

eugen-korentsov commented 9 months ago

How I launch it:

./tidb2dw-v0.0.3-linux-amd64 bigquery --storage gs://some-bucket/ --table test.test --credentials-file-path cred.json --bq.project-id test-test --bq.dataset-id tidb_prod --cdc.host ticdc-peer.tidb.svc.cluster.local --cdc.port 8301 --tidb.host tidb.tidb.svc.cluster.local --tidb.user tidb2dw --tidb.pass ***

eugen-korentsov commented 9 months ago

I've built the latest version from the main branch; it has more detailed logs:

[2024/01/11 16:24:23.167 +00:00] [WARN] [gcs.go:379] ["failed to list objects on gcs, will use default value forprefix"] [error="Get \"https://storage.googleapis.com/storage/v1/b/test-tidb2dw/o?alt=json&delimiter=&endOffset=&fields=nextPageToken%2Cprefixes%2Citems%28name%29&includeTrailingDelimiter=false&pageToken=&prefix=&prettyPrint=false&projection=full&startOffset=&versions=false\": oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_scope\",\"error_description\":\"Invalid OAuth scope or ID token audience provided.\"}"]
[2024/01/11 16:24:23.172 +00:00] [WARN] [gcs.go:379] ["failed to list objects on gcs, will use default value forprefix"] [error="Get \"https://storage.googleapis.com/storage/v1/b/test-tidb2dw/o?alt=json&delimiter=&endOffset=&fields=nextPageToken%2Cprefixes%2Citems%28name%29&includeTrailingDelimiter=false&pageToken=&prefix=%2F%2F&prettyPrint=false&projection=full&startOffset=&versions=false\": oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_scope\",\"error_description\":\"Invalid OAuth scope or ID token audience provided.\"}"]
[2024/01/11 16:24:23.178 +00:00] [ERROR] [bigquery.go:109] ["Fatal error running bigquery replication"] [error="[DFLOW:ErrFailToCreateExternalStorage]failed to create external storage%!(EXTRA string=creating ExternalStorage for s3): Get \"https://storage.googleapis.com/storage/v1/b/test-tidb2dw/o/test?alt=json&prettyPrint=false&projection=full\": oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_scope\",\"error_description\":\"Invalid OAuth scope or ID token audience provided.\"}"] [errorVerbose="[DFLOW:ErrFailToCreateExternalStorage]failed to create external storage%!(EXTRA string=creating ExternalStorage for s3): Get \"https://storage.googleapis.com/storage/v1/b/test-tidb2dw/o/test?alt=json&prettyPrint=false&projection=full\": oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_scope\",\"error_description\":\"Invalid OAuth scope or ID token audience provided.\"}\ngithub.com/pingcap/errors.AddStack\n\t/home/eugene/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\t/home/eugene/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/normalize.go:164\ngithub.com/pingcap/tiflow/pkg/util.GetExternalStorage\n\t/home/eugene/go/pkg/mod/github.com/pingcap/tiflow@v0.0.0-20231214143919-c6b3fec5c91e/pkg/util/external_storage.go:84\ngithub.com/pingcap/tiflow/pkg/util.GetExternalStorageFromURI\n\t/home/eugene/go/pkg/mod/github.com/pingcap/tiflow@v0.0.0-20231214143919-c6b3fec5c91e/pkg/util/external_storage.go:42\ngithub.com/pingcap-inc/tidb2dw/cmd.Export\n\t/home/eugene/work/test/tidb2dw/cmd/core.go:180\ngithub.com/pingcap-inc/tidb2dw/cmd.Replicate\n\t/home/eugene/work/test/tidb2dw/cmd/core.go:242\ngithub.com/pingcap-inc/tidb2dw/cmd.NewBigQueryCmd.func1\n\t/home/eugene/work/test/tidb2dw/cmd/bigquery.go:95\ngithub.com/pingcap-inc/tidb2dw/cmd.NewBigQueryCmd.func2.1\n\t/home/eugene/work/test/tidb2dw/cmd/bigquery.go:107\ngithub.com/pingcap-inc/tidb2dw/cmd.runWithServer\n\t/home/eugene/work/test/tidb2dw/cmd/core.go:280\ngithub.com/pingcap-inc/tidb2dw/cmd.NewBigQueryCmd.func2\n\t/home/eugene/work/test/tidb2dw/cmd/bigquery.go:106\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/eugene/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/eugene/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/eugene/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992\nmain.main\n\t/home/eugene/work/test/tidb2dw/main.go:51\nruntime.main\n\t/usr/lib/go-1.21/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"]

The JSON key has full permissions on this bucket. I am launching it from a k8s pod in GKE. Any ideas?

eugen-korentsov commented 9 months ago

OK, the initial issue is related to https://github.com/pingcap-inc/tidb2dw/pull/80, and it is fixed. The second one, with oauth2: cannot fetch token:, was fixed by exporting GOOGLE_APPLICATION_CREDENTIALS=/credentials.json and not passing --credentials-file-path.
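
For reference, the working setup amounts to pointing Google's Application Default Credentials at the key file and dropping the flag; a sketch reusing the flags from the launch command above (binary name as in the earlier command):

# let Application Default Credentials pick up the key file
export GOOGLE_APPLICATION_CREDENTIALS=/credentials.json
# same invocation as before, but without --credentials-file-path
./tidb2dw-v0.0.3-linux-amd64 bigquery \
  --storage gs://some-bucket/ \
  --table test.test \
  --bq.project-id test-test \
  --bq.dataset-id tidb_prod \
  --cdc.host ticdc-peer.tidb.svc.cluster.local --cdc.port 8301 \
  --tidb.host tidb.tidb.svc.cluster.local \
  --tidb.user tidb2dw --tidb.pass ***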

But it still doesn't work:

[2024/01/15 15:45:45.544 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv\"; Message: \"Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 71; errors: 100. Please look into the errors[] collection for more details. File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv\"; Reason: \"invalid\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv\"; Message: \"Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 71; errors: 100. Please look into the errors[] collection for more details. File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv\"; Reason: \"invalid\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/test/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/test/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/test/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.000000000.csv]
[2024/01/15 15:45:45.544 +00:00] [ERROR] [snapshot.go:176] ["Failed to load snapshot"] [table=test.locations] [error="Failed to load snapshot data into data warehouse, error files: [test.locations.000000000.csv]"] [errorVerbose="Failed to load snapshot data into data warehouse, error files: [test.locations.000000000.csv]\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run\n\t/home/test/work/test/tidb2dw/replicate/snapshot.go:145\ngithub.com/pingcap-inc/tidb2dw/replicate.StartReplicateSnapshot\n\t/home/test/work/test/tidb2dw/replicate/snapshot.go:175\ngithub.com/pingcap-inc/tidb2dw/cmd.Replicate.func1\n\t/home/test/work/test/tidb2dw/cmd/core.go:256\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"]

eugen-korentsov commented 9 months ago

Some of the errors from the BigQuery logs:

jobStatus: { additionalErrors: [
  0: { code: 3 message: "Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 71; errors: 100. Please look into the errors[] collection for more details. File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  1: { code: 3 message: "Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 71; errors: 100; max bad: 0; error percent: 0" }
  2: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 169 column(s) when expecting 34.; line_number: 1 byte_offset_to_start_of_line: 0 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  3: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 152 column(s) when expecting 34.; line_number: 2 byte_offset_to_start_of_line: 1707 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  4: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 72 column(s) when expecting 34.; line_number: 3 byte_offset_to_start_of_line: 3228 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  5: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 45 column(s) when expecting 34.; line_number: 4 byte_offset_to_start_of_line: 3931 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  6: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 45 column(s) when expecting 34.; line_number: 5 byte_offset_to_start_of_line: 4325 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  7: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 36 column(s) when expecting 34.; line_number: 6 byte_offset_to_start_of_line: 4738 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }
  8: { code: 3 message: "Error while reading data, error message: Too many values in line. Found 36 column(s) when expecting 34.; line_number: 29 byte_offset_to_start_of_line: 10888 File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" }

eugen-korentsov commented 9 months ago

@Lloyd-Pottiger please help; it seems to be a delimiter issue.

Lloyd-Pottiger commented 9 months ago

@wd0517 Could you please offer some help?

wd0517 commented 9 months ago

I've built the latest version from the main branch; it has more detailed logs: .... oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_scope\",\"error_description\":\"Invalid OAuth scope or ID token audience provided.\"} ....

https://github.com/pingcap-inc/tidb2dw/pull/85 will fix this issue.

wd0517 commented 9 months ago

Some of the errors from the BigQuery logs: jobStatus: { additionalErrors: [ 0: { code: 3 message: "Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 71; errors: 100. Please look into the errors[] collection for more details. File: gs://test-tidb2dw/loc2/snapshot/test.locations.000000000.csv" } 1: { code: 3 message: "Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 71; errors: 100; max bad: 0; error percent: 0" } 2: { code: 3 message: "Error while reading data, error message: Too many val

Could you please share your table schema?

eugen-korentsov commented 9 months ago

@wd0517 DESCRIBE test:

Field                      | Type                | Null | Key | Default | Extra
id                         | int(11) unsigned    | NO   | PRI | NULL    | auto_increment
parent_id                  | int(11) unsigned    | NO   | MUL | 0       |
country_code               | varchar(2)          | YES  | MUL | NULL    |
osm_id                     | bigint(20) unsigned | YES  |     | NULL    |
osm_type                   | varchar(8)          | YES  |     | NULL    |
type                       | varchar(16)         | YES  |     | NULL    |
level                      | tinyint(3) unsigned | NO   |     | 0       |
administrative_level       | varchar(32)         | YES  |     | NULL    |
matching_level             | tinyint(3) unsigned | NO   |     | 3       |
official                   | tinyint(3) unsigned | NO   |     | 1       |
name_en                    | varchar(100)        | YES  | MUL | NULL    |
name_es                    | varchar(100)        | YES  |     | NULL    |
name_de                    | varchar(100)        | YES  |     | NULL    |
name_pt                    | varchar(100)        | YES  |     | NULL    |
url_en                     | varchar(100)        | YES  |     | NULL    |
url_es                     | varchar(100)        | YES  |     | NULL    |
url_de                     | varchar(100)        | YES  |     | NULL    |
url_pt                     | varchar(100)        | YES  |     | NULL    |
secondary_ids              | varchar(1024)       | YES  |     | NULL    |
wp_page_id                 | int(11)             | YES  |     | NULL    |
queue_id                   | int(10) unsigned    | YES  |     | NULL    |
total_chunks               | int(11)             | YES  |     | NULL    |
processed_chunks           | int(11)             | YES  |     | NULL    |
progress                   | tinyint(4)          | YES  |     | NULL    |
estate_update_queue_id     | int(10) unsigned    | YES  |     | NULL    |
estate_total_chunks        | int(10) unsigned    | YES  |     | NULL    |
estate_processed_chunks    | int(10) unsigned    | YES  |     | NULL    |
in_analytics               | tinyint(3) unsigned | NO   |     | 0       |
latitude                   | float               | YES  |     | NULL    |
longitude                  | float               | YES  |     | NULL    |
geo_json                   | longtext            | YES  |     | NULL    |
available_for_registration | tinyint(1)          | NO   |     | 0       |
label                      | varchar(255)        | YES  |     | base    |
zipcode                    | varchar(50)         | YES  |     | NULL    |

wd0517 commented 9 months ago

I am unable to reproduce the error with the provided schema. Could you please assist by providing minimal reproduction steps?

eugen-korentsov commented 9 months ago

I've rebuilt everything from scratch using the latest main branch. Now I also can't reproduce the error above, but I get a new one:

[2024/01/16 10:35:29.850 +00:00] [DEBUG] [status.go:32] ["stopping log progress"]
[2024/01/16 10:35:29.850 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/loc-dev/snapshot/metadata]
[2024/01/16 10:35:29.895 +00:00] [INFO] [dump.go:145] ["Successfully dumped table from TiDB"] [status="{\"CompletedTables\":1,\"FinishedBytes\":60319192,\"FinishedRows\":264058,\"EstimateTotalRows\":264987,\"TotalTables\":1,\"CurrentSpeedBPS\":39109198.2480315,\"Progress\":\"100.00 %\"}"]
[2024/01/16 10:35:29.895 +00:00] [INFO] [snapshot.go:60] ["Creating replicate session"] [table=test.locations] [storage=/loc-dev/snapshot]
[2024/01/16 10:35:29.923 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/16 10:35:30.824 +00:00] [INFO] [snapshot.go:91] ["Successfully copy table schema"] [table=test.locations]
[2024/01/16 10:35:30.856 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000070000.csv]
[2024/01/16 10:35:30.858 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000030000.csv]
[2024/01/16 10:35:30.859 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000010000.csv]
[2024/01/16 10:35:30.860 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000060000.csv]
[2024/01/16 10:35:30.860 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000040000.csv]
[2024/01/16 10:35:30.861 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000050000.csv]
[2024/01/16 10:35:30.860 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000020000.csv]
[2024/01/16 10:35:33.019 +00:00] [INFO] [snapshot.go:128] ["Successfully load snapshot data into data warehouse"] [table=test.locations] [path=test.locations.0000000060000.csv]
[2024/01/16 10:35:33.136 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.0000000040000.csv]
[2024/01/16 10:35:33.286 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.0000000030000.csv]
[2024/01/16 10:35:33.331 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.0000000050000.csv]
[2024/01/16 10:35:33.756 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.0000000020000.csv]
[2024/01/16 10:35:33.940 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.0000000070000.csv]
[2024/01/16 10:35:35.637 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=test.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table test-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/test/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=test.locations.0000000010000.csv]
[2024/01/16 10:35:35.637 +00:00] [ERROR] [snapshot.go:176] ["Failed to load snapshot"] [table=test.locations] [error="Failed to load snapshot data into data warehouse, error files: [test.locations.0000000040000.csv test.locations.0000000030000.csv test.locations.0000000050000.csv test.locations.0000000020000.csv test.locations.0000000070000.csv test.locations.0000000010000.csv]"] [errorVerbose="Failed to load snapshot data into data warehouse, error files: [test.locations.0000000040000.csv test.locations.0000000030000.csv test.locations.0000000050000.csv test.locations.0000000020000.csv test.locations.0000000070000.csv test.locations.0000000010000.csv]\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:145\ngithub.com/pingcap-inc/tidb2dw/replicate.StartReplicateSnapshot\n\t/home/eugene/work/test/tidb2dw/replicate/snapshot.go:175\ngithub.com/pingcap-inc/tidb2dw/cmd.Replicate.func1\n\t/home/eugene/work/test/tidb2dw/cmd/core.go:256\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"]

eugen-korentsov commented 9 months ago

@wd0517 I've tested it with another table, and it works fine. What I noticed is that in the locations table, the secondary_ids column (a varchar) contains values formatted as arrays, like [9,10,58,263,2504]. The same goes for the zipcode column, but with double quotes, like: ["07518", "07349", "07142", "07007", "07198", "07212"]. Is this the issue?
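
A quick way to confirm a delimiter problem like this is to count comma-split fields per line of the dumped CSV. This is a rough sketch only: it splits naively on commas and ignores CSV quoting, which is exactly why unquoted array values inflate the count (34 is the expected column count from the BigQuery errors above):

# report lines whose naive comma-split field count differs from the expected 34 columns
awk -F',' 'NF != 34 { print FILENAME ": line " FNR " has " NF " fields" }' test.locations.000000000.csv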

eugen-korentsov commented 9 months ago

I did some other tests: if I dump this table and restore the dump into a table with the same schema but a different name, then tidb2dw works fine with the new table. What I noticed is that AUTO_INCREMENT is not increased in the original table, but it is increased in the new table (see the locations and random_table screenshots). This TiDB was upgraded from 6.5.0 to 7.5.0.

wd0517 commented 9 months ago

I can reproduce the Already Exists: Table ****:test_tidb2dw.test; Reason: duplicate error when I manually create a row with a duplicate primary key in the dumpling file, which is strange.
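
For instance, one simple way to inject such a duplicate into a dumpling chunk file for testing is to re-append an existing row (a sketch; the file name is hypothetical, following the naming patterns in this thread):

# duplicate the first data row of the chunk, producing a repeated primary key
head -n 1 test.test.000000000.csv >> test.test.000000000.csv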

It would be helpful if you could check for any rows with duplicate primary keys in dumpling files.

I did some other tests: if I dump this table and restore the dump into a table with the same schema but a different name, then tidb2dw works fine with the new table.

How did you dump and restore the table? Please also provide the results of SHOW CREATE TABLE <table_name> for both the original and new tables.

wd0517 commented 9 months ago

Remember to clean up the GCS storage files or use a new storage path after a failed attempt. I noticed that your GCS file name format is test.locations.0000000040000.csv, which is usually due to a dirty storage path; this may be why you got the duplicate error.
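
For example, a failed attempt's files can be removed with gsutil before retrying (shown as one option; the bucket and prefix below are the ones from the logs above, so adjust them to your run):

# recursively delete everything under the old snapshot prefix
gsutil -m rm -r gs://test-tidb2dw/loc2/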

eugen-korentsov commented 9 months ago

How did you dump and restore the table

mysqldump database table > table.sql
mysql database < table.sql

SHOW CREATE TABLE locations:

CREATE TABLE `locations` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) unsigned NOT NULL DEFAULT '0',
  `country_code` varchar(2) DEFAULT NULL,
  `osm_id` bigint(20) unsigned DEFAULT NULL,
  `osm_type` varchar(8) DEFAULT NULL,
  `type` varchar(16) DEFAULT NULL,
  `level` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `administrative_level` varchar(32) DEFAULT NULL,
  `matching_level` tinyint(3) unsigned NOT NULL DEFAULT '3',
  `official` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `name_en` varchar(100) DEFAULT NULL,
  `name_es` varchar(100) DEFAULT NULL,
  `name_de` varchar(100) DEFAULT NULL,
  `name_pt` varchar(100) DEFAULT NULL,
  `url_en` varchar(100) DEFAULT NULL,
  `url_es` varchar(100) DEFAULT NULL,
  `url_de` varchar(100) DEFAULT NULL,
  `url_pt` varchar(100) DEFAULT NULL,
  `secondary_ids` varchar(1024) DEFAULT NULL,
  `wp_page_id` int(11) DEFAULT NULL,
  `queue_id` int(10) unsigned DEFAULT NULL,
  `total_chunks` int(11) DEFAULT NULL,
  `processed_chunks` int(11) DEFAULT NULL,
  `progress` tinyint(4) DEFAULT NULL,
  `estate_update_queue_id` int(10) unsigned DEFAULT NULL,
  `estate_total_chunks` int(10) unsigned DEFAULT NULL,
  `estate_processed_chunks` int(10) unsigned DEFAULT NULL,
  `in_analytics` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `latitude` float DEFAULT NULL,
  `longitude` float DEFAULT NULL,
  `geo_json` longtext DEFAULT NULL,
  `available_for_registration` tinyint(1) NOT NULL DEFAULT '0',
  `label` varchar(255) DEFAULT 'base',
  `zipcode` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `name` (`name_en`),
  KEY `parent_id` (`parent_id`),
  KEY `country_code` (`country_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=4957988

SHOW CREATE TABLE random_table (identical except for the table name and the AUTO_INCREMENT value):

CREATE TABLE `random_table` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) unsigned NOT NULL DEFAULT '0',
  `country_code` varchar(2) DEFAULT NULL,
  `osm_id` bigint(20) unsigned DEFAULT NULL,
  `osm_type` varchar(8) DEFAULT NULL,
  `type` varchar(16) DEFAULT NULL,
  `level` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `administrative_level` varchar(32) DEFAULT NULL,
  `matching_level` tinyint(3) unsigned NOT NULL DEFAULT '3',
  `official` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `name_en` varchar(100) DEFAULT NULL,
  `name_es` varchar(100) DEFAULT NULL,
  `name_de` varchar(100) DEFAULT NULL,
  `name_pt` varchar(100) DEFAULT NULL,
  `url_en` varchar(100) DEFAULT NULL,
  `url_es` varchar(100) DEFAULT NULL,
  `url_de` varchar(100) DEFAULT NULL,
  `url_pt` varchar(100) DEFAULT NULL,
  `secondary_ids` varchar(1024) DEFAULT NULL,
  `wp_page_id` int(11) DEFAULT NULL,
  `queue_id` int(10) unsigned DEFAULT NULL,
  `total_chunks` int(11) DEFAULT NULL,
  `processed_chunks` int(11) DEFAULT NULL,
  `progress` tinyint(4) DEFAULT NULL,
  `estate_update_queue_id` int(10) unsigned DEFAULT NULL,
  `estate_total_chunks` int(10) unsigned DEFAULT NULL,
  `estate_processed_chunks` int(10) unsigned DEFAULT NULL,
  `in_analytics` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `latitude` float DEFAULT NULL,
  `longitude` float DEFAULT NULL,
  `geo_json` longtext DEFAULT NULL,
  `available_for_registration` tinyint(1) NOT NULL DEFAULT '0',
  `label` varchar(255) DEFAULT 'base',
  `zipcode` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `name` (`name_en`),
  KEY `parent_id` (`parent_id`),
  KEY `country_code` (`country_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=4987988

Remember to clean the GCS storage file or use a new storage path

Yes, I always use a new path.

It would be helpful if you could check for any rows with duplicate primary keys in dumpling files.

I will check.

eugen-korentsov commented 9 months ago

Nope, no duplicate primary keys. How I checked:

cat database.locations.00000000*.csv | cut -d ',' -f 1 | uniq -d

No result.

To be sure the check itself works, I tested it by manually adding duplicate keys; it detects them fine.
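
One caveat with that check: uniq -d only reports adjacent duplicates, so it relies on the concatenated chunk files being globally sorted by id. A variant that catches duplicates regardless of ordering (same files assumed):

# sort the id column numerically first so uniq -d sees any duplicates side by side
cat database.locations.00000000*.csv | cut -d ',' -f 1 | sort -n | uniq -d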

eugen-korentsov commented 9 months ago

About the file naming: it is because the dump splits the table into 8 chunks. Here is the full log from the start of the dump until the BigQuery error:

[2024/01/16 17:22:06.773 +00:00] [INFO] [core.go:188] ["Start Export"] [stage=init] [mode=full]
[2024/01/16 17:22:06.787 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/16 17:22:06.789 +00:00] [INFO] [tso.go:21] ["Successfully get current tso"] [tso=0]
[2024/01/16 17:22:07.033 +00:00] [INFO] [connector.go:92] ["create changefeed success"] [changefeed-id=b9862ff8-c6dd-4c01-961d-87f7c885985a] [replica-config="{\"bdr_mode\":false,\"case_sensitive\":false,\"changefeed_error_stuck_duration\":1800000000000,\"check_gc_safe_point\":true,\"consistent\":{\"encoding_worker_num\":16,\"flush_interval\":2000,\"flush_worker_num\":8,\"level\":\"none\",\"max_log_size\":64,\"meta_flush_interval\":200,\"use_file_backend\":false},\"enable_sync_point\":false,\"filter\":{\"rules\":[\"database.locations\"]},\"force_replicate\":false,\"ignore_ineligible_table\":false,\"integrity\":{\"corruption_handle_level\":\"warn\",\"integrity_check_level\":\"none\"},\"memory_quota\":1073741824,\"mounter\":{\"worker_num\":16},\"scheduler\":{\"enable_table_across_nodes\":false,\"region_threshold\":100000,\"write_key_threshold\":0},\"sink\":{\"advance_timeout\":150,\"cloud_storage_config\":{\"file_size\":67108864,\"flush_interval\":\"1m0s\",\"output_column_id\":true},\"csv\":{\"binary_encoding_method\":\"hex\",\"delimiter\":\",\",\"include_commit_ts\":true,\"null\":\"\\\\N\",\"quote\":\"\\\"\"},\"date_separator\":\"day\",\"delete_only_output_handle_key_columns\":false,\"enable_kafka_sink_v2\":false,\"enable_partition_separator\":true,\"encoder_concurrency\":32,\"file_index_width\":20,\"only_output_updated_columns\":false,\"protocol\":\"csv\",\"terminator\":\"\\r\\n\"},\"sql_mode\":\"ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION\",\"sync_point_interval\":600000000000,\"sync_point_retention\":86400000000000}"]
[2024/01/16 17:22:07.179 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/16 17:22:07.198 +00:00] [INFO] [version.go:434] ["detect server version"] [type=TiDB] [version=7.5.0]
[2024/01/16 17:22:07.200 +00:00] [INFO] [dump.go:1446] ["meet some problem while fetching pd addrs. This won't affect dump process"] [error="sql: SELECT * FROM information_schema.cluster_info where type = 'pd';: Error 1227 (42000): Access denied; you need (at least one of) the PROCESS privilege(s) for this operation"]
[2024/01/16 17:22:07.201 +00:00] [WARN] [dump.go:1506] ["If the amount of data to dump is large, criteria: (data more than 60GB or dumped time more than 10 minutes)\nyou'd better adjust the tikv_gc_life_time to avoid export failure due to TiDB GC during the dump process.\nBefore dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\nAfter dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\n"]
[2024/01/16 17:22:07.238 +00:00] [INFO] [dump.go:79] ["Using snapshot"] [snapshot=447067121822466056]
[2024/01/16 17:22:07.238 +00:00] [INFO] [dump.go:151] ["begin to run Dump"] [conf="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"session-token\":\"\",\"provider\":\"\",\"force-path-style\":false,\"use-accelerate-endpoint\":false,\"role-arn\":\"\",\"external-id\":\"\",\"object-lock-enabled\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"azblob\":{\"endpoint\":\"\",\"account-name\":\"\",\"account-key\":\"\",\"access-tier\":\"\",\"sas-token\":\"\",\"encryption-scope\":\"\",\"encryption-key\":\"\"},\"SpecifiedTables\":true,\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoSequences\":true,\"NoHeader\":true,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":false,\"TransactionalConsistency\":true,\"EscapeBackslash\":false,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"dev-tidb.tidb.svc.cluster.local\",\"Port\":4000,\"Threads\":8,\"User\":\"tidb2dw\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"\",\"LogFile\":\"\",\"LogFormat\":\"\",\"OutputDirPath\":\"gs://test-tidb2dw/loc34-dev/snapshot?credentials-file=%2Fticdc%2Fcred.json\",\"StatusAddr\":\":8281\",\"Snapshot\":\"447067121822466056\",\"Consistency\":\"snapshot\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"\",\"CsvSeparator\":\",\",\"CsvDelimiter\":\"\\\"\",\"CsvLineTerminator\":\"\\r\\n\",\"Databases\":null,\"Where\":\"\",\"FileType\":\"csv\",\"ServerInfo\":{\"ServerType\":3,\"ServerVersion\":\"7.5.0\",\"HasTiKV\":true},\"Rows\":1,\"ReadTimeout\":0,\"TiDBMemQuotaQuery\":0,\"FileSize\":5368709120,\"StatementSize\":1000000,\"SessionParams\":{\"tidb_snapshot\":\"447067121822466056\"},\"Tables\":{\"database\":[{\"Name\":\"locations\",\"AvgRowLength\":0,\"Type\":0}]},\"CollationCompatible\":\"loose\",\"CsvOutputDialect\":3,\"IOTotalBytes\":null,\"Net\":\"\"}"]
[2024/01/16 17:22:07.261 +00:00] [DEBUG] [dump.go:1600] ["no need to build region info because database is not TiDB 3.x"]
[2024/01/16 17:22:07.453 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="meta of dababase 'database'"]
[2024/01/16 17:22:07.453 +00:00] [DEBUG] [dump.go:441] ["start dumping table..."] [database=database] [table=locations]
[2024/01/16 17:22:07.453 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database-schema-create.sql]
[2024/01/16 17:22:07.454 +00:00] [DEBUG] [writer_util.go:126] ["start dumping meta data"] [target=database]
[2024/01/16 17:22:07.454 +00:00] [DEBUG] [writer_util.go:139] ["finish dumping meta data"] [target=database]
[2024/01/16 17:22:07.478 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="meta of table 'database'.'locations'"]
[2024/01/16 17:22:07.478 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations-schema.sql]
[2024/01/16 17:22:07.478 +00:00] [DEBUG] [writer_util.go:126] ["start dumping meta data"] [target=database]
[2024/01/16 17:22:07.479 +00:00] [DEBUG] [writer_util.go:139] ["finish dumping meta data"] [target=database]
[2024/01/16 17:22:07.481 +00:00] [DEBUG] [dump.go:878] ["dumping TiDB tables with TABLESAMPLE"] [database=database] [table=locations]
[2024/01/16 17:22:07.491 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(0/8)"]
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=1] []
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=1 and `id`<188759  ORDER BY `id`"]
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=0] []
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`<1  ORDER BY `id`"]
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(1/8)"]
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(2/8)"]
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=3] []
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=222755 and `id`<262645  ORDER BY `id`"]
[2024/01/16 17:22:07.492 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(3/8)"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(4/8)"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=5] []
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=307804 and `id`<479689  ORDER BY `id`"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=2] []
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=188759 and `id`<222755  ORDER BY `id`"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(5/8)"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(6/8)"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'locations'(7/8)"]
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=4] []
[2024/01/16 17:22:07.493 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=262645 and `id`<307804  ORDER BY `id`"]
[2024/01/16 17:22:07.499 +00:00] [INFO] [writer.go:272] ["no data written in table chunk"] [database=database] [table=locations] [chunkIdx=0]
[2024/01/16 17:22:07.499 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=0]
[2024/01/16 17:22:07.499 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=6] []
[2024/01/16 17:22:07.499 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=479689 and `id`<504744  ORDER BY `id`"]
[2024/01/16 17:22:07.514 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=locations] [chunkIndex=7] []
[2024/01/16 17:22:07.514 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`locations` WHERE `id`>=504744  ORDER BY `id`"]
[2024/01/16 17:22:07.624 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000040000.csv]
[2024/01/16 17:22:07.690 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000010000.csv]
[2024/01/16 17:22:07.754 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000050000.csv]
[2024/01/16 17:22:07.775 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000020000.csv]
[2024/01/16 17:22:07.798 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000060000.csv]
[2024/01/16 17:22:07.798 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000030000.csv]
[2024/01/16 17:22:07.897 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000070000.csv]
[2024/01/16 17:22:08.836 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=25055] ["finished size"=5806417]
[2024/01/16 17:22:08.836 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000060000.csv]
[2024/01/16 17:22:08.880 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=30085] ["finished size"=7191498]
[2024/01/16 17:22:08.880 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000050000.csv]
[2024/01/16 17:22:08.930 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=36784] ["finished size"=8078181]
[2024/01/16 17:22:08.930 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000040000.csv]
[2024/01/16 17:22:08.982 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=32993] ["finished size"=7412743]
[2024/01/16 17:22:08.982 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000020000.csv]
[2024/01/16 17:22:09.090 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=6] ["total rows"=25055]
[2024/01/16 17:22:09.092 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=6]
[2024/01/16 17:22:09.098 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=5] ["total rows"=30085]
[2024/01/16 17:22:09.099 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=5]
[2024/01/16 17:22:09.114 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=39889] ["finished size"=9201683]
[2024/01/16 17:22:09.114 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000030000.csv]
[2024/01/16 17:22:09.148 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=4] ["total rows"=36784]
[2024/01/16 17:22:09.148 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=4]
[2024/01/16 17:22:09.174 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=49795] ["finished size"=11809904]
[2024/01/16 17:22:09.174 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000010000.csv]
[2024/01/16 17:22:09.196 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=2] ["total rows"=32993]
[2024/01/16 17:22:09.196 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=2]
[2024/01/16 17:22:09.238 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=3] ["total rows"=39889]
[2024/01/16 17:22:09.238 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=3]
[2024/01/16 17:22:09.309 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=locations] ["finished rows"=49457] ["finished size"=10818766]
[2024/01/16 17:22:09.309 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/loc34-dev/snapshot/database.locations.0000000070000.csv]
[2024/01/16 17:22:09.313 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=1] ["total rows"=49795]
[2024/01/16 17:22:09.313 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=1]
[2024/01/16 17:22:09.540 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=locations] [chunkIdx=7] ["total rows"=49457]
[2024/01/16 17:22:09.540 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=locations] [chunkIdx=7]
[2024/01/16 17:22:09.540 +00:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=10] [ranges-succeed=10] [ranges-failed=0] [total-take=2.090275459s] [total-kv-size=60.32MB] [average-speed=28.86MB/s] [total-rows=264058]
[2024/01/16 17:22:09.540 +00:00] [DEBUG] [status.go:32] ["stopping log progress"]
[2024/01/16 17:22:09.541 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/loc34-dev/snapshot/metadata]
[2024/01/16 17:22:09.595 +00:00] [INFO] [dump.go:145] ["Successfully dumped table from TiDB"] [status="{\"CompletedTables\":1,\"FinishedBytes\":60319192,\"FinishedRows\":264058,\"EstimateTotalRows\":264987,\"TotalTables\":1,\"CurrentSpeedBPS\":24961694.190491002,\"Progress\":\"100.00 %\"}"]
[2024/01/16 17:22:09.596 +00:00] [INFO] [snapshot.go:60] ["Creating replicate session"] [table=database.locations] [storage=/loc34-dev/snapshot]
[2024/01/16 17:22:09.609 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/16 17:22:10.419 +00:00] [INFO] [snapshot.go:91] ["Successfully copy table schema"] [table=database.locations]
[2024/01/16 17:22:10.459 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000070000.csv]
[2024/01/16 17:22:10.460 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000010000.csv]
[2024/01/16 17:22:10.461 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000020000.csv]
[2024/01/16 17:22:10.460 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000040000.csv]
[2024/01/16 17:22:10.460 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000060000.csv]
[2024/01/16 17:22:10.462 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000050000.csv]
[2024/01/16 17:22:10.460 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000030000.csv]
[2024/01/16 17:22:12.797 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=database.locations] [error="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table database-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"\"; Message: \"Already Exists: Table database-cloud:tidb_prod.locations\"; Reason: \"duplicate\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/operation.go:51\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/database/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=database.locations.0000000040000.csv]
[2024/01/16 17:22:12.988 +00:00] [INFO] [snapshot.go:128] ["Successfully load snapshot data into data warehouse"] [table=database.locations] [path=database.locations.0000000060000.csv]

Lloyd-Pottiger commented 9 months ago

CREATE TABLE images (
  id varbinary(16) NOT NULL,
  created_at datetime NOT NULL,
  updated_at datetime NOT NULL,
  properties text DEFAULT NULL,
  original_properties text DEFAULT NULL,
  sync_status varchar(255) DEFAULT 'pending' COMMENT 'sync status',
  store_origin_id int(11) NOT NULL,
  PRIMARY KEY (store_origin_id, id) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY uk_pi_id (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

INSERT INTO `images` 
(`id`, `created_at`, `updated_at`, `properties`, `original_properties`, `sync_status`, `store_origin_id`) 
VALUES 
(UNHEX(REPLACE(UUID(), '-', '')), NOW(), NOW(), '{"color": "red"}', '{"color": "blue"}', 'pending', 1);

@eugen-korentsov could you please try this workload?

eugen-korentsov commented 9 months ago

@Lloyd-Pottiger yes, here is the result:

[2024/01/17 07:53:36.512 +00:00] [INFO] [core.go:188] ["Start Export"] [stage=init] [mode=full]
[2024/01/17 07:53:36.526 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/17 07:53:36.528 +00:00] [INFO] [tso.go:21] ["Successfully get current tso"] [tso=0]
[2024/01/17 07:53:36.792 +00:00] [INFO] [connector.go:92] ["create changefeed success"] [changefeed-id=dd4e305f-c836-4760-aef3-c762a2818be5] [replica-config="{\"bdr_mode\":false,\"case_sensitive\":false,\"changefeed_error_stuck_duration\":1800000000000,\"check_gc_safe_point\":true,\"consistent\":{\"encoding_worker_num\":16,\"flush_interval\":2000,\"flush_worker_num\":8,\"level\":\"none\",\"max_log_size\":64,\"meta_flush_interval\":200,\"use_file_backend\":false},\"enable_sync_point\":false,\"filter\":{\"rules\":[\"database.images\"]},\"force_replicate\":false,\"ignore_ineligible_table\":false,\"integrity\":{\"corruption_handle_level\":\"warn\",\"integrity_check_level\":\"none\"},\"memory_quota\":1073741824,\"mounter\":{\"worker_num\":16},\"scheduler\":{\"enable_table_across_nodes\":false,\"region_threshold\":100000,\"write_key_threshold\":0},\"sink\":{\"advance_timeout\":150,\"cloud_storage_config\":{\"file_size\":67108864,\"flush_interval\":\"1m0s\",\"output_column_id\":true},\"csv\":{\"binary_encoding_method\":\"hex\",\"delimiter\":\",\",\"include_commit_ts\":true,\"null\":\"\\\\N\",\"quote\":\"\\\"\"},\"date_separator\":\"day\",\"delete_only_output_handle_key_columns\":false,\"enable_kafka_sink_v2\":false,\"enable_partition_separator\":true,\"encoder_concurrency\":32,\"file_index_width\":20,\"only_output_updated_columns\":false,\"protocol\":\"csv\",\"terminator\":\"\\r\\n\"},\"sql_mode\":\"ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION\",\"sync_point_interval\":600000000000,\"sync_point_retention\":86400000000000}"]
[2024/01/17 07:53:36.900 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/17 07:53:36.917 +00:00] [INFO] [version.go:434] ["detect server version"] [type=TiDB] [version=7.5.0]
[2024/01/17 07:53:36.922 +00:00] [INFO] [dump.go:1446] ["meet some problem while fetching pd addrs. This won't affect dump process"] [error="sql: SELECT * FROM information_schema.cluster_info where type = 'pd';: Error 1227 (42000): Access denied; you need (at least one of) the PROCESS privilege(s) for this operation"]
[2024/01/17 07:53:36.923 +00:00] [WARN] [dump.go:1506] ["If the amount of data to dump is large, criteria: (data more than 60GB or dumped time more than 10 minutes)\nyou'd better adjust the tikv_gc_life_time to avoid export failure due to TiDB GC during the dump process.\nBefore dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\nAfter dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\n"]
[2024/01/17 07:53:36.961 +00:00] [INFO] [dump.go:79] ["Using snapshot"] [snapshot=447080829266952195]
[2024/01/17 07:53:36.962 +00:00] [INFO] [dump.go:151] ["begin to run Dump"] [conf="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"session-token\":\"\",\"provider\":\"\",\"force-path-style\":false,\"use-accelerate-endpoint\":false,\"role-arn\":\"\",\"external-id\":\"\",\"object-lock-enabled\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"azblob\":{\"endpoint\":\"\",\"account-name\":\"\",\"account-key\":\"\",\"access-tier\":\"\",\"sas-token\":\"\",\"encryption-scope\":\"\",\"encryption-key\":\"\"},\"SpecifiedTables\":true,\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoSequences\":true,\"NoHeader\":true,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":false,\"TransactionalConsistency\":true,\"EscapeBackslash\":false,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"dev-tidb.tidb.svc.cluster.local\",\"Port\":4000,\"Threads\":8,\"User\":\"tidb2dw\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"\",\"LogFile\":\"\",\"LogFormat\":\"\",\"OutputDirPath\":\"gs://test-tidb2dw/images1-dev/snapshot?credentials-file=%2Fticdc%2Fcred.json\",\"StatusAddr\":\":8281\",\"Snapshot\":\"447080829266952195\",\"Consistency\":\"snapshot\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"\",\"CsvSeparator\":\",\",\"CsvDelimiter\":\"\\\"\",\"CsvLineTerminator\":\"\\r\\n\",\"Databases\":null,\"Where\":\"\",\"FileType\":\"csv\",\"ServerInfo\":{\"ServerType\":3,\"ServerVersion\":\"7.5.0\",\"HasTiKV\":true},\"Rows\":1,\"ReadTimeout\":0,\"TiDBMemQuotaQuery\":0,\"FileSize\":5368709120,\"StatementSize\":1000000,\"SessionParams\":{\"tidb_snapshot\":\"447080829266952195\"},\"Tables\":{\"database\":[{\"Name\":\"images\",\"AvgRowLength\":0,\"Type\":0}]},\"CollationCompatible\":\"loose\",\"CsvOutputDialect\":3,\"IOTotalBytes\":null,\"Net\":\"\"}"]
[2024/01/17 07:53:36.979 +00:00] [DEBUG] [dump.go:1600] ["no need to build region info because database is not TiDB 3.x"]
[2024/01/17 07:53:37.114 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="meta of dababase 'database'"]
[2024/01/17 07:53:37.114 +00:00] [DEBUG] [dump.go:441] ["start dumping table..."] [database=database] [table=images]
[2024/01/17 07:53:37.115 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/images1-dev/snapshot/database-schema-create.sql]
[2024/01/17 07:53:37.115 +00:00] [DEBUG] [writer_util.go:126] ["start dumping meta data"] [target=database]
[2024/01/17 07:53:37.116 +00:00] [DEBUG] [writer_util.go:139] ["finish dumping meta data"] [target=database]
[2024/01/17 07:53:37.129 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="meta of table 'database'.'images'"]
[2024/01/17 07:53:37.130 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/images1-dev/snapshot/database.images-schema.sql]
[2024/01/17 07:53:37.130 +00:00] [DEBUG] [writer_util.go:126] ["start dumping meta data"] [target=database]
[2024/01/17 07:53:37.130 +00:00] [DEBUG] [writer_util.go:139] ["finish dumping meta data"] [target=database]
[2024/01/17 07:53:37.132 +00:00] [DEBUG] [dump.go:878] ["dumping TiDB tables with TABLESAMPLE"] [database=database] [table=images]
[2024/01/17 07:53:37.141 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'images'(0/2)"]
[2024/01/17 07:53:37.141 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'images'(1/2)"]
[2024/01/17 07:53:37.141 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=images] [chunkIndex=1] []
[2024/01/17 07:53:37.142 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`images` WHERE `store_origin_id`>1 or(`store_origin_id`=1 and `id`>=x'771ac871b50d11eeba480a7b3274cd2a')  ORDER BY `store_origin_id`,`id`"]
[2024/01/17 07:53:37.141 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=images] [chunkIndex=0] []
[2024/01/17 07:53:37.143 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`images` WHERE `store_origin_id`<1 or(`store_origin_id`=1 and `id`<x'771ac871b50d11eeba480a7b3274cd2a')  ORDER BY `store_origin_id`,`id`"]
[2024/01/17 07:53:37.150 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/images1-dev/snapshot/database.images.0000000010000.csv]
[2024/01/17 07:53:37.150 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=images] ["finished rows"=1] ["finished size"=131]
[2024/01/17 07:53:37.150 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/images1-dev/snapshot/database.images.0000000010000.csv]
[2024/01/17 07:53:37.150 +00:00] [INFO] [writer.go:272] ["no data written in table chunk"] [database=database] [table=images] [chunkIdx=0]
[2024/01/17 07:53:37.150 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=images] [chunkIdx=0]
[2024/01/17 07:53:37.206 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=images] [chunkIdx=1] ["total rows"=1]
[2024/01/17 07:53:37.206 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=images] [chunkIdx=1]
[2024/01/17 07:53:37.206 +00:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=4] [ranges-succeed=4] [ranges-failed=0] [total-take=95.370875ms] [total-kv-size=131B] [average-speed=1.374kB/s] [total-rows=1]
[2024/01/17 07:53:37.206 +00:00] [DEBUG] [status.go:32] ["stopping log progress"]
[2024/01/17 07:53:37.207 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/images1-dev/snapshot/metadata]
[2024/01/17 07:53:37.257 +00:00] [INFO] [dump.go:145] ["Successfully dumped table from TiDB"] [status="{\"CompletedTables\":1,\"FinishedBytes\":131,\"FinishedRows\":1,\"EstimateTotalRows\":10000,\"TotalTables\":1,\"CurrentSpeedBPS\":366.8913133776387,\"Progress\":\"100.00 %\"}"]
[2024/01/17 07:53:37.258 +00:00] [INFO] [snapshot.go:60] ["Creating replicate session"] [table=database.images] [storage=/images1-dev/snapshot]
[2024/01/17 07:53:37.269 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/17 07:53:37.786 +00:00] [ERROR] [snapshot.go:176] ["Failed to load snapshot"] [table=database.images] [error="Failed to create table: googleapi: Error 400: Primary key column cannot be of type BYTES. Invalid column: id., invalid"] [errorVerbose="googleapi: Error 400: Primary key column cannot be of type BYTES. Invalid column: id., invalid\ngithub.com/pingcap/errors.AddStack\n\t/home/eugene/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20231212100244-799fae176cfb/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/home/eugene/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20231212100244-799fae176cfb/juju_adaptor.go:15\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.runQuery\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/operation.go:17\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).CopyTableSchema\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/connector.go:103\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run\n\t/home/eugene/work/database/tidb2dw/replicate/snapshot.go:88\ngithub.com/pingcap-inc/tidb2dw/replicate.StartReplicateSnapshot\n\t/home/eugene/work/database/tidb2dw/replicate/snapshot.go:175\ngithub.com/pingcap-inc/tidb2dw/cmd.Replicate.func1\n\t/home/eugene/work/database/tidb2dw/cmd/core.go:256\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650\nFailed to create table"]
Lloyd-Pottiger commented 9 months ago
CREATE TABLE images (
  id varcahr(16) NOT NULL,
  created_at datetime NOT NULL,
  updated_at datetime NOT NULL,
  properties text DEFAULT NULL,
  original_properties text DEFAULT NULL,
  sync_status varchar(255) DEFAULT 'pending' COMMENT '同步状态',
  store_origin_id int(11) NOT NULL,
  PRIMARY KEY (store_origin_id,id) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY uk_pi_id (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

INSERT INTO `images` 
(`id`, `created_at`, `updated_at`, `properties`, `original_properties`, `sync_status`, `store_origin_id`) 
VALUES 
(UUID(), NOW(), NOW(), '{"color": "red"}', '{"color": "blue"}', 'pending', 1);

Remove the BYTES type (use a varchar id instead of a binary one) and please try again @eugen-korentsov
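
For context, the DDL that tidb2dw generates for this table boils down to a primary key over a BYTES column, which BigQuery rejects. A minimal sketch (the dataset and table names here are hypothetical) that reproduces the failure and the working variant:

-- Fails: BigQuery does not allow BYTES columns in a primary key
CREATE TABLE tidb_prod.images_bytes_pk (
  id BYTES NOT NULL,
  store_origin_id INT64 NOT NULL,
  PRIMARY KEY (store_origin_id, id) NOT ENFORCED
);

-- Works once the key column is STRING
CREATE TABLE tidb_prod.images_string_pk (
  id STRING NOT NULL,
  store_origin_id INT64 NOT NULL,
  PRIMARY KEY (store_origin_id, id) NOT ENFORCED
);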

eugen-korentsov commented 9 months ago

@Lloyd-Pottiger error: SQL Error [1064] [42000]: You have an error in your SQL syntax; check the manual that corresponds to your TiDB version for the right syntax to use line 1 column 96 near "varcahr(16) NOT NULL, created_at datetime NOT NULL, updated_at datetime NOT NULL, properties text DEFAULT NULL, original_properties text DEFAULT NULL, sync_status varchar(255) DEFAULT 'pending' COMMENT '同步状态', store_origin_id int(11) NOT NULL, PRIMARY KEY (store_origin_id,id) /*T![clustered_index] CLUSTERED */, UNIQUE KEY uk_pi_id (id)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin"

Lloyd-Pottiger commented 9 months ago
CREATE TABLE images (
  id varchar(36) NOT NULL,
  created_at datetime NOT NULL,
  updated_at datetime NOT NULL,
  properties text DEFAULT NULL,
  original_properties text DEFAULT NULL,
  sync_status varchar(255) DEFAULT 'pending' COMMENT '同步状态',
  store_origin_id int(11) NOT NULL,
  PRIMARY KEY (store_origin_id,id) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY uk_pi_id (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

INSERT INTO `images` 
(`id`, `created_at`, `updated_at`, `properties`, `original_properties`, `sync_status`, `store_origin_id`) 
VALUES 
(UUID(), NOW(), NOW(), '{"color": "red"}', '{"color": "blue"}', 'pending', 1);

sorry, typo... (and the id needs varchar(36), since UUID() returns a 36-character string)

eugen-korentsov commented 9 months ago

@Lloyd-Pottiger result:

[2024/01/17 08:07:41.394 +00:00] [INFO] [core.go:188] ["Start Export"] [stage=init] [mode=full]
[2024/01/17 08:07:41.408 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/17 08:07:41.409 +00:00] [INFO] [tso.go:21] ["Successfully get current tso"] [tso=0]
[2024/01/17 08:07:41.614 +00:00] [INFO] [connector.go:92] ["create changefeed success"] [changefeed-id=31cfc918-1e08-42ce-bb1c-4985d1cebe6f] [replica-config="{\"bdr_mode\":false,\"case_sensitive\":false,\"changefeed_error_stuck_duration\":1800000000000,\"check_gc_safe_point\":true,\"consistent\":{\"encoding_worker_num\":16,\"flush_interval\":2000,\"flush_worker_num\":8,\"level\":\"none\",\"max_log_size\":64,\"meta_flush_interval\":200,\"use_file_backend\":false},\"enable_sync_point\":false,\"filter\":{\"rules\":[\"database.images\"]},\"force_replicate\":false,\"ignore_ineligible_table\":false,\"integrity\":{\"corruption_handle_level\":\"warn\",\"integrity_check_level\":\"none\"},\"memory_quota\":1073741824,\"mounter\":{\"worker_num\":16},\"scheduler\":{\"enable_table_across_nodes\":false,\"region_threshold\":100000,\"write_key_threshold\":0},\"sink\":{\"advance_timeout\":150,\"cloud_storage_config\":{\"file_size\":67108864,\"flush_interval\":\"1m0s\",\"output_column_id\":true},\"csv\":{\"binary_encoding_method\":\"hex\",\"delimiter\":\",\",\"include_commit_ts\":true,\"null\":\"\\\\N\",\"quote\":\"\\\"\"},\"date_separator\":\"day\",\"delete_only_output_handle_key_columns\":false,\"enable_kafka_sink_v2\":false,\"enable_partition_separator\":true,\"encoder_concurrency\":32,\"file_index_width\":20,\"only_output_updated_columns\":false,\"protocol\":\"csv\",\"terminator\":\"\\r\\n\"},\"sql_mode\":\"ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION\",\"sync_point_interval\":600000000000,\"sync_point_retention\":86400000000000}"]
[2024/01/17 08:07:41.726 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/17 08:07:41.748 +00:00] [INFO] [version.go:434] ["detect server version"] [type=TiDB] [version=7.5.0]
[2024/01/17 08:07:41.750 +00:00] [INFO] [dump.go:1446] ["meet some problem while fetching pd addrs. This won't affect dump process"] [error="sql: SELECT * FROM information_schema.cluster_info where type = 'pd';: Error 1227 (42000): Access denied; you need (at least one of) the PROCESS privilege(s) for this operation"]
[2024/01/17 08:07:41.751 +00:00] [WARN] [dump.go:1506] ["If the amount of data to dump is large, criteria: (data more than 60GB or dumped time more than 10 minutes)\nyou'd better adjust the tikv_gc_life_time to avoid export failure due to TiDB GC during the dump process.\nBefore dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\nAfter dumping: run sql `update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';` in tidb.\n"]
[2024/01/17 08:07:41.780 +00:00] [INFO] [dump.go:79] ["Using snapshot"] [snapshot=447081050726203400]
[2024/01/17 08:07:41.780 +00:00] [INFO] [dump.go:151] ["begin to run Dump"] [conf="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"session-token\":\"\",\"provider\":\"\",\"force-path-style\":false,\"use-accelerate-endpoint\":false,\"role-arn\":\"\",\"external-id\":\"\",\"object-lock-enabled\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"azblob\":{\"endpoint\":\"\",\"account-name\":\"\",\"account-key\":\"\",\"access-tier\":\"\",\"sas-token\":\"\",\"encryption-scope\":\"\",\"encryption-key\":\"\"},\"SpecifiedTables\":true,\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoSequences\":true,\"NoHeader\":true,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":false,\"TransactionalConsistency\":true,\"EscapeBackslash\":false,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"dev-tidb.tidb.svc.cluster.local\",\"Port\":4000,\"Threads\":8,\"User\":\"tidb2dw\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"\",\"LogFile\":\"\",\"LogFormat\":\"\",\"OutputDirPath\":\"gs://test-tidb2dw/images2-dev/snapshot?credentials-file=%2Fticdc%2Fcred.json\",\"StatusAddr\":\":8281\",\"Snapshot\":\"447081050726203400\",\"Consistency\":\"snapshot\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"\",\"CsvSeparator\":\",\",\"CsvDelimiter\":\"\\\"\",\"CsvLineTerminator\":\"\\r\\n\",\"Databases\":null,\"Where\":\"\",\"FileType\":\"csv\",\"ServerInfo\":{\"ServerType\":3,\"ServerVersion\":\"7.5.0\",\"HasTiKV\":true},\"Rows\":1,\"ReadTimeout\":0,\"TiDBMemQuotaQuery\":0,\"FileSize\":5368709120,\"StatementSize\":1000000,\"SessionParams\":{\"tidb_snapshot\":\"447081050726203400\"},\"Tables\":{\"database\":[{\"Name\":\"images\",\"AvgRowLength\":0,\"Type\":0}]},\"CollationCompatible\":\"loose\",\"CsvOutputDialect\":3,\"IOTotalBytes\":null,\"Net\":\"\"}"]
[2024/01/17 08:07:41.799 +00:00] [DEBUG] [dump.go:1600] ["no need to build region info because database is not TiDB 3.x"]
[2024/01/17 08:07:41.941 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="meta of dababase 'database'"]
[2024/01/17 08:07:41.942 +00:00] [DEBUG] [dump.go:441] ["start dumping table..."] [database=database] [table=images]
[2024/01/17 08:07:41.942 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/images2-dev/snapshot/database-schema-create.sql]
[2024/01/17 08:07:41.942 +00:00] [DEBUG] [writer_util.go:126] ["start dumping meta data"] [target=database]
[2024/01/17 08:07:41.942 +00:00] [DEBUG] [writer_util.go:139] ["finish dumping meta data"] [target=database]
[2024/01/17 08:07:41.960 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="meta of table 'database'.'images'"]
[2024/01/17 08:07:41.960 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/images2-dev/snapshot/database.images-schema.sql]
[2024/01/17 08:07:41.960 +00:00] [DEBUG] [writer_util.go:126] ["start dumping meta data"] [target=database]
[2024/01/17 08:07:41.961 +00:00] [DEBUG] [writer_util.go:139] ["finish dumping meta data"] [target=database]
[2024/01/17 08:07:41.962 +00:00] [DEBUG] [dump.go:878] ["dumping TiDB tables with TABLESAMPLE"] [database=database] [table=images]
[2024/01/17 08:07:41.968 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'images'(0/2)"]
[2024/01/17 08:07:41.969 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=images] [chunkIndex=0] []
[2024/01/17 08:07:41.969 +00:00] [DEBUG] [dump.go:825] ["send task to writer"] [task="data of table 'database'.'images'(1/2)"]
[2024/01/17 08:07:41.969 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`images` WHERE `store_origin_id`<1 or(`store_origin_id`=1 and `id`<'76376382-b50f-11ee-ba48-0a7b3274cd2a')  ORDER BY `store_origin_id`,`id`"]
[2024/01/17 08:07:41.969 +00:00] [DEBUG] [writer.go:200] ["trying to dump table chunk"] [retryTime=1] [db=database] [table=images] [chunkIndex=1] []
[2024/01/17 08:07:41.969 +00:00] [DEBUG] [ir_impl.go:205] ["try to start tableData"] [query="SELECT * FROM `database`.`images` WHERE `store_origin_id`>1 or(`store_origin_id`=1 and `id`>='76376382-b50f-11ee-ba48-0a7b3274cd2a')  ORDER BY `store_origin_id`,`id`"]
[2024/01/17 08:07:41.971 +00:00] [INFO] [writer.go:272] ["no data written in table chunk"] [database=database] [table=images] [chunkIdx=0]
[2024/01/17 08:07:41.972 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=images] [chunkIdx=0]
[2024/01/17 08:07:41.974 +00:00] [DEBUG] [writer_util.go:498] ["opened file"] [path=gcs://test-tidb2dw/images2-dev/snapshot/database.images.0000000010000.csv]
[2024/01/17 08:07:41.974 +00:00] [DEBUG] [writer_util.go:351] ["finish dumping table(chunk)"] [database=database] [table=images] ["finished rows"=1] ["finished size"=143]
[2024/01/17 08:07:41.974 +00:00] [DEBUG] [writer_util.go:508] ["tear down lazy file writer..."] [path=gcs://test-tidb2dw/images2-dev/snapshot/database.images.0000000010000.csv]
[2024/01/17 08:07:42.013 +00:00] [DEBUG] [writer.go:256] ["finish dumping table(chunk)"] [database=database] [table=images] [chunkIdx=1] ["total rows"=1]
[2024/01/17 08:07:42.013 +00:00] [DEBUG] [dump.go:369] ["finish dumping table data task"] [database=database] [table=images] [chunkIdx=1]
[2024/01/17 08:07:42.013 +00:00] [INFO] [collector.go:264] ["backup success summary"] [total-ranges=4] [ranges-succeed=4] [ranges-failed=0] [total-take=73.618973ms] [total-kv-size=143B] [average-speed=1.942kB/s] [total-rows=1]
[2024/01/17 08:07:42.014 +00:00] [DEBUG] [status.go:32] ["stopping log progress"]
[2024/01/17 08:07:42.014 +00:00] [DEBUG] [writer_util.go:464] ["opened file"] [path=gcs://test-tidb2dw/images2-dev/snapshot/metadata]
[2024/01/17 08:07:42.064 +00:00] [INFO] [dump.go:145] ["Successfully dumped table from TiDB"] [status="{\"CompletedTables\":1,\"FinishedBytes\":143,\"FinishedRows\":1,\"EstimateTotalRows\":10000,\"TotalTables\":1,\"CurrentSpeedBPS\":422.72143311150325,\"Progress\":\"100.00 %\"}"]
[2024/01/17 08:07:42.064 +00:00] [INFO] [snapshot.go:60] ["Creating replicate session"] [table=database.images] [storage=/images2-dev/snapshot]
[2024/01/17 08:07:42.076 +00:00] [INFO] [config.go:56] ["TiDB connection established"]
[2024/01/17 08:07:42.691 +00:00] [INFO] [snapshot.go:91] ["Successfully copy table schema"] [table=database.images]
[2024/01/17 08:07:42.732 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.images] [path=database.images.0000000010000.csv]
[2024/01/17 08:07:46.026 +00:00] [INFO] [snapshot.go:128] ["Successfully load snapshot data into data warehouse"] [table=database.images] [path=database.images.0000000010000.csv]
[2024/01/17 08:07:46.026 +00:00] [INFO] [snapshot.go:148] ["Successfully load all snapshot data into data warehouse"] [table=database.images] [size=143] [cost=3.33484954s]
[2024/01/17 08:07:46.093 +00:00] [INFO] [snapshot.go:156] ["Successfully upload loadinfo"] [table=database.images] [loadinfo="Copy to data warehouse start time: 2024-01-17T08:07:42Z\nCopy to data warehouse end time: 2024-01-17T08:07:46Z\n"]
[2024/01/17 08:07:58.224 +00:00] [INFO] [increment.go:337] ["no new files found since last round"] [table=database.images]
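
For anyone hitting the same error: the snapshot loads fine once the key column is varchar, so the failure is specific to binary key columns, whose TiDB types would map to BigQuery BYTES. A sketch of a check (not part of tidb2dw) against TiDB's information_schema to spot affected primary keys before replicating:

-- Lists primary-key columns whose TiDB types would map to BigQuery BYTES
SELECT c.table_schema, c.table_name, c.column_name, c.data_type
FROM information_schema.columns c
JOIN information_schema.key_column_usage k
  ON k.table_schema = c.table_schema
 AND k.table_name = c.table_name
 AND k.column_name = c.column_name
WHERE k.constraint_name = 'PRIMARY'
  AND c.data_type IN ('binary', 'varbinary', 'blob', 'tinyblob', 'mediumblob', 'longblob');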
Lloyd-Pottiger commented 9 months ago

no idea...

Lloyd-Pottiger commented 9 months ago

@eugen-korentsov https://github.com/pingcap-inc/tidb2dw/pull/87 may fix this, please try again with the latest commit.

eugen-korentsov commented 9 months ago

@Lloyd-Pottiger Yes, it works fine with the latest build. Thank you.