Closed: eugen-korentsov closed this issue 7 months ago
@Lloyd-Pottiger sorry again, maybe there is another quick fix available?
I tried it with 8 and encountered the same issue. Also tried with 4, but still had the problem. However, with 2 it looks like it works fine: it has processed 150 CSVs out of 1360 so far without errors (though slower, that's okay for us for now).
UPD: there was a rateLimit issue for 6 CSV files out of 1300+; restarted with 1 thread.
After the tidb2dw process had run through 1 day of full sync, it started uploading the incremental parts:
[2024/01/19 11:59:16.691 +00:00] [INFO] [connector.go:145] ["Successfully merge file"] [file=database/events/447121134133706885/2024-01-19/CDC00000000000000000108.csv]
[2024/01/19 11:59:22.222 +00:00] [INFO] [connector.go:145] ["Successfully merge file"] [file=database/events/447121134133706885/2024-01-19/CDC00000000000000000109.csv]
[2024/01/19 11:59:28.973 +00:00] [INFO] [connector.go:145] ["Successfully merge file"] [file=database/events/447121134133706885/2024-01-19/CDC00000000000000000110.csv]
[2024/01/19 11:59:35.438 +00:00] [INFO] [connector.go:145] ["Successfully merge file"] [file=database/events/447121134133706885/2024-01-19/CDC00000000000000000111.csv]
[2024/01/19 11:59:43.688 +00:00] [INFO] [connector.go:145] ["Successfully merge file"] [file=database/events/447121134133706885/2024-01-19/CDC00000000000000000112.csv]
[2024/01/19 11:59:44.472 +00:00] [ERROR] [increment.go:413] ["error occurred while running increment replicate session"] [table=database.events] [error="Bigquery load snapshot job completed with error: {Location: \"load_job_per_table.long\"; Message: \"Quota exceeded: Your table exceeded quota for imports or query appends per table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas\"; Reason: \"quotaExceeded\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"load_job_per_table.long\"; Message: \"Quota exceeded: Your table exceeded quota for imports or query appends per table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas\"; Reason: \"quotaExceeded\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/operation.go:52\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadIncrement\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/connector.go:130\ngithub.com/pingcap-inc/tidb2dw/replicate.(*IncrementReplicateSession).syncExecDMLEvents\n\t/home/eugene/work/database/tidb2dw/replicate/increment.go:268\ngithub.com/pingcap-inc/tidb2dw/replicate.(*IncrementReplicateSession).handleNewFiles\n\t/home/eugene/work/database/tidb2dw/replicate/increment.go:364\ngithub.com/pingcap-inc/tidb2dw/replicate.(*IncrementReplicateSession).Run\n\t/home/eugene/work/database/tidb2dw/replicate/increment.go:386\ngithub.com/pingcap-inc/tidb2dw/replicate.StartReplicateIncrement\n\t/home/eugene/work/database/tidb2dw/replicate/increment.go:412\ngithub.com/pingcap-inc/tidb2dw/cmd.Replicate.func1\n\t/home/eugene/work/database/tidb2dw/cmd/core.go:279\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"]
It seems CDC generated more than 1,500 CSVs during that time and we hit BigQuery's table-modifications limit: https://cloud.google.com/bigquery/quotas#standard_tables. During a full dump, this error can also occur if the table is too large. I tried using the --cdc.file-size 536870912 and --cdc.flush-interval 5m options in tidb2dw, but they seem to have no effect.
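For reference, one way to stay under that quota is to combine many CDC CSVs into a single load job: the BigQuery Go client accepts multiple GCS URIs in one GCSReference. The sketch below only shows the quota-side idea (appending a batch of files in one job); tidb2dw's real increment path applies merge semantics rather than a plain append, and the batchLoadCSVs helper and CSV options here are assumptions for illustration, not tidb2dw code.

package bqload

import (
	"context"

	"cloud.google.com/go/bigquery"
)

// batchLoadCSVs is a hypothetical helper: it appends a whole batch of
// CSV files to a table in a single load job, so N files count as one
// table operation against the per-table quota instead of N.
func batchLoadCSVs(ctx context.Context, client *bigquery.Client, dataset, table string, uris []string) error {
	gcsRef := bigquery.NewGCSReference(uris...) // one job, many source files
	gcsRef.SourceFormat = bigquery.CSV
	gcsRef.AllowQuotedNewlines = true
	gcsRef.NullMarker = `\N`

	loader := client.Dataset(dataset).Table(table).LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteAppend

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	return status.Err() // load errors surface via the job status
}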
We can't use it with a big table: the dump generates 7,500 CSVs, then tidb2dw starts uploading the files one by one, and after 1,500 CSVs we hit the limit error. It is also very slow. I tried uploading all the CSVs manually with bq load, and with a wildcard it works very fast:
# bq load --project_id test --allow_quoted_newlines --allow_jagged_rows --null_marker='\N' --source_format=CSV tidb_prod.test gs://test-tidb2dw/test-prod/snapshot/*.csv
Waiting on bqjob_r6eb3c7a55a2f1312_0000018d4a15ad62_1 ... (65s) Current status: DONE
65 seconds to load a 300+ GB database with 171+ million rows, and no limit issues.
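For comparison, here is roughly what that wildcard load looks like through the BigQuery Go client (the client tidb2dw itself uses in pkg/bigquerysql); a minimal sketch with the project, bucket, dataset, and table names taken from the bq command above:

package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "test") // project id from the bq command
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// A single wildcard URI covers every snapshot CSV, so the whole
	// snapshot counts as one load job / one table modification.
	gcsRef := bigquery.NewGCSReference("gs://test-tidb2dw/test-prod/snapshot/*.csv")
	gcsRef.SourceFormat = bigquery.CSV
	gcsRef.AllowQuotedNewlines = true
	gcsRef.AllowJaggedRows = true
	gcsRef.NullMarker = `\N`

	loader := client.Dataset("tidb_prod").Table("test").LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteAppend

	job, err := loader.Run(ctx)
	if err != nil {
		log.Fatal(err)
	}
	status, err := job.Wait(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if err := status.Err(); err != nil {
		log.Fatal(err)
	}
}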
Hey @eugen-korentsov, thanks for trying tidb2dw and finding so many issues.
We can simply fix this issue by replacing the per-file LoadSnapshot call in the snapshot session with:
// Load the whole snapshot with a single wildcard pattern instead of one
// LoadSnapshot call per CSV file, so it counts as one table modification.
if err := sess.DataWarehousePool.LoadSnapshot(sess.SourceTable, fmt.Sprintf("%s.%s.*", sess.SourceDatabase, sess.SourceTable)); err != nil {
	sess.logger.Error("Failed to load snapshot data into data warehouse", zap.Error(err))
}
But this is not a suitable approach for all warehouses, so I will not open a PR against the main branch. We need more time to test the different warehouses.
Thanks for your feedback again~
@Lloyd-Pottiger thank you, the code above works fine.
fixed by a17f4cc898b67b9c82c55244be3efb752f03bd6a
While loading a big table (40 million rows), BigQuery responds with a rate limit error:
[2024/01/18 09:33:11.501 +00:00] [INFO] [snapshot.go:123] ["Loading snapshot data into data warehouse"] [table=database.estates] [path=database.estates.0000000310000.csv]
[2024/01/18 09:33:11.527 +00:00] [ERROR] [snapshot.go:125] ["Failed to load snapshot data into data warehouse"] [table=database.estates] [error="Bigquery load snapshot job completed with error: {Location: \"table.write\"; Message: \"Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas\"; Reason: \"rateLimitExceeded\"}"] [errorVerbose="Bigquery load snapshot job completed with error: {Location: \"table.write\"; Message: \"Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas\"; Reason: \"rateLimitExceeded\"}\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.loadGCSFileToBigQuery\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/operation.go:52\ngithub.com/pingcap-inc/tidb2dw/pkg/bigquerysql.(*BigQueryConnector).LoadSnapshot\n\t/home/eugene/work/database/tidb2dw/pkg/bigquerysql/connector.go:112\ngithub.com/pingcap-inc/tidb2dw/replicate.(*SnapshotReplicateSession).Run.func2.1\n\t/home/eugene/work/database/tidb2dw/replicate/snapshot.go:124\nruntime.goexit\n\t/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650"] [path=database.estates.0000000280000.csv]
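As a sketch only (per the log above, tidb2dw does not currently do this): a crude mitigation for transient rateLimitExceeded failures is to wrap the load call in an exponential-backoff retry. The loadWithRetry helper and the string match on the quota reason are assumptions for illustration:

package retryload

import (
	"context"
	"strings"
	"time"
)

// loadWithRetry is a hypothetical wrapper: instead of failing the
// snapshot session on the first rateLimitExceeded response, it retries
// the load with exponential backoff.
func loadWithRetry(ctx context.Context, load func() error, maxRetries int) error {
	backoff := time.Second
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = load(); err == nil || !strings.Contains(err.Error(), "rateLimitExceeded") {
			return err // success, or a non-retryable error
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, 4s, ...
		}
	}
	return err
}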