pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0

lightning should cache the result of `columnAPI.Cols()` to improve performance #56705

Open lance6716 opened 1 month ago

lance6716 commented 1 month ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

capture a CPU profile when lightning encodes KV

[image: CPU profile screenshot]

https://github.com/pingcap/tidb/blob/ff2feb6653846b0decd8b1d7cc0a665e128ccf26/br/pkg/lightning/backend/kv/sql2kv.go#L491-L497

Some implementations of `columnAPI.Cols()` simply return the existing slice, as was the case before https://github.com/pingcap/tidb/pull/50062 and is again the case after https://github.com/pingcap/tidb/pull/53798. The other implementations filter and clone the slice on every call.
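The caching idea in the title can be sketched as follows. This is a minimal illustration, not the actual Lightning code: `Column`, `filteringTable`, `encoder`, `newEncoder`, and `encodeRow` are hypothetical names, and only the shape of the `Cols()` call is taken from the issue. The sketch assumes the column set does not change during the lifetime of the encoder.

```go
package main

import "fmt"

// Column is a hypothetical stand-in for a table column descriptor.
type Column struct {
	Name   string
	Hidden bool
}

// columnAPI is a hypothetical interface with the same call shape as in the issue.
type columnAPI interface {
	Cols() []*Column
}

// filteringTable models an implementation whose Cols() filters and
// allocates a new slice on every call, which is the expensive case.
type filteringTable struct {
	all   []*Column
	calls int // counts how often Cols() is invoked
}

func (t *filteringTable) Cols() []*Column {
	t.calls++
	out := make([]*Column, 0, len(t.all))
	for _, c := range t.all {
		if !c.Hidden {
			out = append(out, c)
		}
	}
	return out
}

// encoder calls Cols() once at construction and reuses the result for
// every row (assumption: the schema is fixed while the encoder is in use).
type encoder struct {
	cols []*Column
}

func newEncoder(tbl columnAPI) *encoder {
	return &encoder{cols: tbl.Cols()}
}

// encodeRow is a placeholder for per-row work; it only counts how many
// cached columns have a matching value in the row.
func (e *encoder) encodeRow(row []string) int {
	n := 0
	for i := range e.cols {
		if i < len(row) {
			n++
		}
	}
	return n
}

func main() {
	tbl := &filteringTable{all: []*Column{{Name: "id"}, {Name: "_rowid", Hidden: true}, {Name: "title"}}}
	enc := newEncoder(tbl)
	for i := 0; i < 1000; i++ {
		enc.encodeRow([]string{"1", "hello"})
	}
	fmt.Println("Cols() calls:", tbl.calls)
}
```

With the slice cached, the per-row path no longer depends on whether a given version's `Cols()` is cheap or expensive.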

2. What did you expect to see? (Required)

stable performance for different versions of lightning

3. What did you see instead (Required)

the duration doubles

4. What is your TiDB version? (Required)

at least v6.5.10

princejha95 commented 4 weeks ago

@lance6716 Can I pick this up?

kennedy8312 commented 4 weeks ago

/type regression

princejha95 commented 2 weeks ago

I am using TiDB as the backend, and when I try to create a task I get the following error:

["encode kv data and write failed"] [table=test.posts] [engineNumber=0] [takeTime=316.715µs] [error="file 'test.posts-schema.sql' with unknown source type 'table-schema'"]

Do I need to change the source type?

lance6716 commented 2 weeks ago

Please follow this doc https://docs.pingcap.com/tidb/stable/tidb-lightning-data-source to prepare the source data. Or you can take a look at the Lightning integration tests https://github.com/pingcap/tidb/tree/master/lightning/tests

princejha95 commented 2 weeks ago

@lance6716 I am running the code from the latest master branch and I am using TiDB as the backend. I see that encoding SQL -> KV takes around 2 minutes.

[image: encode_logs]

Am I missing something here?

Payload I am passing while creating the task:

[lightning]
table-concurrency = 1
index-concurrency = 1
region-concurrency = 1
io-concurrency = 1
check-requirements = true
meta-schema-name = "meta-test"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
status-port = 42483

lance6716 commented 2 weeks ago

Maybe your disk or CPU resources are not enough. In the "restore file completed" log line, encodeDur is far too high for a size=63 task.

princejha95 commented 2 weeks ago

Yeah, it takes ~29 secs, which is unusually high for encoding a task of size 63 bytes. Now that I have reproduced the issue, I will start working on the fix.

lance6716 commented 2 weeks ago

> Yeah.. it takes ~29 secs which is unusually high to encode a task of size 63 bytes. Now that i have reproduced the issue, i will start working on the fix.

I think this time cost is not normal, so maybe check your development environment first? I'm a bit worried that it will affect how you choose to fix the issue. You can run Go CPU profiling (https://pkg.go.dev/runtime/pprof) on that function to see why it's slow. See the example in that link: simply add pprof.StartCPUProfile(f) and pprof.StopCPUProfile() around the function that needs to be checked.
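For reference, wrapping a function with the two `runtime/pprof` calls mentioned above might look like the sketch below. `encode` and `profileCPU` are hypothetical names used only for illustration, and the output path `cpu.prof` is an arbitrary choice; the loop merely stands in for the real encoding work.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// encode is a hypothetical placeholder for the function under investigation.
func encode() int {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i % 7
	}
	return sum
}

// profileCPU runs fn while CPU profiling is active and writes the
// profile to path.
func profileCPU(path string, fn func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// Deferred calls run in reverse order, so the profile is stopped
	// and flushed before the file is closed.
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()

	fn()
	return nil
}

func main() {
	if err := profileCPU("cpu.prof", func() { encode() }); err != nil {
		log.Fatal(err)
	}
}
```

The resulting file can then be inspected with `go tool pprof cpu.prof`, for example using the `top` command inside the interactive prompt.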

princejha95 commented 2 weeks ago

Sure. I will run CPU profiling on the Encode function and find out what the issue is.