Open Smityz opened 7 months ago
I noticed that there are many hard-coded parameters. After modifying some of them, I found that the latency decreased from 8 seconds to 4 seconds. Would it be possible to make these parameters configurable? @pingyu
I think yes. Which parameters, and why they affect the latency in your case ?
https://github.com/tikv/tikv/blob/v6.5.8/components/causal_ts/src/tso.rs#L70
/// Maximum available interval of TSO cache.
/// It means the duration that TSO we cache would be available despite failure
/// of PD. The longer of the value can provide better "High-Availability"
/// against PD failure, but more overhead of `TsoBatchList` & pressure to TSO
/// service.
pub(crate) const DEFAULT_TSO_BATCH_ALLOC_AHEAD_BUFFER_MS: u64 = 500;
This parameter in TiKV can cause a timestamp discrepancy with physical time, resulting in incorrect calculations, but seems no real negative effect.
https://github.com/tikv/migration/blob/main/cdc/pkg/config/kvclient.go#L27
// the safe interval to move backward resolved ts
ResolvedTsSafeInterval time.Duration `toml:"resolved-ts-safe-interval" json:"resolved-ts-safe-interval"`
This parameter have a real negative effect, but it seems related to a server side bug.
I also tried other parameters, like
sink default concurrency
and
flush ticker
in
https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go
, but no use
This parameter in TiKV can cause a timestamp discrepancy with physical time, resulting in incorrect calculations, but seems no real negative effect.
This parameter is configurable, See https://docs.pingcap.com/tidb/v6.5/tikv-configuration-file#alloc-ahead-buffer-new-in-v640.
It's indeed an issue, the cached TSO should not affect the calculation of replication gap. But currently I don't find a good way to fix it. (Maybe carry the information about the cache in resolved ts message ?)
This parameter have a real negative effect, but it seems related to a server side bug.
Emmm... I think the issue has been fixed by https://github.com/tikv/tikv/pull/13520. The parameter is a work around (#196) but we forget to remove it.
I will remove it and do some tests.
Great! Thank you master @pingyu
But I think hard code parameters in https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go
is still an issue. Perhaps we could consolidate these parameters into a single file as constant variables
You are right, the ResolvedTsSafeInterval
is a way to workaround a bug, the perfect fix depends on this PR. If you can be sure that the tikv version you are using is greater than or equal to 6.5, you can set ResolvedTsSafeInterval
to 0.
Great! Thank you master @pingyu But I think hard code parameters in
https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go
is still an issue. Perhaps we could consolidate these parameters into a single file as constant variables
En ? You mean these parameters should be constant or configurable ?
Great! Thank you master @pingyu But I think hard code parameters in
https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go
is still an issue. Perhaps we could consolidate these parameters into a single file as constant variablesEn ? You mean there parameters should be constant or configurable ?
There are many hard-coded numbers in the code now, which is not very friendly for some optimization operations because developers often do not realize the impact these numbers may have. If we can give these numbers some names, it will make it easier for developers to optimize them instead of reading through all the code. When we find parameters that really have a huge impact on performance, we can consider making them configurable. Before that, we can consider giving them constant names and corresponding comments to facilitate retrieval. For example: https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go#L47 https://github.com/tikv/migration/blob/main/cdc/cdc/kv/client.go#L1164
Great! Thank you master @pingyu But I think hard code parameters in
https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go
is still an issue. Perhaps we could consolidate these parameters into a single file as constant variablesEn ? You mean there parameters should be constant or configurable ?
There are many hard-coded numbers in the code now, which is not very friendly for some optimization operations because developers often do not realize the impact these numbers may have. If we can give these numbers some names, it will make it easier for developers to optimize them instead of reading through all the code. When we find parameters that really have a huge impact on performance, we can consider making them configurable. Before that, we can consider giving them constant names and corresponding comments to facilitate retrieval. For example: https://github.com/tikv/migration/blob/main/cdc/cdc/sink/tikv.go#L47 https://github.com/tikv/migration/blob/main/cdc/cdc/kv/client.go#L1164
Get it.
BTW, the concurrency is configurable by concurrency
in sink URL of changefeed. See https://github.com/tikv/migration/blob/1b33b2a92709b631cfab2b7d546ae0c230083353/cdc/cdc/sink/tikv.go#L99.
I used this command to monitor my cdc job, and found the latency is large than I expected.
replication_gap
is big butsink_gap
is small, could this be due to slow data pulling? How can I increase its speed?