tikv / pd

Placement driver for TiKV

`clock offset` is not clear enough #8806

Open rleungx opened 2 weeks ago

rleungx commented 2 weeks ago

Enhancement Task

```go
func (t *timestampOracle) UpdateTimestamp() error {
    prevPhysical, prevLogical := t.getTSO()

    now := time.Now()
    t.metrics.saveEvent.Inc()

    jetLag := typeutil.SubRealTimeByWallClock(now, prevPhysical)
    if jetLag > 3*t.updatePhysicalInterval && jetLag > jetLagWarningThreshold {
        log.Warn("clock offset",
            logutil.CondUint32("keyspace-group-id", t.keyspaceGroupID, t.keyspaceGroupID > 0),
            zap.Duration("jet-lag", jetLag),
            zap.Time("prev-physical", prevPhysical),
            zap.Time("now", now),
            zap.Duration("update-physical-interval", t.updatePhysicalInterval))
        t.metrics.slowSaveEvent.Inc()
    }
    ...
```

From the code above, the warning is logged whenever the current system time is much later than the previous physical time. A clock offset is only one possible reason for that gap; a runtime issue can produce it as well. So we'd better use a clearer, less confusing log message here.

okJiang commented 1 week ago

> clock offset could be one of the reasons

What other reasons need to be explained? PD restart? Leader transfer?

Or is it sufficient to just mention that there hasn't been a physical time update for a while, which may have caused a clock offset?

@JmPotato @rleungx

JmPotato commented 1 week ago

> > clock offset could be one of the reasons
>
> What other reasons need to be explained? PD restart? Leader transfer?
>
> Or is it sufficient to just mention that there hasn't been a physical time update for a while, which may have caused a clock drift?
>
> @JmPotato @rleungx

If etcd suffers from slow I/O, the TSO update may fail to advance the physical part in time, which will also trigger the "clock offset" warning.