pingcap / tiup

A component manager for TiDB
https://tiup.io
Apache License 2.0

Upgrade hits an error but still reports that the cluster was upgraded successfully #1220

Open together-wang opened 3 years ago

together-wang commented 3 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

    • tiup cluster upgrade lwt v5.0.0-pre
    • While the upgrade was running, I restarted one of the machines. A connection was refused, but the upgrade still reported success at the end
  2. What did you expect to see?

    • The upgrade should be interrupted and exit with an error
  3. What did you see instead?

    [tidb@node4126 ~]$ tiup cluster upgrade lwt v5.0.0-pre
    Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.5/tiup-cluster upgrade lwt v5.0.0-pre
    This operation will upgrade tidb v4.0.11 cluster lwt to v5.0.0-pre.
    Do you want to continue? [y/N]:(default=N) y
    Upgrading cluster...
    + [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/lwt/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/lwt/ssh/id_rsa.pub
    + [Parallel] - UserSSH: user=tidb, host=172.16.4.126
    + [Parallel] - UserSSH: user=tidb, host=172.16.4.192
    + [Parallel] - UserSSH: user=tidb, host=172.16.5.145
    + [Parallel] - UserSSH: user=tidb, host=172.16.4.192
    + [Parallel] - UserSSH: user=tidb, host=172.16.5.145
    + [Parallel] - UserSSH: user=tidb, host=172.16.5.190
    + [Parallel] - UserSSH: user=tidb, host=172.16.5.190
    + [Parallel] - UserSSH: user=tidb, host=172.16.4.126
    + [Parallel] - UserSSH: user=tidb, host=172.16.4.126
    + [ Serial ] - Download: component=grafana, version=v5.0.0-pre, os=linux, arch=amd64
    + [ Serial ] - Download: component=pd, version=v5.0.0-pre, os=linux, arch=amd64
    + [ Serial ] - Download: component=tidb, version=v5.0.0-pre, os=linux, arch=amd64
    + [ Serial ] - Download: component=tikv, version=v5.0.0-pre, os=linux, arch=amd64
    + [ Serial ] - Download: component=prometheus, version=v5.0.0-pre, os=linux, arch=amd64
    + [ Serial ] - BackupComponent: component=grafana, currentVersion=v4.0.11, remote=172.16.4.126:/tidb-deploy/grafana-3105
    + [ Serial ] - BackupComponent: component=pd, currentVersion=v4.0.11, remote=172.16.4.192:/tidb-deploy/pd-2494
    + [ Serial ] - BackupComponent: component=tikv, currentVersion=v4.0.11, remote=172.16.5.145:/tidb-deploy/tikv-20275
    + [ Serial ] - BackupComponent: component=tikv, currentVersion=v4.0.11, remote=172.16.4.192:/tidb-deploy/tikv-20275
    + [ Serial ] - BackupComponent: component=pd, currentVersion=v4.0.11, remote=172.16.5.190:/tidb-deploy/pd-2494
    + [ Serial ] - BackupComponent: component=pd, currentVersion=v4.0.11, remote=172.16.5.145:/tidb-deploy/pd-2494
    + [ Serial ] - BackupComponent: component=tikv, currentVersion=v4.0.11, remote=172.16.5.190:/tidb-deploy/tikv-20275
    + [ Serial ] - BackupComponent: component=tidb, currentVersion=v4.0.11, remote=172.16.4.126:/tidb-deploy/tidb-4200
    + [ Serial ] - BackupComponent: component=prometheus, currentVersion=v4.0.11, remote=172.16.4.126:/tidb-deploy/prometheus-9305
    + [ Serial ] - CopyComponent: component=pd, version=v5.0.0-pre, remote=172.16.5.145:/tidb-deploy/pd-2494 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=tikv, version=v5.0.0-pre, remote=172.16.5.190:/tidb-deploy/tikv-20275 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=tikv, version=v5.0.0-pre, remote=172.16.5.145:/tidb-deploy/tikv-20275 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=pd, version=v5.0.0-pre, remote=172.16.4.192:/tidb-deploy/pd-2494 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=grafana, version=v5.0.0-pre, remote=172.16.4.126:/tidb-deploy/grafana-3105 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=tidb, version=v5.0.0-pre, remote=172.16.4.126:/tidb-deploy/tidb-4200 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=tikv, version=v5.0.0-pre, remote=172.16.4.192:/tidb-deploy/tikv-20275 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=prometheus, version=v5.0.0-pre, remote=172.16.4.126:/tidb-deploy/prometheus-9305 os=linux, arch=amd64
    + [ Serial ] - CopyComponent: component=pd, version=v5.0.0-pre, remote=172.16.5.190:/tidb-deploy/pd-2494 os=linux, arch=amd64
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.4.126, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/prometheus-9305.service, deploy_dir=/tidb-deploy/prometheus-9305, data_dir=[/tidb-data/prometheus-9305], log_dir=/tidb-deploy/prometheus-9305/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.5.145, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/pd-2494.service, deploy_dir=/tidb-deploy/pd-2494, data_dir=[/tidb-data/pd-2494], log_dir=/tidb-deploy/pd-2494/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.4.192, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/pd-2494.service, deploy_dir=/tidb-deploy/pd-2494, data_dir=[/tidb-data/pd-2494], log_dir=/tidb-deploy/pd-2494/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.5.190, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/pd-2494.service, deploy_dir=/tidb-deploy/pd-2494, data_dir=[/tidb-data/pd-2494], log_dir=/tidb-deploy/pd-2494/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.4.126, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/grafana-3105.service, deploy_dir=/tidb-deploy/grafana-3105, data_dir=[], log_dir=/tidb-deploy/grafana-3105/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.4.192, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/tikv-20275.service, deploy_dir=/tidb-deploy/tikv-20275, data_dir=[/tidb-data/tikv-20275], log_dir=/tidb-deploy/tikv-20275/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.5.145, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/tikv-20275.service, deploy_dir=/tidb-deploy/tikv-20275, data_dir=[/tidb-data/tikv-20275], log_dir=/tidb-deploy/tikv-20275/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.5.190, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/tikv-20275.service, deploy_dir=/tidb-deploy/tikv-20275, data_dir=[/tidb-data/tikv-20275], log_dir=/tidb-deploy/tikv-20275/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - InitConfig: cluster=lwt, user=tidb, host=172.16.4.126, path=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache/tidb-4200.service, deploy_dir=/tidb-deploy/tidb-4200, data_dir=[], log_dir=/tidb-deploy/tidb-4200/log, cache_dir=/home/tidb/.tiup/storage/cluster/clusters/lwt/config-cache
    + [ Serial ] - UpgradeCluster
    Upgrading component pd
        Restarting instance 172.16.5.145
        Restart 172.16.5.145 success
        Restarting instance 172.16.4.192
        Restart 172.16.4.192 success
        Restarting instance 172.16.5.190
        Restart 172.16.5.190 success
    Upgrading component tikv
    failed counting leader on 172.16.5.145:20275 (status addr http://172.16.5.145:20276/metrics), executing GET request for URL "http://172.16.5.145:20276/metrics" failed: Get "http://172.16.5.145:20276/metrics": dial tcp 172.16.5.145:20276: connect: connection refused
    Upgraded cluster `lwt` successfully
    [tidb@node4126 ~]$ tiup cluster display lwt           
    Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.5/tiup-cluster display lwt
    Cluster type:       tidb
    Cluster name:       lwt
    Cluster version:    v5.0.0-pre
    SSH type:           builtin
    Dashboard URL:      http://172.16.4.192:2494/dashboard
    ID                  Role        Host          Ports        OS/Arch       Status   Data Dir                    Deploy Dir
    --                  ----        ----          -----        -------       ------   --------                    ----------
    172.16.4.126:3105   grafana     172.16.4.126  3105         linux/x86_64  Up       -                           /tidb-deploy/grafana-3105
    172.16.4.192:2494   pd          172.16.4.192  2494/2495    linux/x86_64  Up|L|UI  /tidb-data/pd-2494          /tidb-deploy/pd-2494
    172.16.5.145:2494   pd          172.16.5.145  2494/2495    linux/x86_64  Up       /tidb-data/pd-2494          /tidb-deploy/pd-2494
    172.16.5.190:2494   pd          172.16.5.190  2494/2495    linux/x86_64  Up       /tidb-data/pd-2494          /tidb-deploy/pd-2494
    172.16.4.126:9305   prometheus  172.16.4.126  9305         linux/x86_64  Up       /tidb-data/prometheus-9305  /tidb-deploy/prometheus-9305
    172.16.4.126:4200   tidb        172.16.4.126  4200/10070   linux/x86_64  Up       -                           /tidb-deploy/tidb-4200
    172.16.4.192:20275  tikv        172.16.4.192  20275/20276  linux/x86_64  Up       /tidb-data/tikv-20275       /tidb-deploy/tikv-20275
    172.16.5.145:20275  tikv        172.16.5.145  20275/20276  linux/x86_64  Up       /tidb-data/tikv-20275       /tidb-deploy/tikv-20275
    172.16.5.190:20275  tikv        172.16.5.190  20275/20276  linux/x86_64  Up       /tidb-data/tikv-20275       /tidb-deploy/tikv-20275
    Total nodes: 9


  4. What version of TiUP are you using (tiup --version)?
    [tidb@node4126 ~]$ tiup --version
    tiup version 1.3.5 tiup
    Go Version: devel +084b07d6f6 Wed Feb 24 05:23:32 2021 +0000
    Git Ref: release-1.4
    GitHash: f9b0a7d2
    [tidb@node4126 ~]$ tiup cluster --version
    tiup version 1.3.5 tiup
    Go Version: devel +084b07d6f6 Wed Feb 24 05:23:32 2021 +0000
    Git Ref: release-1.4
    GitHash: f9b0a7d2
lucklove commented 3 years ago

Sorry, this is by design at the moment. A failure while evicting leaders is not treated as a critical problem for the upgrade (it only causes some jitter).
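
As an illustration of the two behaviours discussed in this issue, here is a minimal Python sketch. It is not tiup's actual code (tiup is written in Go); `rolling_restart`, `count_leaders`, `restart`, `flaky_count` and the `strict` flag are all hypothetical names. With `strict=False` a failed leader count is only recorded as a warning and the upgrade carries on, which is roughly what the log above shows. With `strict=True` the same failure aborts the run, which is what the reporter expected.

```python
from typing import Callable, List


def rolling_restart(
    hosts: List[str],
    count_leaders: Callable[[str], int],  # hypothetical hook, may raise ConnectionError
    restart: Callable[[str], None],       # hypothetical hook that restarts one instance
    strict: bool = False,
) -> List[str]:
    """Restart each host in turn and return the warnings collected on the way.

    strict=False: a failed leader count is recorded and the run continues.
    strict=True:  a failed leader count is re-raised and aborts the run.
    """
    warnings: List[str] = []
    for host in hosts:
        try:
            count_leaders(host)
        except ConnectionError as err:
            if strict:
                raise
            warnings.append(f"failed counting leader on {host}: {err}")
        restart(host)
    return warnings


def flaky_count(host: str) -> int:
    """Simulate the rebooted machine: its status address refuses connections."""
    if host == "172.16.5.145:20275":
        raise ConnectionError("connection refused")
    return 0
```

Calling `rolling_restart(["172.16.5.145:20275", "172.16.4.192:20275"], flaky_count, print)` returns one warning and still restarts both hosts, while adding `strict=True` raises `ConnectionError` on the first host.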

glkappe commented 3 years ago

@lucklove So at this point, can I continue with tiup-cluster replay to upgrade the tidb-server that was not upgraded correctly to v5?

lucklove commented 3 years ago

@glkappe In this case the upgrade did complete correctly; there was only some performance jitter during the upgrade.

glkappe commented 3 years ago

image

But the version shown by display and the version shown by the client are different.
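
A mismatch like this can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and the client is assumed to report a MySQL-style version string of the form `5.7.25-TiDB-v4.0.11`. The screenshot is not reproduced here, so any client string passed in is an illustrative value, not the one from this cluster.

```python
import re
from typing import Optional


def display_version(display_output: str) -> Optional[str]:
    """Return the value of the 'Cluster version:' line from `tiup cluster display` output."""
    match = re.search(r"^\s*Cluster version:\s*(\S+)", display_output, re.MULTILINE)
    return match.group(1) if match else None


def client_version(server_version: str) -> Optional[str]:
    """Return the TiDB part of a version string such as '5.7.25-TiDB-v4.0.11' (assumed format)."""
    match = re.search(r"TiDB-(v\S+)", server_version)
    return match.group(1) if match else None


def versions_match(display_output: str, server_version: str) -> bool:
    """True only if both versions could be parsed and they are identical."""
    shown = display_version(display_output)
    reported = client_version(server_version)
    return shown is not None and shown == reported
```

For example, `versions_match("Cluster version:    v5.0.0-pre", "5.7.25-TiDB-v4.0.11")` is false, because the two sources disagree about which version is running.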

lucklove commented 3 years ago

@glkappe It seems so...