pingcap / tiup

A component manager for TiDB
https://tiup.io
Apache License 2.0
424 stars, 312 forks

cluster: Improve cluster restart messaging #2442

Closed zph closed 4 months ago

zph commented 4 months ago

What problem does this PR solve?

Cluster restart messaging indicates to user that the:

This message is confusing when the command only restarts a single node, especially a monitoring node or a node in an HA deployment, because the cluster as a whole stays available.

What is changed and how it works?

This commit changes the messaging so that users performing a full cluster restart see the same message as before, while users restarting only selected nodes or roles see a more specific warning instead:

Cluster functionality related to nodes: %s and roles: %s will be unavailable
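A minimal sketch of the selection logic described above, not the actual diff. It assumes both placeholders in the new message are `%s` verbs and that nodes and roles arrive as string slices. `restartWarning` and `fullRestartMsg` are hypothetical names, and the full-cluster text is a stand-in because the original message is not quoted in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// fullRestartMsg is a stand-in for the existing full-cluster warning.
// The real text lives in tiup and is not reproduced here (assumption).
const fullRestartMsg = "<existing full-cluster restart warning>"

// restartWarning is a hypothetical helper: it keeps the old message when
// no node or role filter is given, and otherwise returns the scoped warning.
func restartWarning(nodes, roles []string) string {
	if len(nodes) == 0 && len(roles) == 0 {
		// Full cluster restart: message unchanged.
		return fullRestartMsg
	}
	// Partial restart: name the affected nodes and roles.
	return fmt.Sprintf(
		"Cluster functionality related to nodes: %s and roles: %s will be unavailable",
		strings.Join(nodes, ","), strings.Join(roles, ","),
	)
}

func main() {
	fmt.Println(restartWarning(nil, nil))
	fmt.Println(restartWarning([]string{"10.0.1.5:9090"}, []string{"prometheus"}))
}
```

With this shape, a restart limited to one monitoring node warns only about that node and role rather than about the whole cluster.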

Check List

Tests

Code changes

Side effects

Related changes

Release notes:

NONE
ti-chi-bot[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/pingcap/tiup/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment
CLAassistant commented 4 months ago

CLA assistant check
All committers have signed the CLA.

ti-chi-bot[bot] commented 4 months ago

Welcome @zph! It looks like this is your first PR to pingcap/tiup 🎉

ti-chi-bot[bot] commented 4 months ago

[LGTM Timeline notifier]

Timeline:

zph commented 4 months ago

I'm away from my computer tonight, but if the test failures persist I can rerun my manual regression tests.

zph commented 4 months ago

Looking at the failing test, I'm unable to see enough detail to assess if it's my changes or unrelated test issues.

The part closest to pd receiving a signal is a connection error:

[2024/07/19 07:09:00.517 +00:00] [WARN] [server.go:530] ["Server.onConn handshake"] [conn=5506538949855674773] [error="read tcp 127.0.0.1:4000->127.0.0.1:55138: read: connection reset by peer"] ["remote addr"=127.0.0.1:55138]
[2024/07/19 07:09:01.300 +00:00] [INFO] [manager.go:263] ["revoke session"] ["owner info"="[log-backup] /tidb/br-stream/owner ownerManager 0491a53d-ac93-459c-b17c-1adcb6c564e1"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
...
check detail log from: /home/runner/work/tiup/tiup/go/src/github.com/pingcap/tiup/tests/tiup-playground/_tmp/home/data/test_play/tidb-0/tidb.log
pd quit: signal: killed

If this failure seems related to my change, please let me know whether I should look into it further and how I can run the test locally to debug it with more detailed logs.