longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] [v1.5.4-rc2] V2 volume performs engine upgrade when concurrent-automatic-engine-upgrade-per-node-limit > 0 #7930

Open: chriscchien opened this issue 4 months ago

chriscchien commented 4 months ago

Describe the bug

After upgrading from v1.5.3 to v1.5.x-head, V2 volumes perform an engine upgrade when concurrent-automatic-engine-upgrade-per-node-limit is set to a value greater than 0.

This does not happen when upgrading from v1.5.3 to v1.6.0.

longhorn-manager

time="2024-02-15T09:50:31Z" level=error msg="Failed to run engine live upgrade" func="controller.(*EngineController).syncEngine" file="engine_controller.go:323" controller=longhorn-engine engine=pvc-f6524606-4251-4181-bad0-8a07c4e43534-e-0 error="failed to live upgrade image for pvc-f6524606-4251-4181-bad0-8a07c4e43534-e-0: proxyServer=10.42.1.13:8501 destination=10.42.1.13:20006: failed to get server version: rpc error: code = Unknown desc = failed to get version detail: rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"" node=ip-172-31-36-118

instance-manager

[longhorn-instance-manager] time="2024-02-15T09:44:33Z" level=info msg="Started new process manager update watch" func="process.(*Manager).ProcessWatch" file="process_manager.go:359"
[longhorn-instance-manager] time="2024-02-15T09:44:33Z" level=info msg="Started new SPDK service engine update watch" func="spdk.(*Server).EngineWatch" file="server.go:638"
[longhorn-instance-manager] time="2024-02-15T09:44:33Z" level=info msg="Started new SPDK service replica update watch" func="spdk.(*Server).ReplicaWatch" file="server.go:363"
[2024-02-15 09:46:53.801294] tcp.c:2104:nvmf_tcp_pdu_ch_handle: *ERROR*: The TCP/IP connection is not negotiated
[2024-02-15 09:47:01.227900] tcp.c:2104:nvmf_tcp_pdu_ch_handle: *ERROR*: The TCP/IP connection is not negotiated
[2024-02-15 09:47:23.800482] tcp.c:1498:nvmf_tcp_qpair_handle_timeout: *ERROR*: No pdu coming for tqpair=0x22967a0 within 30 seconds
[2024-02-15 09:47:31.217207] tcp.c:2104:nvmf_tcp_pdu_ch_handle: *ERROR*: The TCP/IP connection is not negotiated

To Reproduce

  1. Deploy v1.5.3 and enable the V2 data engine.
  2. Create a V2 volume and write data to it.
  3. Upgrade to v1.5.x-head (v1.5.4-rc2).
  4. Set concurrent-automatic-engine-upgrade-per-node-limit to a value greater than 0.
  5. Wait a few seconds; the V2 volume automatically performs an engine upgrade.

Expected behavior

V2 volumes should not perform an engine upgrade.
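The expected behavior amounts to a guard in the automatic engine upgrade path. The Go sketch below is illustrative only and is not Longhorn's controller code: the `Volume` and `DataEngine` types and the `selectAutoUpgradeCandidates` function are hypothetical. It assumes the setting caps how many engines per node may be upgraded to the default engine image at the same time, with 0 disabling automatic upgrades, and it skips V2 volumes because engine upgrade is not supported for them in v1.5.

```go
// Illustrative sketch only; all names here are hypothetical and do not
// come from the Longhorn codebase.
package main

// DataEngine distinguishes V1 and V2 volumes (hypothetical type).
type DataEngine string

const (
	DataEngineV1 DataEngine = "v1"
	DataEngineV2 DataEngine = "v2"
)

// Volume carries only the fields this sketch needs (hypothetical).
type Volume struct {
	Name        string
	Node        string
	DataEngine  DataEngine
	EngineImage string
}

// selectAutoUpgradeCandidates picks the volumes whose engines should be
// upgraded to defaultImage, allowing at most limitPerNode upgrades per node.
// A limit of 0 (or less) disables automatic engine upgrades entirely.
func selectAutoUpgradeCandidates(volumes []Volume, defaultImage string, limitPerNode int) []Volume {
	if limitPerNode <= 0 {
		return nil
	}
	perNode := map[string]int{} // upgrades already scheduled on each node
	var candidates []Volume
	for _, v := range volumes {
		// Guard from the expected behavior: never auto-upgrade V2 volumes.
		if v.DataEngine == DataEngineV2 {
			continue
		}
		// Nothing to do if the engine already runs the default image.
		if v.EngineImage == defaultImage {
			continue
		}
		// Respect the per-node concurrency limit.
		if perNode[v.Node] >= limitPerNode {
			continue
		}
		perNode[v.Node]++
		candidates = append(candidates, v)
	}
	return candidates
}
```

With a per-node limit of 1, a V1 and a V2 volume on the same outdated node yield only the V1 volume as a candidate; the V2 volume is filtered out regardless of the limit value.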

Support bundle for troubleshooting

supportbundle_3daa3ce3-0c63-4d76-9f71-c4de4d0c6f46_2024-02-15T09-49-34Z.zip

Environment

Additional context

N/A

innobead commented 4 months ago

We did not support engine upgrades for v2 in 1.5, but this should be supported in 1.7.