longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] Volume has multiple instance managers #8374

Closed codot-fr closed 1 week ago

codot-fr commented 6 months ago

Describe the bug

A volume is stuck with multiple instance managers registered. It is marked as healthy but not ready for workloads.

image

Environment

mantissahz commented 6 months ago

Hi @codot-fr,

> It is marked as healthy but not ready for workload.

Do you have any other volumes that are ready for workloads? Also, did you upgrade Harvester while this volume was attached to a workload?

Could you provide a support bundle so we can investigate?

This might be the same issue as https://github.com/longhorn/longhorn/issues/6642

codot-fr commented 6 months ago

I'm not sure I understand what you mean by other volumes ready with the workload.

I upgraded Harvester a while ago, and it was fine. The problem appeared after a complete Harvester cluster reboot. This is the only volume that has the issue.

Support bundle attached.

Thanks!

Edit: I can't upload the zip file.

mantissahz commented 6 months ago

> I'm not sure I understand when you ask about other volume ready with the workload

Are any other volumes attached to VMs, and are those VMs running well?

> I've upgraded harvester a while ago, it was fine. It appeared after a complete harvester cluster reboot. It's the only volume that has this issue.

Was this volume created after the upgrade, and how did you reboot the Harvester cluster?

> Edit : I can't upload the zip file.

Could you send it to longhorn-support-bundle@suse.com?

codot-fr commented 6 months ago

> > I'm not sure I understand when you ask about other volume ready with the workload

> Any other volumes attached to the VMs and the VM is running well?

The VM won't start:

AttachVolume.Attach failed for volume "pvc-3e779637-4ea1-4469-a304-4fd88c040c2a" : rpc error: code = Aborted desc = volume pvc-3e779637-4ea1-4469-a304-4fd88c040c2a is not ready for workloads
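For anyone triaging similar events, here is a minimal Python sketch that pulls the volume name and the gRPC status code out of an attach failure like the one above. The pattern is modeled only on the single message quoted in this comment, so it is an assumption that other Kubernetes or Longhorn versions format the message the same way. The names `AttachFailure` and `parse_attach_failure` are hypothetical helpers, not part of any Longhorn or Kubernetes API.

```python
import re
from typing import NamedTuple, Optional


class AttachFailure(NamedTuple):
    """Fields extracted from an AttachVolume.Attach failure message."""
    volume: str
    grpc_code: str
    description: str


# Assumption: the message layout matches the one quoted in this issue.
_ATTACH_RE = re.compile(
    r'AttachVolume\.Attach failed for volume "(?P<volume>[^"]+)"\s*:\s*'
    r'rpc error: code = (?P<code>\w+) desc = (?P<desc>.*)'
)


def parse_attach_failure(message: str) -> Optional[AttachFailure]:
    """Return the parsed fields, or None if the message does not match."""
    match = _ATTACH_RE.search(message)
    if match is None:
        return None
    return AttachFailure(
        volume=match.group("volume"),
        grpc_code=match.group("code"),
        description=match.group("desc").strip(),
    )


# The exact message reported in this issue.
EXAMPLE = (
    'AttachVolume.Attach failed for volume '
    '"pvc-3e779637-4ea1-4469-a304-4fd88c040c2a" : rpc error: code = Aborted '
    'desc = volume pvc-3e779637-4ea1-4469-a304-4fd88c040c2a '
    'is not ready for workloads'
)

failure = parse_attach_failure(EXAMPLE)
if failure is not None:
    print(failure.volume, failure.grpc_code)
```

The extracted volume name is the value to search for in a support bundle. The `Aborted` code together with "not ready for workloads" shows that the attach request itself was rejected.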

> > I've upgraded harvester a while ago, it was fine. It appeared after a complete harvester cluster reboot. It's the only volume that has this issue.

> Was this volume created after the upgrade and how did you reboot the harvester cluster?

No, this volume (and the attached VM) is quite old. The Harvester cluster was completely shut down and then restarted because of planned electrical maintenance.

> > Edit : I can't upload the zip file.

> Could you send it to longhorn-support-bundle@suse.com?

Sent!

derekbit commented 6 months ago

cc @ejweber @PhanLe1010

derekbit commented 6 months ago

Ref: https://github.com/longhorn/longhorn/issues/8197#issuecomment-2005116550

derekbit commented 6 months ago

cc @Vicente-Cheng as well

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 week ago

This issue was closed because it has been stalled for 5 days with no activity.