sugaf1204 opened this issue 2 months ago
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig node /sig storage
/assign
/remove-sig node
I think storage is the correct label for this. I'm not sure what SIG Node should do in this case.
This is fixed in https://github.com/kubernetes/kubernetes/pull/127021. @xing-yang, I think we can close this issue.
What happened?
A broken CephFS volume exists, but its kubelet_volume_stats_* metrics are not output.
Since the kubelet_volume_stats_health_status_abnormal metric is not output, I cannot detect the volume's health.
What did you expect to happen?
The kubelet_volume_stats_health_status_abnormal metric should be output for broken volumes.
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
In ceph-csi, when an unhealthy volume is encountered, the NodeGetVolumeStatsResponse that is returned contains only the VolumeCondition, with no Usage.
In Kubernetes, however, only NodeGetVolumeStatsResponse messages that include Usage are accepted.
I think Kubernetes should also accept a response that contains only a VolumeCondition and no Usage (see the sketch after the links below).
https://github.com/ceph/ceph-csi/blob/e6540989a52212cf9b66672b4aa8fde19d037be6/internal/cephfs/nodeserver.go#L787
https://github.com/kubernetes/kubernetes/blob/2a1d4172e22abb6759b3d2ad21bb09a04eef596d/pkg/volume/csi/csi_client.go#L611-L638
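To illustrate the mismatch, here is a minimal, self-contained Go sketch. The types and function names (VolumeUsage, VolumeCondition, extractMetrics, extractMetricsRelaxed) are simplified stand-ins, not the actual Kubernetes or CSI-spec types; this is an assumption-laden sketch of the behavior described above, not the real csi_client.go code.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for fields of the CSI NodeGetVolumeStatsResponse.
// The real types are generated from the CSI spec's csi.proto.
type VolumeUsage struct {
	Available, Total, Used int64
}

type VolumeCondition struct {
	Abnormal bool
	Message  string
}

type NodeGetVolumeStatsResponse struct {
	Usage           []VolumeUsage
	VolumeCondition *VolumeCondition
}

// extractMetrics mirrors the reported behavior: the response is rejected
// outright when Usage is nil, even if VolumeCondition is set, so the
// abnormal-health information never reaches the kubelet metrics.
func extractMetrics(resp *NodeGetVolumeStatsResponse) (*VolumeCondition, error) {
	if resp.Usage == nil {
		return nil, errors.New("failed to get usage from response: usage is nil")
	}
	return resp.VolumeCondition, nil
}

// extractMetricsRelaxed sketches the suggested change: accept a response
// that carries only a VolumeCondition, so volume health can still be reported.
func extractMetricsRelaxed(resp *NodeGetVolumeStatsResponse) (*VolumeCondition, error) {
	if resp.Usage == nil && resp.VolumeCondition == nil {
		return nil, errors.New("response has neither usage nor volume condition")
	}
	return resp.VolumeCondition, nil
}

func main() {
	// What ceph-csi returns for an unhealthy volume: condition only, no usage.
	resp := &NodeGetVolumeStatsResponse{
		VolumeCondition: &VolumeCondition{Abnormal: true, Message: "failed to statfs"},
	}

	if _, err := extractMetrics(resp); err != nil {
		fmt.Println("current behavior:", err) // the condition is dropped
	}
	if cond, err := extractMetricsRelaxed(resp); err == nil {
		fmt.Printf("relaxed behavior: abnormal=%v msg=%q\n", cond.Abnormal, cond.Message)
	}
}
```

Under this sketch's assumptions, the relaxed check would let the kubelet record the abnormal condition (and hence emit kubelet_volume_stats_health_status_abnormal) even when the plugin reports no usage data.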
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)