Open nicolasscott opened 1 year ago
We see this issue too. It happens when Nomad fails to GC a volume, and currently we have to manually call nomad volume deregister to get back to a working state.
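For anyone who wants to script that workaround, here is a minimal sketch in Python. It assumes the nomad binary is on PATH and that NOMAD_ADDR (and a token, if ACLs are on) is already set in the environment. The helper names deregister_cmd and deregister are hypothetical, not part of Nomad. The -force flag skips the check that the volume is still in use and, as far as I know, may ask for confirmation, so the sketch leaves stdin attached to the terminal rather than answering for you.

```python
import subprocess


def deregister_cmd(volume_id, force=False, namespace=None):
    """Build the argv for `nomad volume deregister` (hypothetical helper)."""
    cmd = ["nomad", "volume", "deregister"]
    if namespace:
        # -namespace is one of the general CLI options.
        cmd.append(f"-namespace={namespace}")
    if force:
        # Skips the "volume still in use" check; use with care.
        cmd.append("-force")
    cmd.append(volume_id)
    return cmd


def deregister(volume_id, force=False, namespace=None):
    """Run the command and return its exit code (0 means success)."""
    # stdin/stdout are inherited so any confirmation prompt stays interactive.
    return subprocess.run(deregister_cmd(volume_id, force, namespace)).returncode


# Example (not executed here): deregister("my-volume", force=True)
```

The volume has to be registered again afterwards before jobs can claim it.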
Hi @nicolasscott and thanks for the report; I'll get it added to our backlog.
It looks like we're running into a similar issue in 1.6.1 with the following message on the leader:
2023-08-21T10:39:03.249Z [ERROR] nomad.fsm: CSIVolumeClaim failed: error="volume max claims reached"
2023-08-21T10:39:03.249Z [ERROR] nomad.csi_volume: csi raft apply failed: error="volume max claims reached" method=claim
The claim is never released for allocations that have been garbage collected.
I am seeing this on the leader in my cluster every five minutes:
Sep 29 21:40:05 nomad03 nomad[663]: 2023-09-29T21:40:05.115+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Sep 29 21:45:05 nomad03 nomad[663]: 2023-09-29T21:45:05.115+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Sep 29 21:50:05 nomad03 nomad[663]: 2023-09-29T21:50:05.117+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Sep 29 21:55:05 nomad03 nomad[663]: 2023-09-29T21:55:05.114+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Sep 29 22:00:05 nomad03 nomad[663]: 2023-09-29T22:00:05.114+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
@tgross Since you are the resident CSI expert -- any information I can get you from that node?
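To check whether a leader shows the same five-minute cadence, a small script can pull the timestamps out of the journal lines and print the gaps between consecutive errors. This is only a sketch: the function name error_intervals is made up for illustration, and the regex assumes the timestamp format seen in the logs above (millisecond precision, followed by Z or a numeric offset).

```python
import re
from datetime import datetime

# Matches timestamps such as 2023-09-29T21:40:05.115+0200 or 2023-08-21T10:39:03.249Z
TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})(Z|[+-]\d{4})")


def error_intervals(lines, needle='error="plugin in use"'):
    """Return the gaps, in whole seconds, between consecutive matching log lines."""
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        m = TS.search(line)
        if m:
            offset = "+0000" if m.group(2) == "Z" else m.group(2)
            stamps.append(
                datetime.strptime(m.group(1) + offset, "%Y-%m-%dT%H:%M:%S.%f%z")
            )
    return [round((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


sample = [
    'Sep 29 21:40:05 nomad03 nomad[663]: 2023-09-29T21:40:05.115+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete',
    'Sep 29 21:45:05 nomad03 nomad[663]: 2023-09-29T21:45:05.115+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete',
    'Sep 29 21:50:05 nomad03 nomad[663]: 2023-09-29T21:50:05.117+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete',
]
print(error_intervals(sample))  # [300, 300]
```

Feeding it the output of journalctl for the nomad unit (one line per list element) gives the same kind of list for a live server.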
I have also been experiencing the same issue since we started using Nomad CSI with Ceph:
Feb 19 17:11:17 REDACTED nomad[39920]: 2024-02-19T17:11:17.183+0200 [ERROR] nomad.csi_plugin: csi raft apply failed: error="plugin in use" method=delete
Nomad version
v1.5.3
Operating system and Environment details
Ubuntu 20.04
Issue
When Nomad performs a garbage collection, we see an error:
I've tried several versions of the plugin, so the issue appears to be with Nomad itself.
Reproduction steps
Use the AWS EBS CSI plugin with the following job specs:
Register an AWS volume and do a
nomad system gc
Expected Result
No errors
Actual Result
Log entry such as:
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)