Closed olljanat closed 1 year ago
Interestingly enough i had errors like this when packaging for hetzner as well that seem to have resolved themselves by restarting/reinstalling and "starting from scratch". Can you try that as well? Seems like there is some state in docker csi that is not cleaned up.
Afaik the nodes identify themselves via the identity service. These node ids should then be passed to this Controller just fine iirc.
Good tip but does not look to be about that.
Okay, I guess you have tested this already, I just wanted to give some insight. I was already hunting through the code to debug things and I got it working somehow.
Another thing that definitely has caused me issues is not having the plugin with the controller/node installed on all managers.
I mean, I tested that immediately after your tip by removing /var/lib/docker
and rebuilding plugin. Currently using one node only so that is not also the case.
Anyway as far I can see at least https://github.com/moby/swarmkit/blob/2ad26e5531a20bcbdc099bf3bdbabd01c1406e84/manager/csi/plugin.go#L200-L219 should have check that if nodeID
is empty request should not even been send to CSI plugin but still need to figure out why that is the case.
I agree with your idea of having swarmkit checking this. This will remove some guess work
@s4ke btw. Does those plugins which with you have tested have PUBLISH_UNPUBLISH_VOLUME
capability? Looks that at least those which I have used earlier don't have that so PublishVolume
function does not actually do anything: https://github.com/moby/swarmkit/blob/2ad26e5531a20bcbdc099bf3bdbabd01c1406e84/manager/csi/plugin.go#L203-L206
DigitalOcean plugin have that capability and that logic in PublishVolume
function is broken.
The plugin I worked on has these capabilities:
controller: https://github.com/hetznercloud/csi-driver/blob/main/driver/controller.go#L305
node: https://github.com/hetznercloud/csi-driver/blob/main/driver/node.go#L159
does this help?
Draft version of adding DigitalOcean CSI packaging example to here.
Currently fails on https://github.com/digitalocean/csi-digitalocean/blob/v4.5.0/driver/controller.go#L267-L269 is which is most probably bug in Swarmkit because based on CSI spec NodeID is mandatory parameter on PublishVolume requests https://github.com/container-storage-interface/spec/blob/release-1.7/csi.proto#L730-L732
Reason why Swarmkit does not handle this correctly most probably is that most of the CSI drivers takes NodeID as parameter like this https://github.com/olljanat/csi-plugins-for-docker-swarm/blob/f05fbab91c32a29dcf14f93d107501ab499af97a/csi-driver-smb/entrypoint.sh#L8 which is not the case in DigitalOcean implementation.