hetznercloud / csi-driver

Kubernetes Container Storage Interface driver for Hetzner Cloud Volumes
MIT License

Stuck on ContainerCreating: implement setting to force detaching? #486

Closed: erikschul closed this issue 7 months ago

erikschul commented 1 year ago

I sometimes rebuild the Kubernetes control plane and worker nodes during experimentation, which leaves some volumes attached. When a new container is created and the volume is already attached to another node (according to the Hetzner control panel), the Hetzner CSI driver fails to detach it. Even after manual intervention, pods remain stuck in "ContainerCreating" and have to be deleted manually. This is probably also a bug in the Hetzner CSI driver.

Suggestion:

  1. allow the driver to force-detach volumes; perhaps expose this as a configuration setting (ConfigMap) or alpha feature so it won't break existing users (see the sketch after this list)
  2. fix the bug so the pod fails correctly if the driver cannot attach the volume
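
A minimal sketch of how such an opt-in could be wired up, assuming a hypothetical HCLOUD_FORCE_DETACH environment variable fed from a ConfigMap via the controller Deployment (none of these names exist in the driver today):

  // Hypothetical opt-in, illustrative only: the controller reads a
  // force-detach flag from its environment, which a ConfigMap or the
  // Helm chart could populate on the controller Deployment.
  package driver

  import (
    "os"
    "strconv"
  )

  // forceDetachEnabled reports whether the hypothetical
  // HCLOUD_FORCE_DETACH setting is turned on. Unset or invalid
  // values keep today's behaviour (no force detach).
  func forceDetachEnabled() bool {
    v, err := strconv.ParseBool(os.Getenv("HCLOUD_FORCE_DETACH"))
    if err != nil {
      return false
    }
    return v
  }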

Also, the control panel forces the user to create a volume and then manually detach it; it is not possible to create a volume without attaching it to a server.

erikschul commented 1 year ago

Possibly related: https://github.com/hetznercloud/csi-driver/issues/411

joliver commented 11 months ago

I'm seeing this as well. Essentially the CSI controller gets out of sync somehow. It then tries to attach a volume that was never cleanly detached from a previous server instance.

I'd like to update the code so that, if Attach fails because the volume is already attached, the plugin then issues a Detach instruction.

If you're open to this, I'd be happy to submit a pull request.
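
For illustration, a minimal sketch of that detach-then-attach flow against the Hetzner Cloud API, assuming hcloud-go's VolumeClient.Attach/Detach and gating it behind the opt-in from the earlier suggestion (this is not the driver's actual code, and waiting for the returned Actions to complete is omitted):

  package driver

  import (
    "context"

    "github.com/hetznercloud/hcloud-go/v2/hcloud"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
  )

  // attachWithForceDetach attaches the volume to the requested server.
  // If the volume is still attached to a different server, it either
  // refuses with FailedPrecondition (today's behaviour) or, when the
  // opt-in is enabled, detaches it first.
  func attachWithForceDetach(ctx context.Context, client *hcloud.Client, volume *hcloud.Volume, server *hcloud.Server, forceDetach bool) error {
    if volume.Server != nil && volume.Server.ID != server.ID {
      if !forceDetach {
        return status.Error(codes.FailedPrecondition, "volume is published to another node")
      }
      if _, _, err := client.Volume.Detach(ctx, volume); err != nil {
        return err
      }
    }
    _, _, err := client.Volume.Attach(ctx, volume, server)
    return err
  }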

erikschul commented 11 months ago

@apricote

apricote commented 11 months ago

I would prefer to find the root cause for why this ("CSI controller gets out of sync somehow") happens, rather than trying to implement a potentially destructive workaround.

#411 was definitely an issue on the API side, possibly related to deleting servers and detaching volumes at the same time.


Do you know some way to replicate this behaviour?

I sometimes rebuild the Kubernetes control plane and worker nodes during experimentation, which leaves some volumes attached.

What exactly do you mean by "rebuild"? Do you delete all the servers & volumes, just provision a new OS on them, or just uninstall & reinstall Kubernetes?

If you delete the server and the volume still shows up as attached to it, that is definitely a bug in the API that we should take a closer look at.


Also, the control panel forces the user to create a volume, then manually detach it. It is not possible to create a volume without being attached.

You can create volumes without attachment through the API, CLI, Terraform or even the csi-driver. The Cloud Console is oriented towards users that are not too technical, and for them it makes sense to always attach the volume to a server. If you want, I can forward this feedback to the UX team.
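
For reference, creating an unattached volume with hcloud-go looks roughly like this (volume name, size and location are placeholders):

  // Create a volume without attaching it to a server by omitting the
  // Server field and giving a Location instead. Requires the hcloud-go,
  // context, os and log packages; all values below are placeholders.
  client := hcloud.NewClient(hcloud.WithToken(os.Getenv("HCLOUD_TOKEN")))
  _, _, err := client.Volume.Create(context.Background(), hcloud.VolumeCreateOpts{
    Name:     "example-volume",
    Size:     10, // GB
    Location: &hcloud.Location{Name: "nbg1"},
  })
  if err != nil {
    log.Fatal(err)
  }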

erikschul commented 11 months ago

root cause: I remember seeing some code in the driver along the lines of "if this is already attached then exit".

destructive: Well, K8s is asking for the volume to be attached, so I assume that means the PV has been correctly claimed. The exception is if you have separate clusters potentially using the same volumes and competing to attach them. So I'd argue that it isn't destructive to just attach it. And the software (e.g. a database) should know how to handle a disk failure (detachment) correctly anyway.

rebuild: I mean building a cluster, attaching the volume, then deleting the cluster and setting it up again. I use a GitOps approach, so rebuilding a cluster from scratch takes just a few minutes. That means all etcd state is lost and rebuilt. This causes the driver to refuse to attach the volume, because it is already attached. The workaround is to destroy the cluster, then detach all volumes, then rebuild. I'm not deleting the server (that would cause double billing, since Hetzner bills at least by the hour); instead I rebuild it using the API:

https://docs.hetzner.cloud/#server-actions-rebuild-a-server-from-an-image
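
For reference, the documented rebuild action is a single POST; a minimal standard-library example (the server ID, image name and token handling are placeholders):

  // POST /servers/{id}/actions/rebuild reinstalls the image on the
  // existing server, which is why the volumes mentioned above stay
  // attached. Needs net/http, os and strings; 42 and the image name
  // are placeholders.
  req, err := http.NewRequest(http.MethodPost,
    "https://api.hetzner.cloud/v1/servers/42/actions/rebuild",
    strings.NewReader(`{"image": "ubuntu-22.04"}`))
  if err != nil {
    panic(err)
  }
  req.Header.Set("Authorization", "Bearer "+os.Getenv("HCLOUD_TOKEN"))
  req.Header.Set("Content-Type", "application/json")
  resp, err := http.DefaultClient.Do(req)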

apricote commented 11 months ago

root cause: I remember seeing some code in the driver along the lines of "if this is already attached then exit".

Yeah, this is the behaviour defined by the CSI specification (look for "Volume published to another node"). The Container Orchestrator (Kubernetes here) is responsible for unmounting the Volume from the current Node. Perhaps it does not do this because that Node is unknown to the cluster, or because it doesn't have the VolumeAttachment resource?

https://github.com/hetznercloud/csi-driver/blob/5da7a14e9e8dfc6716ede0b72b5aaeac6b8f1306/driver/controller.go#L166-L167

I will try to reproduce the issue to find out what is going wrong and why Kubernetes does not try to detach the volume from the existing node.

erikschul commented 11 months ago

@apricote I see your point. In my case, Kubernetes cannot unmount because the etcd state was wiped, so the volume doesn't exist yet in etcd. Perhaps the "is volume attached" check should be internal to Kubernetes rather than go through the Hetzner API, i.e. check against existing PV/PVC? And if none exist, then detach via the Hetzner API?
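
As a sketch of that idea, consulting Kubernetes' own VolumeAttachment objects via client-go could look roughly like this (illustrative only; this is not how the driver or the external-attacher are wired today):

  package driver

  import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
  )

  // attachedInCluster reports whether any VolumeAttachment in the
  // cluster still references the given PersistentVolume, i.e. whether
  // Kubernetes itself still considers the volume attached somewhere.
  func attachedInCluster(ctx context.Context, clientset kubernetes.Interface, pvName string) (bool, error) {
    list, err := clientset.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
    if err != nil {
      return false, err
    }
    for _, va := range list.Items {
      if va.Spec.Source.PersistentVolumeName != nil && *va.Spec.Source.PersistentVolumeName == pvName {
        return true, nil
      }
    }
    return false, nil
  }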

joliver commented 11 months ago

We're running various CSI drivers from other providers (DigitalOcean, UpCloud, Vultr, AWS, GCP, etc.). The only one that gets stuck because the volume is already attached is the Hetzner CSI driver.

By going into the Hetzner Cloud web interface / control panel and detaching the drive from the previous assignment, everything begins functioning as expected.

erikschul commented 11 months ago

@apricote I just realized that you obviously want compatibility with non-Kubernetes CSI use cases (e.g. Nomad). Perhaps you can try to see how DigitalOcean handles it? I.e. when they raise "FAILED_PRECONDITION".

apricote commented 11 months ago

DO also returns codes.FailedPrecondition if the volume is already attached to a different droplet:

  attachedID := 0
  for _, id := range vol.DropletIDs {
    attachedID = id
    if id == dropletID {
      log.Info("volume is already attached")
      return &csi.ControllerPublishVolumeResponse{
        PublishContext: map[string]string{
          d.publishInfoVolumeName: vol.Name,
        },
      }, nil
    }
  }

  // volume is attached to a different droplet, return an error
  if attachedID != 0 {
    return nil, status.Errorf(codes.FailedPrecondition,
      "volume %q is attached to the wrong droplet (%d), detach the volume to fix it",
      req.VolumeId, attachedID)
  }

A quick look into kubernetes-csi/external-attacher (the sidecar calling ControllerPublishVolume) tells me that it has no handling whatsoever for FailedPrecondition. This does not strictly violate the CSI specification, but it goes against the recovery recommendation for this case. The word "Caller" is not defined in the spec, but I assume from context that it means the CO (Container Orchestrator, i.e. Kubernetes).

Caller SHOULD ensure the specified volume is not published at any other node before retrying with exponential back off.

I will dive into the exact calls being made in the coming days.

erikschul commented 11 months ago

If this is the industry best practice, then I'm happy to close the issue, since I can just detach volumes manually when rebuilding.

But it sounds like @joliver has a different issue, where it gets stuck during normal operations? I'm assuming the finalizer will detach the volume when a PV/PVC is deleted, but maybe if the detach fails (failed API call or other reasons), it can end up in a bad state? This could be especially relevant when using ArgoCD, which can delete and recreate resources often.

joliver commented 11 months ago

My issue comes from cases where the driver doesn't cleanly unmount, such as a sudden failure of a control plane node and/or a deployment stopping abruptly. I can reproduce it through brute-force methods. Essentially, the controller thinks the volume is unmounted while it is actually still attached, so when the control plane instructs it to mount the volume on another node, the incorrectly attached volume cannot be remounted to the new location.

One thought might be to handle this on the Hetzner side: when an attach instruction comes in for a volume that is still attached elsewhere, the Hetzner system implicitly detaches it and re-attaches it to the new server.

github-actions[bot] commented 8 months ago

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.