openebs / zfs-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that is integrated with a backend ZFS data storage stack.
https://openebs.io
Apache License 2.0

Volume From Snapshot Fails With Custom Node Ids #541

Closed jnels124 closed 1 month ago

jnels124 commented 3 months ago

What steps did you take and what happened:

  1. Label the node(s) with a unique value for openebs.io/nodeid
  2. Deploy the zfs components
  3. Create a pvc with a zfs storage class
  4. Verify that the volume is created successfully
  5. Create a snapshot and ensure the appropriate zfs and k8s resources exist
  6. Create a new pvc defined to be created from the snapshot
  7. Notice that the zfs volume is created and, after 10 minutes, goes to pending
  8. The logs indicate that the node cannot be found using the nodeid value from the owning node

What did you expect to happen: It should utilize the correct ids when identifying the node and create the volume from snapshot successfully.

Anything else you would like to add: We pre-label (before starting zfs components) our nodes with unique identifiers for openebs.io/nodeid and these identifiers differ from the node name.

With this configuration, I am able to create new ZFS volumes without issue. However, when you attempt to create a zfs volume from a snapshot definition it fails.

In the logs I can see "failed to get the node {VALUE_OF_OPENEBS_NODEID}", which highlights the issue: the GetNodeID method in volume.go only works when it is passed the node name.

When creating a new volume, GetNodeID is called with the node name, but when the volume comes from a volume clone or a snapshot, it is called with the value of openebs.io/nodeid. This is caused by the following block in the CreateVolume method in driver/controller.go:

if contentSource != nil && contentSource.GetSnapshot() != nil {
    snapshotID := contentSource.GetSnapshot().GetSnapshotId()

    selected, err = CreateSnapClone(ctx, req, snapshotID)
} else if contentSource != nil && contentSource.GetVolume() != nil {
    srcVol := contentSource.GetVolume().GetVolumeId()
    selected, err = CreateVolClone(ctx, req, srcVol)
} else {
    selected, err = CreateZFSVolume(ctx, req)
}

  CreateSnapClone --> returns spec.OwnerNodeId
  CreateVolClone --> returns spec.OwnerNodeId
  CreateZFSVolume --> returns nodename loop variable

Environment:

Abhinandan-Purkait commented 3 months ago

@jnels124 Thanks for creating a separate issue. Would you be interested in taking this up, or could you create a design proposal for it so that the community can review it and take it forward? Thanks

jnels124 commented 2 months ago

@Abhinandan-Purkait I should be able to get around to implementing the fix for this in the next few weeks.

Abhinandan-Purkait commented 2 months ago

> @Abhinandan-Purkait should be able to get around to implementing the fix for this in the next few weeks

That's great!