kubernetes-sigs / cluster-api-provider-nested

Cluster API Provider for Nested Clusters
Apache License 2.0

Sync PVC's Status.Phase #351

Closed yuanchen8911 closed 1 year ago

yuanchen8911 commented 1 year ago

What this PR does / why we need it:

The current syncer doesn't sync PVC.Status.Phase in a tenant cluster. We have observed cases where a super cluster's PV and PVC are not Bound while the tenant cluster's PV and PVC are Bound, which might be caused by a bug or race condition in the underlying controllers. This should never happen.

It can be reproduced by deleting a bound PVC in a tenant cluster, recreating the PVC, and binding it again. The resulting PV and PVC in the tenant cluster will show Bound, while in the super cluster the PV is Released and the PVC is Pending.

The PR updates a tenant cluster's PVC Status.Phase with the super cluster's Status.Phase if the tenant cluster's PVC is Bound but the super cluster's PVC is not (e.g., Pending or Lost). All other conditions are still handled and reconciled by the PV controller.
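
A minimal sketch of the condition being checked (the helper name, package, and variable names are illustrative, not the actual syncer code):

    package sketch

    import (
        corev1 "k8s.io/api/core/v1"
    )

    // shouldOverrideTenantPVCPhase is an illustrative helper: the tenant (virtual)
    // cluster PVC phase is overridden with the super cluster phase only when the
    // tenant reports Bound but the super cluster does not (e.g., Pending or Lost).
    // Every other combination is left to the tenant's PV controller to reconcile.
    func shouldOverrideTenantPVCPhase(vPVC, pPVC *corev1.PersistentVolumeClaim) bool {
        return vPVC.Status.Phase == corev1.ClaimBound &&
            pPVC.Status.Phase != corev1.ClaimBound
    }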

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

Fei-Guo commented 1 year ago

"the PV in the super cluster is Released", what does release mean?

Quoted from k8s doc: "When a user is done with their volume, they can delete the PVC objects from the API that allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted"

Fei-Guo commented 1 year ago

I'd rather double-check how the PV status is changed during your experiment. Note that the tenant PV controller handles all PVC/PV state changes. We should be really careful about asking the syncer to join the interaction.

yuanchen8911 commented 1 year ago

/retest

yuanchen8911 commented 1 year ago

"the PV in the super cluster is Released", what does release mean?

Quoted from k8s doc: "When a user is done with their volume, they can delete the PVC objects from the API that allows reclamation of the resource. The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled, or Deleted"

Released means a PV that was bound to a PVC is no longer bound and its reclaim policy is Retain. In this case, the PV's claimRef still has the PVC info. A use case is a PersistentVolume on local storage. This is just an example. The point is that the super cluster's and the tenant cluster's PV and PVC statuses can differ for some reason.

In our real case, a super cluster's PVC Status.Phase was Lost, but the tenant cluster's PVC status was still Bound because the tenant cluster's PV claimRef was not empty and the PV controller in the tenant cluster relied solely on the PV's claimRef to reconcile the PVC's status, which is not sufficient, in my opinion.

If everything works perfectly, we should never see a vPVC that is Bound while the pPVC is not. Unfortunately, we did see such cases and were able to reproduce them. If the case never happens again, the change is a no-op, so it should be a safe change.

A question I do have is whether we should reconcile a PV's Status.Phase as well.

@christopherhein ?

yuanchen8911 commented 1 year ago

> I'd rather double-check how the PV status is changed during your experiment. Note that the tenant PV controller handles all PVC/PV state changes. We should be really careful about asking the syncer to join the interaction.

Please see my previous comment https://github.com/kubernetes-sigs/cluster-api-provider-nested/pull/351#issuecomment-1681170224. The PV controller, which sets the PVC's Status.Phase and volumeName based on the PV's claimRef field, is insufficient in a virtual cluster setup and can result in inconsistent status between the tenant cluster and the super cluster.

yuanchen8911 commented 1 year ago

/retest-required

yuanchen8911 commented 1 year ago

No idea why pull-cluster-api-provider-nested-test failed with many unrelated errors.

Fei-Guo commented 1 year ago

I was thinking the claimRef, which contains a UUID, should be different from that of the newly created PVC. I never expected the PV phase could change from "Released" to "Bound".

Fei-Guo commented 1 year ago

Btw, what is the k8s version of your tenant control plane?

yuanchen8911 commented 1 year ago

> I was thinking the claimRef, which contains a UUID, should be different from that of the newly created PVC. I never expected the PV phase could change from "Released" to "Bound".

The following is a real case. We have not figured out why the super cluster PVC was marked as Lost while the PV was Bound with the correct claimRef.

Tenant cluster: vPV (with claimRef): Bound, vPVC (with volumeName): Bound
Super cluster:  pPV (with claimRef): Bound, pPVC (without volumeName): Lost

The Released case can be reproduced by deleting a bound PVC, recreating the PVC, removing the PV's claimRef, and adding volumeName to the newly created PVC. The tenant's PV Status.Phase will go Bound -> Released -> Available -> Bound. The PVC goes Bound -> gone -> Pending -> Bound. The super cluster's PV will remain Released and its PVC Lost.

Tenant PV

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM              STORAGECLASS
local-pv   128Gi      RWO            Retain           Bound       default/pv-test    local-storage

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM              STORAGECLASS
local-pv   128Gi      RWO            Retain           Released    default/pvc-test   local-storage

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM              STORAGECLASS
local-pv   128Gi      RWO            Retain           Available                      local-storage

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM              STORAGECLASS
local-pv   128Gi      RWO            Retain           Bound       default/pv-test    local-storage

Tenant PVC

NAME       STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS
pvc-test   Bound                                          local-storage

NAME       STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS
pvc-test   Pending                                        local-storage

NAME       STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS
pvc-test   Bound     local-pv   128Gi      RWO            local-storage

Fei-Guo commented 1 year ago

Thanks for the clarification. This makes much more sense. I didn't think about this workflow while I was validating whether PVC/PV works. I think "removing the PV's claimRef" has not been populated down to the pPV, if I recall correctly. We only have UWS for PV, no DWS for PV. If we populated the claimRef removal down to the pPV, do you think the problem would be gone?

yuanchen8911 commented 1 year ago

> Thanks for the clarification. This makes much more sense. I didn't think about this workflow while I was validating whether PVC/PV works. I think "removing the PV's claimRef" has not been populated down to the pPV, if I recall correctly. We only have UWS for PV, no DWS for PV. If we populated the claimRef removal down to the pPV, do you think the problem would be gone?

Fei, in both cases above, the PVs in the super cluster had a claimRef; it's the volumeName in the PVCs that was missing. So DWS of the claimRef may not help, unless we populate the PVC's volumeName downward, which is something we may want to avoid.

The PR does not intend to automatically resolve the issue, as this would necessitate substantial effort. Instead, the objective is to address situations where the statuses are inconsistent, particularly when a tenant cluster shows a normal condition (Bound) while the underlying super cluster does not. In such cases, we will adjust the tenant cluster's status to 'not Bound'. This allows users and administrators to investigate and fix the issue offline. Under no circumstances should there be a scenario where the tenant cluster's PVC is Bound while the super cluster's is not. This is the primary goal of the PR. I hope it makes sense.

It's a great discussion. Thanks!

Fei-Guo commented 1 year ago

"The PR does not intend to automatically resolve the issue," I agree this is a subtle problem and you are trying to provide mitigation. In this case, would you please add a feature gate at least for now so this would not cause any side effect for other users?

yuanchen8911 commented 1 year ago

"The PR does not intend to automatically resolve the issue," I agree this is a subtle problem and you are trying to provide mitigation. In this case, would you please add a feature gate at least for now so this would not cause any side effect for other users?

Good idea! I've added a feature gate, SyncTenantPVCStatusPhase:

    // SyncTenantPVCStatusPhase is an experimental feature that enables the syncer to
    // update the Status.Phase of a tenant cluster's PVC if it is Bound,
    // but the corresponding PVC in the super cluster is not Bound, e.g., Lost.
    // Although rare, this situation can arise due to potential bugs and race conditions.
    // This feature allows users to perform separate investigation and resolution.
    SyncTenantPVCStatusPhase = "SyncTenantPVCStatusPhase"
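
For context, a rough sketch of how the gate could guard the update in the PVC sync path; the helper and its wiring below are hypothetical and only illustrate the pattern, not the actual reconciler code:

    package sketch

    import (
        corev1 "k8s.io/api/core/v1"
    )

    // applyTenantPVCPhase is a hypothetical helper showing the gated mitigation:
    // when the gate is enabled and the tenant PVC claims Bound while the super
    // cluster PVC does not, copy the super cluster phase (e.g., Pending or Lost)
    // onto the tenant PVC so the inconsistency becomes visible to operators.
    // It reports whether a status update is needed; when the clusters agree,
    // or the gate is off, it is a no-op.
    func applyTenantPVCPhase(gateEnabled bool, vPVC, pPVC *corev1.PersistentVolumeClaim) bool {
        if !gateEnabled ||
            vPVC.Status.Phase != corev1.ClaimBound ||
            pPVC.Status.Phase == corev1.ClaimBound {
            return false
        }
        vPVC.Status.Phase = pPVC.Status.Phase
        return true
    }
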
Fei-Guo commented 1 year ago

Can you please fix the lint error (it may not be due to your change, though) and the test failure?

yuanchen8911 commented 1 year ago

> Can you please fix the lint error (it may not be due to your change, though) and the test failure?

Fixed it.

Fei-Guo commented 1 year ago

The latest commit does not seem to be the correct one (no feature gate). Can you double-check?

yuanchen8911 commented 1 year ago

> The latest commit does not seem to be the correct one (no feature gate). Can you double-check?

Thanks for catching it. My bad. I checked in the wrong version.

yuanchen8911 commented 1 year ago

/retest-required

yuanchen8911 commented 1 year ago

/retest

yuanchen8911 commented 1 year ago

Some strange errors. I'm looking into it.

yuanchen8911 commented 1 year ago

@Fei-Guo , @christopherhein , all tests passed. PTAL. Thanks!

Fei-Guo commented 1 year ago

/lgtm

christopherhein commented 1 year ago

/lgtm /approve

🎉🎉

k8s-ci-robot commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: christopherhein, yuanchen8911

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[virtualcluster/OWNERS](https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/main/virtualcluster/OWNERS)~~ [christopherhein]

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.