kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)
Apache License 2.0
2.37k stars 1.11k forks source link

Dynamic RESTMapper fails to load new Versions within a pre-existing Group on cache miss #2869

Open griffindvs opened 4 days ago

griffindvs commented 4 days ago

I have a controller that installs other CRDs and controllers (via Operator bundles). After updating sigs.k8s.io/controller-runtime from v0.12.2 to v0.18.4, our controller is now hitting errors trying to create a resource of a given GVK: failed to get restmapping: w.example.com. The CRD in question does exist, and if the controller pod is restarted, the problem disappears. The CRD in question is installed during runtime of the controller, so it does not exist prior to initialization.

This occurs even in the follow situation:

It does not occur if:

I believe this is due to an error in the following chain:

We see the error message from: https://github.com/kubernetes-sigs/controller-runtime/blob/700befecdffa803d19830a6a43adc5779ed01e26/pkg/client/apiutil/apimachinery.go#L74-L78

(where we check if a GVK is namespaced, this one is). We can trace this to the restmapper.RESTMapping call:

func (m *mapper) RESTMapping(gk schema.GroupKind, versions ...string) (*meta.RESTMapping, error) {
    res, err := m.getMapper().RESTMapping(gk, versions...)
    if meta.IsNoMatchError(err) {
        if err := m.addKnownGroupAndReload(gk.Group, versions...); err != nil {
            return nil, err
        }
        res, err = m.getMapper().RESTMapping(gk, versions...)
    }

    return res, err
}

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/restmapper.go#L117-L127

If we assume that we hit meta.IsNoMatchError, addKnownGroupAndReload should be called with the Group but no Versions provided (since the original IsGVKNamespaced call included no Version).

If we continue to trace this to addKnownGroupAndReload:

func (m *mapper) addKnownGroupAndReload(groupName string, versions ...string) error {
  // versions will here be [""] if the forwarded Version value of
  // GroupVersionResource (in calling method) was not specified.
  if len(versions) == 1 && versions[0] == "" {
  versions = nil
  }

  // If no specific versions are set by user, we will scan all available ones for the API group.
  // This operation requires 2 requests: /api and /apis, but only once. For all subsequent calls
  // this data will be taken from cache.
  if len(versions) == 0 {
  apiGroup, err := m.findAPIGroupByName(groupName)
  if err != nil {
      return err
  }
  if apiGroup != nil {
      for _, version := range apiGroup.Versions {
          versions = append(versions, version.Version)
      }
  }
}

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/restmapper.go#L155-L175

We will enter the branch for if len(versions) == 0, where we get the apiGroup using findAPIGroupByName:

// findAPIGroupByNameLocked returns API group by its name.
func (m *mapper) findAPIGroupByName(groupName string) (*metav1.APIGroup, error) {
  // Looking in the cache first.
  {
  m.mu.RLock()
  group, ok := m.apiGroups[groupName]
  m.mu.RUnlock()
  if ok {
      return group, nil
  }
}

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/restmapper.go#L237

This will find the example.com Group in the cache, but the cached version will not include the newly installed Version. It will only include Version v1 from the CRD that was present prior to starting this controller. A new CRD with Version v2 was added later after the initialization.

I believe this was introduced in v0.15.0 when there was a switch to use the lazy RESTMapper: https://github.com/kubernetes-sigs/controller-runtime/commit/935faeba70039b5403616e73f109f4b6b1115b9f

alvaroaleman commented 3 days ago

/kind bug Please feel free to submit a fix

griffindvs commented 17 hours ago

/assign @griffindvs