karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

performance improvement for karmada-controller-manager to use `DeferredDiscoveryRESTMapper` #2105

Closed likakuli closed 2 years ago

likakuli commented 2 years ago

What would you like to be added:

use DeferredDiscoveryRESTMapper instead of DynamicRESTMapper to improve karmada-controller-manager performance

Why is this needed: https://github.com/karmada-io/karmada/blob/c1b5aef905f39e56fd48120c9ff4fcc8e0cda637/pkg/detector/detector.go#L772-L773 https://github.com/karmada-io/karmada/blob/c1b5aef905f39e56fd48120c9ff4fcc8e0cda637/pkg/detector/detector.go#L591-L593

DynamicRESTMapper visits the cluster every time it needs to map a specific GVK to a GVR. This wastes performance when there are many objects but only a few distinct GVKs.
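
For context, a minimal sketch (hypothetical function, not the actual detector code) of the pattern behind the linked lines: every queued object triggers a GVK-to-GVR lookup, so an uncached mapper can pay one discovery round trip per object even when all objects share a single GVK.

// detector_hotpath_sketch.go (hypothetical illustration, not Karmada source)
package main

import (
    "k8s.io/apimachinery/pkg/api/meta"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// mapQueuedObjects illustrates the hot path: one GVK->GVR lookup per queued
// object. With an uncached RESTMapper each lookup can cost a discovery
// round trip, even when all objects share a single GVK.
func mapQueuedObjects(restMapper meta.RESTMapper, objs []*unstructured.Unstructured) ([]schema.GroupVersionResource, error) {
    gvrs := make([]schema.GroupVersionResource, 0, len(objs))
    for _, obj := range objs {
        gvk := obj.GroupVersionKind()
        mapping, err := restMapper.RESTMapping(gvk.GroupKind(), gvk.Version)
        if err != nil {
            return nil, err
        }
        gvrs = append(gvrs, mapping.Resource)
    }
    return gvrs, nil
}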

likakuli commented 2 years ago

/assign

RainbowMango commented 2 years ago

Thanks for pointing it out. In addition to DynamicRESTMapper and DeferredDiscoveryRESTMapper, I just noticed there are also other alternatives:

I guess you have evaluated them, and I wonder why you finally chose DeferredDiscoveryRESTMapper.

likakuli commented 2 years ago

There is a similar scenario in kube-controller-manager's GC logic, where DeferredDiscoveryRESTMapper is used. And what we need is just a RESTMapper with a caching function.

https://github.com/kubernetes/kubernetes/blob/89aaf7c02dd33415ac994cecfddca7c02ed75834/cmd/kube-controller-manager/app/controllermanager.go#L507-L513
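
The linked code roughly follows this pattern (a sketch assuming an already-built rest.Config and stop channel; the real kube-controller-manager wires in its own discovery client):

package main

import (
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/discovery"
    cacheddiscovery "k8s.io/client-go/discovery/cached/memory"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/restmapper"
)

// newCachedRESTMapper builds a DeferredDiscoveryRESTMapper backed by an
// in-memory discovery cache, so repeated GVK->GVR lookups do not hit the
// apiserver. Reset() is called periodically to pick up newly installed CRDs.
func newCachedRESTMapper(config *rest.Config, stopCh <-chan struct{}) (*restmapper.DeferredDiscoveryRESTMapper, error) {
    discoveryClient, err := discovery.NewDiscoveryClientForConfig(config)
    if err != nil {
        return nil, err
    }
    cachedClient := cacheddiscovery.NewMemCacheClient(discoveryClient)
    mapper := restmapper.NewDeferredDiscoveryRESTMapper(cachedClient)
    go wait.Until(mapper.Reset, 30*time.Second, stopCh)
    return mapper, nil
}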

By the way, there is a performance problem in the kube-controller-manager's GC controller, and I have submitted a PR to fix it. I have checked the similar logic in karmada-controller-manager, and there is no such problem in Karmada.

Refer to https://github.com/kubernetes/kubernetes/pull/110888

RainbowMango commented 2 years ago

DynamicRESTMapper visits the cluster every time it needs to map a specific GVK to a GVR. This wastes performance when there are many objects but only a few distinct GVKs.

I just looked at the implementation of DynamicRESTMapper; it seems it gets the cached result from the static RESTMapper, and if the requested data is not in the cache, it reloads the cache and tries again.
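
To make the behavior I mean concrete, a simplified sketch (not controller-runtime's actual source): serve lookups from an in-memory static mapper, and only on a no-match error reload the discovery data and retry once.

package main

import (
    "sync"

    "k8s.io/apimachinery/pkg/api/meta"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// lazyMapper is a simplified stand-in for the lazy/dynamic mapper behavior
// described above. The static mapper is assumed to be already built; reload
// is a hypothetical helper that rebuilds it from discovery data.
type lazyMapper struct {
    mu     sync.Mutex
    static meta.RESTMapper
    reload func() (meta.RESTMapper, error)
}

// RESTMapping answers from the cached static mapper and, only when the kind
// is unknown, reloads the discovery data and retries once.
func (l *lazyMapper) RESTMapping(gk schema.GroupKind, versions ...string) (*meta.RESTMapping, error) {
    l.mu.Lock()
    defer l.mu.Unlock()
    mapping, err := l.static.RESTMapping(gk, versions...)
    if err == nil || !meta.IsNoMatchError(err) {
        return mapping, err
    }
    fresh, rerr := l.reload()
    if rerr != nil {
        return nil, rerr
    }
    l.static = fresh
    return l.static.RESTMapping(gk, versions...)
}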

Can you help confirm that? I need to clarify that this is not an objection, just a request for confirmation.

By the way, how did you observe this issue? I don't know how to test it and evaluate the effect of the improvement made by #2106.

likakuli commented 2 years ago

p90perf

This is the SVG generated by Go pprof, where we can see the CPU cost of each function call. There are two places that need improvement.

The left side shows that listing by label selector costs too much CPU time, because controller-runtime does not use an index when listing by label selector. I will open a new issue to resolve this problem.

The right side shows that restmapper.GetGroupVersionResource costs too much CPU time. The reason is that for every waiting object it calls the remote apiserver to get the GVR info, even when there is only one GVK but a lot of waiting objects. That is quite a waste of time and CPU resources. What we need is to call the remote apiserver only once per GVK and use the cached result for the other waiting objects with the same GVK.

RainbowMango commented 2 years ago

image

The CPU time is mostly occupied by runtime.mapaccess2 and runtime.newobject; does that mean it is making a remote HTTP call? I'm not sure about it. From the implementation of DynamicRESTMapper, it seems it caches all the resource groups in memory (in the static RESTMapper).

Anyway, do you have the profiling data after #2106? I wonder how much performance is improved by this PR.

likakuli commented 2 years ago

image

We monitor the workqueue_work_duration_seconds_bucket metric of karmada-controller-manager for each queue; it measures the processing time of each work queue item. The P90 decreased from 9s to 910ms.

likakuli commented 2 years ago

I closed the previous PR #2106 for this reason: https://github.com/karmada-io/karmada/pull/2106#issuecomment-1180189827. I can create a new PR if the first method is OK.

RainbowMango commented 2 years ago

I can create a new PR if the first method is OK.

Can you explain why the first method works better than the current one? I mean theoretically.

likakuli commented 2 years ago

benchmark:

// restmapper.go
package restmapper

import (
    "sync"

    "k8s.io/apimachinery/pkg/api/meta"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// gvkToGVRMap caches the GVK to GVR mapping so each GVK is resolved through the RESTMapper only once.
var gvkToGVRMap sync.Map

// GetGroupVersionResource is a helper to map GVK(schema.GroupVersionKind) to GVR(schema.GroupVersionResource).
func GetGroupVersionResource(restMapper meta.RESTMapper, gvk schema.GroupVersionKind) (schema.GroupVersionResource, error) {
    value, ok := gvkToGVRMap.Load(gvk)
    if !ok {
        restMapping, err := restMapper.RESTMapping(gvk.GroupKind(), gvk.Version)
        if err != nil {
            return schema.GroupVersionResource{}, err
        }

        gvkToGVRMap.Store(gvk, restMapping.Resource)
        value = restMapping.Resource
    }

    return value.(schema.GroupVersionResource), nil
}

// GetGroupVersionResource2 resolves GVK to GVR through the RESTMapper on every call, without any local cache.
func GetGroupVersionResource2(restMapper meta.RESTMapper, gvk schema.GroupVersionKind) (schema.GroupVersionResource, error) {
    restMapping, err := restMapper.RESTMapping(gvk.GroupKind(), gvk.Version)
    if err != nil {
        return schema.GroupVersionResource{}, err
    }

    return restMapping.Resource, nil
}

// restmapper_test.go
package restmapper

import (
    "testing"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/discovery"
    cacheddiscovery "k8s.io/client-go/discovery/cached/memory"
    "k8s.io/client-go/restmapper"
    "k8s.io/client-go/tools/clientcmd"
    "sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

func BenchmarkGetGroupVersionResource(b *testing.B) {
    config, err := clientcmd.BuildConfigFromFlags("", "./kubeconfig")
    if err != nil {
        panic(err)
    }
    restMapper, err := apiutil.NewDynamicRESTMapper(config)
    if err != nil {
        panic(err)
    }

    for i := 0; i < b.N; i++ {
        GetGroupVersionResource(restMapper, appsv1.SchemeGroupVersion.WithKind("Deployment"))
    }
}

func BenchmarkGetGroupVersionResource2(b *testing.B) {
    config, err := clientcmd.BuildConfigFromFlags("", "./kubeconfig")
    if err != nil {
        panic(err)
    }
    restMapper, err := apiutil.NewDynamicRESTMapper(config)
    if err != nil {
        panic(err)
    }

    for i := 0; i < b.N; i++ {
        GetGroupVersionResource2(restMapper, appsv1.SchemeGroupVersion.WithKind("Deployment"))
    }
}

func BenchmarkGetGroupVersionResource3(b *testing.B) {
    config, err := clientcmd.BuildConfigFromFlags("", "./kubeconfig")
    if err != nil {
        panic(err)
    }

    discoveryClient := discovery.NewDiscoveryClientForConfigOrDie(config)
    cachedClient := cacheddiscovery.NewMemCacheClient(discoveryClient)
    restMapper := restmapper.NewDeferredDiscoveryRESTMapper(cachedClient)
    go wait.Until(func() {
        restMapper.Reset()
    }, 300*time.Second, make(chan struct{}))

    // Warm up the discovery cache so the timed loop measures cached lookups.
    GetGroupVersionResource2(restMapper, appsv1.SchemeGroupVersion.WithKind("Deployment"))
    for i := 0; i < b.N; i++ {
        GetGroupVersionResource2(restMapper, appsv1.SchemeGroupVersion.WithKind("Deployment"))
    }
}

BenchmarkGetGroupVersionResource: uses GetGroupVersionResource (with the sync.Map cache) and DynamicRESTMapper
BenchmarkGetGroupVersionResource2: uses GetGroupVersionResource2 with DynamicRESTMapper
BenchmarkGetGroupVersionResource3: uses GetGroupVersionResource2 with DeferredDiscoveryRESTMapper and a MemCacheClient

Via CPU profiling, I see that there are some newobject, makeslice, and growslice function calls in GetGroupVersionResource2 that do not exist in GetGroupVersionResource.

In this case we write to the GVK-to-GVR cache only a few times and read from it many more times, so caching the result directly and returning fast obviously improves read performance.

RainbowMango commented 2 years ago

config, err := clientcmd.BuildConfigFromFlags("", "./kubeconfig")

You are using a real kubeconfig file, right? (Before running the benchmark, you need to put a kubeconfig in the current path.)

likakuli commented 2 years ago

Yes, put a real kubeconfig file in the same folder.

RainbowMango commented 2 years ago

Yes, put a real kubeconfig file in the same folder.

Personally, I don't think a benchmark should rely on a real system. People may get different results when they run against different systems.

I can create a new PR if the first method is OK.

Thanks in advance, but we need more evidence for that. Actually, I'm looking into it now.

RainbowMango commented 2 years ago

I investigated the implementation of DynamicRESTMapper and did find a performance issue:

DynamicRESTMapper uses NewDiscoveryRESTMapper as its cache. When building the cache, NewDiscoveryRESTMapper creates many versionMappers; the versionMappers are organized into a slice (here), and when handling a request, the mapper iterates over that slice (see here). I think that's the performance issue here.
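
To illustrate the point, a simplified sketch (not the actual client-go/controller-runtime source) of that lookup pattern: the mapper keeps one sub-mapper per group/version and scans them linearly, so every RESTMapping call pays for the full slice walk even on a warm cache.

package main

import (
    "k8s.io/apimachinery/pkg/api/meta"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// multiMapper is a simplified stand-in for the multi-RESTMapper structure
// described above: one sub-mapper per group/version, kept in a slice.
type multiMapper struct {
    mappers []meta.RESTMapper
}

// RESTMapping asks each sub-mapper in turn and returns the first match, so
// every lookup walks the slice even when the answer never changes.
func (m *multiMapper) RESTMapping(gk schema.GroupKind, versions ...string) (*meta.RESTMapping, error) {
    for _, sub := range m.mappers {
        if mapping, err := sub.RESTMapping(gk, versions...); err == nil {
            return mapping, nil
        }
    }
    return nil, &meta.NoKindMatchError{GroupKind: gk, SearchedVersions: versions}
}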

likakuli commented 2 years ago

the mapper iterates over that slice

Wow, this is indeed one of the main costs.

Personally, I don't think a benchmark should rely on a real system. People may get different results when they run against different systems.

I made a test with fake data, without any remote requests. When only one API resource with one GVK exists, the first one is even 10 times faster than the second one.

fake data:

groupResources := []*restmapper.APIGroupResources{
    {
        Group: metav1.APIGroup{
            Name: "apps",
            Versions: []metav1.GroupVersionForDiscovery{
                {GroupVersion: "apps/v1", Version: "v1"},
            },
            PreferredVersion: metav1.GroupVersionForDiscovery{
                GroupVersion: "apps/v1",
                Version:      "v1",
            },
        },
        VersionedResources: map[string][]metav1.APIResource{
            "v1": {
                {
                    Name:               "deployments",
                    Namespaced:         true,
                    Kind:               "Deployment",
                    ShortNames:         []string{"deploy"},
                    Categories:         []string{"all"},
                    StorageVersionHash: "8aSe+NMegvE=",
                },
            },
        },
    },
}

For the test, I just modified the code in vendor (here) and used the fake data above instead of the real data from the remote cluster.
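
As a side note, such fake data can also drive a static mapper directly through restmapper.NewDiscoveryRESTMapper, so the benchmark needs neither a kubeconfig nor a vendor patch. A self-contained sketch (the benchmark name is made up for illustration):

package restmapper

import (
    "testing"

    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/restmapper"
)

// BenchmarkStaticMapperFromFakeData (hypothetical name) builds a static
// RESTMapper from in-memory APIGroupResources, so the GVK->GVR lookup can be
// benchmarked without a real cluster.
func BenchmarkStaticMapperFromFakeData(b *testing.B) {
    groupResources := []*restmapper.APIGroupResources{{
        Group: metav1.APIGroup{
            Name:             "apps",
            Versions:         []metav1.GroupVersionForDiscovery{{GroupVersion: "apps/v1", Version: "v1"}},
            PreferredVersion: metav1.GroupVersionForDiscovery{GroupVersion: "apps/v1", Version: "v1"},
        },
        VersionedResources: map[string][]metav1.APIResource{
            "v1": {{Name: "deployments", Namespaced: true, Kind: "Deployment"}},
        },
    }}
    restMapper := restmapper.NewDiscoveryRESTMapper(groupResources)

    gvk := appsv1.SchemeGroupVersion.WithKind("Deployment")
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := restMapper.RESTMapping(gvk.GroupKind(), gvk.Version); err != nil {
            b.Fatal(err)
        }
    }
}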

RainbowMango commented 2 years ago

@likakuli Could you please look at #2187, which introduces a CachedRESTMapper as well as a benchmark test?

likakuli commented 2 years ago

ok