Closed: aszecowka closed this issue 5 years ago
It seems the brokerClient is created on every call to the service broker, see controller_clusterservicebroker. The function is defined here. As I see in the body, the function creates a new client on every call. It uses osb.NewClient, which creates a new golang http client every time. This is the cause of the problem. The http.Client should be shared.
The solution could be: create a map which holds an osbClient instance for every service broker. Instead of creating an osbClient on every call, just take it from the map. Thanks to that we can have separate configuration (Basic Auth or TLS) per service broker and there is no need to create a new client every time. Such changes must also be applied to all other operations, like provisioning an instance, bind requests, etc.
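A minimal sketch of that idea (the package layout, type names, and cache-invalidation details are my assumptions, not the actual service-catalog code):

```go
package brokerclient

import (
	"sync"

	osb "github.com/pmorie/go-open-service-broker-client/v2"
)

// ClientCache keeps one OSB client per broker so the underlying http.Client
// (and its cached TCP connections) is reused instead of recreated per call.
type ClientCache struct {
	mu      sync.Mutex
	clients map[string]osb.Client
}

func NewClientCache() *ClientCache {
	return &ClientCache{clients: map[string]osb.Client{}}
}

// GetClient returns the cached client for a broker, creating it only once.
// A real implementation must also drop the cached entry when the broker's
// URL, Basic Auth, or TLS configuration changes.
func (c *ClientCache) GetClient(brokerName string, config *osb.ClientConfiguration) (osb.Client, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if client, ok := c.clients[brokerName]; ok {
		return client, nil
	}
	client, err := osb.NewClient(config)
	if err != nil {
		return nil, err
	}
	c.clients[brokerName] = client
	return client, nil
}
```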
I've created a POC with the service catalog changes. It is a quick fix and must not be merged. We have run it and the memory usage is not growing. No restarts. See:
https://github.com/piotrmiskiewicz/service-catalog/tree/shared-http-client-for-getcatalog
How the quick fix works: a single shared client is reused for the getCatalog calls instead of creating a new one each time (a rough sketch follows the note below).
This branch must not be merged because it does not handle multiple TLS configs and metrics won't work.
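For illustration only, this is my rough reading of what the branch boils down to, not its actual diff:

```go
package quickfix

import (
	"sync"

	osb "github.com/pmorie/go-open-service-broker-client/v2"
)

var (
	once         sync.Once
	sharedClient osb.Client
	sharedErr    error
)

// sharedOSBClient builds the getCatalog client once and returns the same
// instance afterwards, so the underlying http.Client is reused. With a single
// shared client there is no way to honor per-broker Basic Auth or TLS
// settings, which is one reason this approach is only a quick fix.
func sharedOSBClient(config *osb.ClientConfiguration) (osb.Client, error) {
	once.Do(func() {
		sharedClient, sharedErr = osb.NewClient(config)
	})
	return sharedClient, sharedErr
}
```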
Sounds to me like even when running without error there is a slow but constant memory leak, but with the broker error we have a tight retry loop where we are constantly calling getCatalog() and exacerbating the leak, do you agree? And over enough time, even a non-erroring deployment is going to grow in memory use and eventually exceed its memory allotment. I'll give this some further review.
Looks to me like there are several issues we should fix here. Perhaps more than this:
1) Review the interaction with the HTTP client and address the memory leak. It looks to me like we don't ever close the broker response in all places, which could be an issue. Perhaps we also need to reuse the connections as you've indicated.
2) Our retry loop on getCatalog() is too tight; we constantly pound the broker with retries. There should be a better backoff.
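For the second point, something along these lines would space out the retries instead of hammering the broker. This is only a generic sketch using the apimachinery wait helpers; the function name and the way the controller actually requeues work are assumptions on my side:

```go
package catalogretry

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// fetchCatalogWithBackoff retries getCatalog with exponential backoff instead
// of a tight loop: 5s, 10s, 20s, ... until the step limit is reached.
func fetchCatalogWithBackoff(getCatalog func() error) error {
	backoff := wait.Backoff{
		Duration: 5 * time.Second, // initial delay
		Factor:   2.0,             // double the delay on each retry
		Jitter:   0.1,             // add randomness to avoid synchronized retries
		Steps:    6,               // give up after 6 attempts
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := getCatalog(); err != nil {
			fmt.Printf("getCatalog failed, will retry: %v\n", err)
			return false, nil // not done yet, retry after the next interval
		}
		return true, nil // success, stop retrying
	})
}
```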
The official http doc contains this sentence: "The Client's Transport typically has internal state (cached TCP connections), so Clients should be reused instead of created as needed. Clients are safe for concurrent use by multiple goroutines." If you create an http client once per minute, the GC should do the work and the application could run for weeks, but if you create a client every second it is a problem. I'm looking at the http response processing and I do not see a response.Body.Close() call as described in the official example.
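For reference, the pattern from the standard library documentation looks roughly like this (a generic sketch, not the service-catalog code):

```go
package catalogclient

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

// One shared client for the whole process: its Transport caches TCP
// connections, so reusing it avoids leaking connections on every request.
var httpClient = &http.Client{Timeout: 60 * time.Second}

func getCatalogRaw(url string) ([]byte, error) {
	resp, err := httpClient.Get(url)
	if err != nil {
		return nil, err
	}
	// Always close the body; draining it first lets the Transport reuse
	// the underlying connection for the next request.
	defer func() {
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
	}()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return ioutil.ReadAll(resp.Body)
}
```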
I agree about the lack of Body.Close(). Last week I spent some time cleaning that up and testing; it didn't remedy the OOM issue, but I agree the body should always be closed. I created https://github.com/pmorie/go-open-service-broker-client/pull/131 for it.
Bug Report
What happened: The Controller Manager pod is constantly restarting after I registered a malfunctioning broker (the broker responds with status code HTTP 500 on the "/v2/catalog" endpoint). Memory usage for the Controller Manager goes up to 30MB and then it is restarted, probably because of the defined memory limit. To magnify this problem, I created ~80 malfunctioning brokers, and in 35 minutes the Controller Manager was restarted 8 times.
What you expected to happen: Controller Manager does not restart
How to reproduce it (as minimally and precisely as possible): Install Service Catalog on minikube:
minikube start --extra-config=apiserver.authorization-mode=RBAC
helm init
helm repo add svc-cat https://svc-catalog-charts.storage.googleapis.com
helm install svc-cat/catalog --name catalog --namespace catalog
Write a malfunctioning broker:
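The broker only needs to answer /v2/catalog with HTTP 500. A minimal version (my own sketch; the original source is not included here) could be:

```go
package main

import (
	"log"
	"net/http"
)

// A deliberately broken service broker: every catalog request fails with 500,
// which is enough to trigger the controller's retry loop.
func main() {
	http.HandleFunc("/v2/catalog", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "catalog is intentionally broken", http.StatusInternalServerError)
	})
	log.Println("nasty broker listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```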
Create a Dockerfile:
Build broker image:
eval $(minikube docker-env)
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o servicebroker-linux --ldflags="-s" .
docker build -t "nasty-app:1.0.0" .
Create deployment:
kubectl create -f deploy.yaml
Create service:
kubectl create -f service.yaml
Create ClusterServiceBroker:
kubectl create -f charts/broker.yaml
Broker status:
kubectl describe clusterservicebroker nasty-broker
Here you can see that pod is restarting:
kubectl get pods -n catalog
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version):
API aggregation (kubectl get cm -n kube-system extension-apiserver-authentication -o yaml and look for requestheader-XXX fields): the fields look fine
Helm chart installed with any extra --set values? see above