hyperledger-bevel / bevel-operator-fabric

Hyperledger Fabric Kubernetes operator - Hyperledger Fabric operator for Kubernetes (v2.3, v2.4 and v2.5, soon 3.0)
https://hyperledger-bevel.github.io/bevel-operator-fabric/
Apache License 2.0

memory limit of operator manager too low on GKE #198

Closed: koh-osug closed this issue 11 months ago

koh-osug commented 12 months ago

What happened?

When deploying peers, the operator manager is OOM-killed on GKE and is not restarted successfully. GKE in Autopilot mode only considers the resource limit of the deployment, and the 128Mi set there is not sufficient.

I had to install the operator with:

helm install --set resources.requests.cpu=500m --set resources.requests.memory=500Mi ...

But maybe the defaults could be increased?

What did you expect to happen?

The manager survives and does not crash.

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

The value used in https://github.com/hyperledger/bevel-operator-fabric/blob/8aa3a7741499c126b9004d9dc5a8e2f54a44333c/chart/hlf-operator/templates/deployment.yaml#L64 is 128Mi, and the limit in the deployment also seems to be too low. After I set it to 1Gi, the manager no longer crashes.
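To confirm which requests and limits the running manager actually has, the deployment can be queried with `kubectl` and a JSONPath expression. This is only a sketch: the deployment name and namespace below are hypothetical placeholders, not values taken from this issue.

```shell
# Hypothetical names (assumptions); replace with your deployment and namespace.
DEPLOYMENT="hlf-operator-controller-manager"
NAMESPACE="default"

# JSONPath that prints each container's name followed by its resources block.
JSONPATH='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'

if command -v kubectl >/dev/null 2>&1; then
  # Query the live deployment; report a failure instead of aborting.
  kubectl get deployment "$DEPLOYMENT" -n "$NAMESPACE" \
    --request-timeout=10s -o jsonpath="$JSONPATH" \
    || echo "kubectl query failed" >&2
else
  echo "kubectl not found; skipping query for $DEPLOYMENT in $NAMESPACE"
fi
```

If the memory limit printed for the manager container is still 128Mi after an upgrade, the override did not reach the deployment.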

Kubernetes version

```console
NAME                       STATUS   ROLES   AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gk3-hlf-default-pool-xxx   Ready            13h   v1.27.3-gke.100   10.128.0.54   XXX           Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gk3-hlf-pool-1-xxx         Ready            20h   v1.27.3-gke.100   10.128.0.51   XXX           Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gk3-hlf-pool-1-xxx         Ready            22h   v1.27.3-gke.100   10.128.0.50   XXX           Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gk3-hlf-pool-1-xxx         Ready            84m   v1.27.3-gke.100   10.128.0.59   XXX           Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
gk3-hlf-pool-1-xxx         Ready            13h   v1.27.3-gke.100   10.128.0.53   XXX           Container-Optimized OS from Google   5.15.109+        containerd://1.7.0
```

adityajoshi12 commented 11 months ago

Hi @koh-osug, we have not seen this OOM issue on other clouds, but the limits can always be overridden at install time.
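As a rough sketch of such an override, the command below raises both the memory request and the memory limit to the 1Gi that worked for the reporter. It assumes the chart exposes `resources.limits.memory` alongside the `resources.requests.*` keys used earlier in this thread. The release name, chart reference, and namespace are hypothetical placeholders. Setting request and limit to the same value is a choice made for this sketch, so the result does not depend on how the platform reconciles the two.

```shell
# Hypothetical release name, chart reference, and namespace (assumptions).
RELEASE="hlf-operator"
CHART="kfs/hlf-operator"
NAMESPACE="default"

# 1Gi is the value reported above as sufficient for the manager.
MEMORY="1Gi"

# Assumption: the chart accepts resources.limits.memory next to
# resources.requests.memory.
HELM_ARGS="upgrade --install $RELEASE $CHART --namespace $NAMESPACE \
  --set resources.requests.memory=$MEMORY \
  --set resources.limits.memory=$MEMORY"

# Print the command for review; remove 'echo' to run it against the cluster.
echo helm $HELM_ARGS
```

Keeping the values in variables makes it easy to try a smaller limit later and find the lowest setting at which the manager stays up.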