Open aiyijing opened 2 years ago
Hi @aiyijing,
Looking at this issue in more depth, it's not immediately clear what the problem is. The interesting log line here is:
sync "operators" failed: etcdserver: request is too large
This is an unusual error. When you encounter this issue again, could you also provide the API server logs from the cluster? That should help us investigate further.
Also, for reproducibility, it would be helpful if you could provide the manifests (Subscriptions, CatalogSources) that you used to generate the InstallPlan.
@exdx Unfortunately, the log no longer exists.
But I remember checking the API server audit log: the catalog-operator's update to installplan.status caused a 500 error from the apiserver/etcd.
As you said, maybe installplan.status is too large?
I found some documentation on etcd limits: https://etcd.io/docs/v3.3/dev-guide/limit/
So, does an InstallPlan need to limit the number of CSVs it installs or updates? 😄😄😄
It is confusing that a single InstallPlan has to install multiple CSVs and their dependent resources. Maybe this should be improved, given the etcd request size limit.
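To make the size concern concrete, here is a minimal Go sketch that estimates how large an object is once serialized and compares it with etcd's default request limit of 1.5 MiB (the default for --max-request-bytes, per the etcd docs linked above). The `step` type and the function names are hypothetical stand-ins, not OLM's real types, and the JSON size only approximates what the API server actually sends to etcd.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// etcdDefaultMaxRequestBytes is etcd's default --max-request-bytes (1.5 MiB).
const etcdDefaultMaxRequestBytes = 1536 * 1024

// step is a hypothetical, simplified stand-in for one InstallPlan status step.
type step struct {
	Resolving string `json:"resolving"`
	Manifest  string `json:"manifest"`
}

// serializedSize returns the JSON-encoded size of v in bytes.
func serializedSize(v interface{}) (int, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return 0, err
	}
	return len(b), nil
}

// exceedsLimit reports whether v serializes to more than limit bytes.
func exceedsLimit(v interface{}, limit int) (bool, error) {
	n, err := serializedSize(v)
	if err != nil {
		return false, err
	}
	return n > limit, nil
}

func main() {
	// Assumed workload: 200 steps, each embedding a 10 KiB manifest.
	steps := make([]step, 0, 200)
	for i := 0; i < 200; i++ {
		steps = append(steps, step{
			Resolving: fmt.Sprintf("operator-%d", i),
			Manifest:  strings.Repeat("x", 10*1024),
		})
	}
	size, err := serializedSize(steps)
	if err != nil {
		panic(err)
	}
	over, _ := exceedsLimit(steps, etcdDefaultMaxRequestBytes)
	fmt.Printf("size=%d bytes limit=%d bytes exceeds=%v\n", size, etcdDefaultMaxRequestBytes, over)
}
```

A check like this only tells you that a single object is getting close to the limit; it does not reduce the size of the status that the catalog-operator writes.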
@aiyijing This is because OLM's install/update resolution logic works at the namespace level and therefore considers all operators currently installed in the namespace via Subscription. The solution is to install these operators in other namespaces.
@dmesser
This seems to be another solution:
Split the InstallPlan according to the Resolving field of each step, and then create and update each resulting InstallPlan.
Otherwise we would have to raise the etcd request limit, which is obviously not reasonable.
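As a rough illustration of the splitting idea, here is a minimal Go sketch that groups steps by their Resolving value, so that each group could back its own smaller InstallPlan. The `step` type and `groupByResolving` are hypothetical and simplified; this is not how OLM builds InstallPlans today, and it ignores ordering or dependency constraints between groups.

```go
package main

import "fmt"

// step is a hypothetical, simplified stand-in for one InstallPlan step.
type step struct {
	Resolving string // name of the CSV this step belongs to
	Resource  string // name of the resource the step creates or updates
}

// groupByResolving partitions steps by their Resolving value.
// Groups appear in first-seen order, and steps keep their order within a group.
func groupByResolving(steps []step) [][]step {
	index := make(map[string]int) // Resolving value -> position in groups
	var groups [][]step
	for _, s := range steps {
		i, ok := index[s.Resolving]
		if !ok {
			i = len(groups)
			index[s.Resolving] = i
			groups = append(groups, nil)
		}
		groups[i] = append(groups[i], s)
	}
	return groups
}

func main() {
	steps := []step{
		{Resolving: "operator-a.v1", Resource: "csv"},
		{Resolving: "operator-b.v1", Resource: "csv"},
		{Resolving: "operator-a.v1", Resource: "crd"},
	}
	for _, g := range groupByResolving(steps) {
		fmt.Printf("%s: %d step(s)\n", g[0].Resolving, len(g))
	}
}
```

Each group is then bounded by the size of one operator's resources rather than by everything resolved in the namespace.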
There are more restrictions:
I raised etcd's --max-request-bytes to 10 MiB, but unfortunately the gRPC server has its own limit on the size of a sent message.
If I install a lot of operators in the global namespace and then upgrade the index image, all operators fail to upgrade because the InstallPlan cannot be updated successfully.
time="2022-03-29T17:16:17Z" level=warning msg="no installplan found with matching generation, creating new one" id=RpdmR namespace=operators
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing id=wdOG+ ip=install-fnm47 namespace=operators phase=
time="2022-03-29T17:16:17Z" level=info msg="skip processing installplan without status - subscription sync responsible for initial status" id=wdOG+ ip=install-fnm47 namespace=operators phase=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-03-29T17:16:17Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
E0329 17:16:18.184570 1 queueinformer_operator.go:290] sync "operators" failed: rpc error: code = ResourceExhausted desc = trying to send message larger than max (2627600 vs. 2097152)
Bug Report
What did you do? I created multiple Subscriptions in a short time, and their status then became InstallPlanPending, like the following:
I can see that the InstallPlan status is nil.
At the same time, the catalog-operator logs show:
Environment
Possible Solution
I guess the catalog-operator needs to retry here: https://github.com/operator-framework/operator-lifecycle-manager/blob/2d649b0d5935ecfecfefbe56f266cc7b04d0b290/pkg/controller/operators/catalog/operator.go#L1255-L1260
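A minimal sketch of what such a retry could look like, using only the Go standard library. `retryWithBackoff`, `errTransient`, and the `updateStatus` closure are hypothetical names for illustration and are not part of OLM. Retrying only helps with transient failures; a request that is rejected for being too large will fail the same way on every attempt.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical error standing in for a temporary failure.
var errTransient = errors.New("transient failure")

// retryWithBackoff calls fn until it succeeds or attempts calls have been made,
// sleeping between failed calls and doubling the delay each time.
func retryWithBackoff(attempts int, initial time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := initial
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// updateStatus is a hypothetical stand-in for the InstallPlan status update.
	updateStatus := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retryWithBackoff(5, 10*time.Millisecond, updateStatus)
	fmt.Printf("calls=%d err=%v\n", calls, err)
}
```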