Problem: I have a simple setup with a container source connected to a GCP Pub/Sub-backed broker, which forwards messages to a simple sequence composed of a single service.
Everything works at first: an event sent from the source is handled correctly by the sequence/service. But after some time (typically a few days without traffic, such as a weekend), events from the container source fail to be delivered. Every time an event fails to be delivered, we can see throttling errors in the knative-serving activator logs. Recreating the activator pods usually makes it work again. Only GCP brokers (or channels) seem to be affected:
Here is a gist to help set up the failing case: https://gist.github.com/ratamovic/44f3aee5b6e91882071b6d2822686e72 (please change what's appropriate in it). It creates a container source (a Python script), two brokers (in-memory and GCP Pub/Sub), and a sequence composed of a single pod (a simple Flask service).
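For context, the sequence service is just an HTTP endpoint that acknowledges incoming events with a 2xx. A minimal stdlib-only sketch of that behavior (a stand-in for the gist's actual Flask code, which may differ):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EventHandler(BaseHTTPRequestHandler):
    """Minimal stand-in for the Flask sequence service: ack any CloudEvent."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # In binary content mode, CloudEvents attributes arrive as ce-* headers.
        ce_type = self.headers.get("ce-type", "<none>")
        print(f"received event type={ce_type} body={body!r}")
        # Replying 2xx tells the channel/broker the event was handled.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep logs quiet


def serve(port=8080):
    """Start the receiver in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In the failing scenario it is this kind of endpoint that never receives the event, because delivery is rejected upstream at the activator.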
You can run it by:
- adapting what's necessary (project name, GCP key, service account if necessary)
- running `./failure-setup.sh create_cluster` to create a new cluster "prone to failure" (though maybe any cluster would exhibit the same issue)
- running `./failure-setup.sh create_docker` to build the Docker images for the container source and the sequence service
- running `./failure-setup.sh setup_my_namespace` to set up the namespace with the source, sequence, services and brokers
- running `./failure-setup.sh send_event_mem` to send an event through the in-memory broker (which works), as many times as you want; this serves as a reference showing that the whole system works well with the in-memory broker
- running `./failure-setup.sh send_event_gcp` to send an event through the GCP Pub/Sub-backed broker (which works at first but then fails after a few days), as many times as you want
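Under the hood, the `send_event_*` commands amount to POSTing a CloudEvent in binary content mode to the broker ingress. A hedged stdlib sketch of the kind of request the container source builds (the broker URL and event attributes below are placeholders, not the gist's actual values):

```python
import json
import urllib.request


def build_event_request(broker_url, event_type, source, event_id, data):
    """Build a binary-mode CloudEvents HTTP request: attributes go in ce-* headers."""
    body = json.dumps(data).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Spec version depends on the eventing release; 0.3 was current around 0.10.
        "ce-specversion": "0.3",
        "ce-type": event_type,
        "ce-source": source,
        "ce-id": event_id,
    }
    return urllib.request.Request(broker_url, data=body, headers=headers, method="POST")


# Example with a placeholder in-cluster broker address:
req = build_event_request(
    "http://default-broker.my-namespace.svc.cluster.local",
    "dev.example.test",
    "/my/container/source",
    "event-0001",
    {"msg": "hello"},
)
```

When the failure occurs, requests like this one get a non-2xx response from the broker path, matching the throttling errors in the activator logs.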
In fact, GCP is not the only broker affected on my setup, though failures happen more often and more quickly with it. I'm going to open an issue on Knative Serving as well.
Knative Serving version is 0.10.0, Eventing is 0.10.1, and Knative-GCP is 0.10.1 (previous versions such as 0.9 are affected too). This issue has been discussed a bit on the Knative Slack (https://knative.slack.com/archives/C9JP909F0/p1572864600066700).