openfaas / faas-netes

Serverless Functions For Kubernetes
https://www.openfaas.com

Queue-worker fail to retry function invocations #1021

Closed FTWH closed 2 years ago

FTWH commented 2 years ago

According to the official OpenFaaS docs, when a function's concurrent request limit is exceeded (the function's max_inflight env variable), the function returns a 429 status code, and the queue-worker, rather than dropping the message, simply submits it back to the queue. But in my experiments, the failed requests are never retried.

By the way, I have checked https://www.openfaas.com/blog/limits-and-backpressure/ and https://docs.openfaas.com/reference/async/.
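The behaviour described above can be sketched as a retry loop (purely illustrative; the function names are hypothetical and this is not the actual queue-worker code):

```python
import time

def invoke_with_retries(invoke, payload, max_attempts=10,
                        initial_wait=10.0, max_wait=120.0,
                        retry_codes=(408, 429, 500, 502, 503, 504)):
    """Sketch of the documented requeue-on-429 behaviour: retryable
    status codes cause the message to be re-submitted after a delay."""
    wait = initial_wait
    for attempt in range(1, max_attempts + 1):
        status = invoke(payload)
        if status not in retry_codes:
            return status          # success, or a non-retryable failure
        if attempt < max_attempts:
            time.sleep(wait)       # back off before re-submitting
            wait = min(wait * 2, max_wait)
    return status                  # retries exhausted
```

The parameter defaults mirror the queue-worker env values shown below (max_retry_attempts, initial_retry_wait, max_retry_wait, retry_http_codes).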

Expected Behaviour

I edited the deployment for the queue-worker and set up its environment:

spec:
      containers:
      - env:
        - name: faas_nats_address
          value: nats.openfaas.svc.cluster.local
        - name: faas_nats_channel
          value: faas-request
        - name: faas_nats_queue_group
          value: faas
        - name: faas_gateway_address
          value: gateway.openfaas.svc.cluster.local
        - name: faas_function_suffix
          value: .openfaas-fn.svc.cluster.local
        - name: ack_wait
          value: 60s
        - name: max_inflight
          value: "100"
        - name: max_retry_attempts
          value: "10"
        - name: max_retry_wait
          value: 120s
        - name: initial_retry_wait
          value: 10s
        - name: retry_http_codes
          value: 408,429,500,502,503,504
        - name: print_request_body
          value: "false"
        - name: print_response_body
          value: "false"
        - name: secret_mount_path
          value: /var/secrets/gateway
        - name: basic_auth
          value: "true"
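Assuming the retries use exponential (doubling) backoff, as described in the limits-and-backpressure blog post linked above, the env values here imply the following delay schedule. This helper is my own sketch, not OpenFaaS code, and the doubling policy is an assumption:

```python
def backoff_schedule(initial_wait, max_wait, attempts):
    """Delays between retry attempts, assuming exponential (doubling)
    backoff capped at max_wait -- an assumption, not the exact algorithm."""
    delays, wait = [], initial_wait
    for _ in range(attempts - 1):   # n attempts -> n-1 waits between them
        delays.append(wait)
        wait = min(wait * 2, max_wait)
    return delays

# With the values above (10s initial, 120s cap, 10 attempts):
# backoff_schedule(10, 120, 10)
# -> [10, 20, 40, 80, 120, 120, 120, 120, 120]
```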

I set the concurrent request limit for a function in its YAML like this:

version: 1.0
provider:
  name: openfaas
  gateway: http://192.168.122.11:31112
functions:
  test-intra-parallelism:
    lang: python3-flask
    handler: ./test-intra-parallelism
    image: 192.168.122.11:5000/test-intra-parallelism:latest
    environment:
      max_inflight: 10

When bursty async invocations arrive, the function should handle requests at a concurrency no higher than max_inflight, and all the requests held by the queue-worker should be processed later on.

Current Behaviour

I deployed a simple function that sleeps for 2 seconds and then writes a timestamp to a Redis DB. I generated load with hey:

hey -c 50 -m POST \
 -z 1s -q 1 \
 -H "X-Callback-Url: http://192.168.122.1:8000" \
 $OPENFAAS_URL/async-function/test-intra-parallelism 

The result was:

Summary:
  Total:        1.0243 secs
  Slowest:      0.0153 secs
  Fastest:      0.0108 secs
  Average:      0.0132 secs
  Requests/sec: 48.8128

Response time histogram:
  0.011 [1]     |■■
  0.011 [1]     |■■
  0.012 [2]     |■■■■■
  0.012 [6]     |■■■■■■■■■■■■■■
  0.013 [3]     |■■■■■■■
  0.013 [6]     |■■■■■■■■■■■■■■
  0.014 [5]     |■■■■■■■■■■■■
  0.014 [17]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.014 [7]     |■■■■■■■■■■■■■■■■
  0.015 [1]     |■■
  0.015 [1]     |■■

Latency distribution:
  10% in 0.0118 secs
  25% in 0.0127 secs
  50% in 0.0136 secs
  75% in 0.0139 secs
  90% in 0.0141 secs
  95% in 0.0146 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0044 secs, 0.0108 secs, 0.0153 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:    0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:    0.0085 secs, 0.0053 secs, 0.0123 secs
  resp read:    0.0000 secs, 0.0000 secs, 0.0002 secs

Status code distribution:
  [202] 50 responses

But only 10 timestamps were recorded, which means OpenFaaS handled only 10 of the 50 requests.

[Screenshot: Redis showing only 10 recorded timestamps]

OpenFaaS needs to ensure that all submitted asynchronous calls are processed, not silently drop some of them.

Are you a GitHub Sponsor (Yes/No?)

Check at: https://github.com/sponsors/openfaas

Steps to Reproduce (for bugs)

  1. Prepare an OpenFaaS deployment in a Kubernetes cluster.
  2. Deploy a function that simply writes a timestamp into Redis. Use the python3-flask template with of-watchdog.
  3. Add the function env max_inflight=10.
  4. Edit the queue-worker deployment and set env max_inflight=100 (a large number, to avoid the queue-worker itself becoming a bottleneck).
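For step 2, the handler might look like the sketch below. The store parameter is a hypothetical stand-in for a Redis client call (e.g. redis.Redis().rpush), added so the sketch runs without a Redis server:

```python
import time

def handle(req, store=None):
    """Sleep ~2s, then record a timestamp.
    `store` is a hypothetical stand-in for a Redis write such as
    r.rpush("timestamps", ts); injected here for testability."""
    time.sleep(2)                     # simulate 2 seconds of work
    ts = time.time()
    if store is not None:
        store("timestamps", ts)       # real handler: r.rpush("timestamps", ts)
    return str(ts)
```

Counting the entries under "timestamps" after the hey run shows how many invocations actually completed.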

Context

Before deciding to try the paid OpenFaaS Pro service, we need to confirm the stability of the service. The max_inflight limit is critical, because too much intra-function parallelism can crash the container.

I don't think this is a design flaw; it may be that the documentation wasn't clear enough and my configuration is wrong as a result.

Your Environment

alexellis commented 2 years ago

Hi @FTWH, thanks for your interest in OpenFaaS.

As explained in the blog post and the documentation, retries are part of OpenFaaS Pro. There is no bug or issue here, and everything is working as described.

https://docs.openfaas.com/openfaas-pro/retries/

Here's the docs page that you linked to; it's also shown there quite clearly:

[Screenshot: docs page listing retries as an OpenFaaS Pro feature]

> OpenFaaS needs to ensure that all submitted asynchronous calls are processed, not silently drop some of them.

No requests are ignored; check the Prometheus metrics and you'll see the 429 responses recorded there. It's just that the Pro solution retries them; the Community Edition does not, and is not intended for commercial use. You can read a comparison here.

If you'd like to talk to us about OpenFaaS Pro, you can do so here: https://openfaas.com/support/

Alex