skupperproject / skupper

Skupper is an implementation of a Virtual Application Network, enabling rich hybrid cloud communication.
http://skupper.io
Apache License 2.0
579 stars 70 forks source link

"Secret already exists" when binding a service right after its creation #1208

Open hash-d opened 1 year ago

hash-d commented 1 year ago

While testing a fix for #1153, my local tests failed intermittently with a different error:

http.go:908: assertion failed: error is not nil: secrets "skupper-tls-nghttp1tls" already exists

or

http.go:885: assertion failed: error is not nil: secrets "skupper-tls-nghttp2tls" already exists

Looking at the code, two operations are referencing the types.ServiceInterface that defines the offending secret name, one right after the other:

904         err = prv1Cluster.VanClient.ServiceInterfaceCreate(ctx, &http1TlsService)
905         assert.Assert(t, err)
906
907         err = prv1Cluster.VanClient.ServiceInterfaceBind(ctx, &http1TlsService, "deployment", "nghttp1tls", map[int]int{}, "")
908         assert.Assert(t, err)
909                                                                                                                                                                                                                                          

The first just creates or updates a ConfigMap (and lets some other module detect that creation/change and react?), while the second, which's the one that fails, checks for the existence of a secret before creating it.

Between each of these steps, however, there are a few other network-bound operations (such as other Get operations against the K8S cluster).

My guess is that an asynchronous process saw the new ConfigMap from the first step and created the secret, between check and creation of secret on the second step.

This would be a test code issue, however I was able to reproduce it using the cli (see below)

How to reproduce

I've run the following loop on a CRC 4.12 running locally. That part is key, as the issue seems to depend on how fast one process or the other takes to complete.

#!/usr/bin/sh -xv
#
# Reproducer for error
#
#   http.go:885: assertion failed: error is not nil: secrets "skupper-tls-nghttp2tls" already exists
#
# on testHttps

skupper init
oc create deploy hello --image quay.io/skupper/hello-world-backend

i=0

while true
do
  i=$(( i+1 ))
  echo Try $i
  skupper service create hello 8080 --generate-tls-secrets
  skupper service bind hello deployment hello || { exit 1; }
  skupper service delete hello
done

On my machine, the error was reproduced after 74 iterations:

(...)
+ true
+ i=74
+ echo Try 74
Try 74
+ skupper service create hello 8080 --generate-tls-secrets
+ skupper service bind hello deployment hello
Secret already exists:  skupper-tls-hello
Error: secrets "skupper-tls-hello" already exists
+ exit 1

Expected behavior

The error is fairly rare and happens (only?) on a specific type of environment (fast, dedicated). Yet, the execution of several skupper commands in batch could be expected in user's scripts and pipelines.

Whatever fix is done for the cli could probably be applied (or already apply) for the test code. Perhaps recheck the service existence closer to its creation, or treating this specific failure more gracefully.

hash-d commented 1 year ago

I've left the same loop running 650+ times, but without --generate-tls-secrets, with no failures.

(as expected, given the name of the offending secret)

hash-d commented 1 year ago

@ajssmith, @nluaces FYI

nluaces commented 1 year ago

Thanks @hash-d , I will look into it.