Closed · chmouel closed this 3 months ago
https://github.com/openshift-pipelines/pipelines-as-code/pull/1636
this probably won't fix the issue, but I couldn't find a reason for the failure 😞 updating to latest anyway
we can update the code to not panic and return an error instead 🤔 I tried multiple tests but couldn't reproduce the issue @chmouel
yeah it's a very weird one, I did a few runs of stress testing with more than 600 PRuns at a concurrency of 5 and a max-keep-run of 2, and could not reproduce it anymore either.
I remember hitting that same exact error a few times when I was trying to move the cleanup process to the controller,
but yeah, I think the easiest way is to check if the PRun is nil before sorting, isn't it?
yeah, will add a nil check.
can we do another load test? 👀
I can try 🙃
will launch 1000 and see how it goes
I did a lot of tests this morning, multiple runs:
2 repos of 3 pipelineruns with a max-keep-run of 2
one repo is a concurrency of 5 and the other of 10
I ran 10 loops of 30 /retest on each pipelinerun (using a second controller on GHE to avoid rate limiting)
so 10 × (30 × 3) = 900 per repo, × 2 repos == 1800
since we have concurrencies of 5 and 10, there are 15 pipelineruns running at any time, and only 12 left as Succeeded.
Everything ran to completion. The watcher gradually increases its memory usage, but I think that's the knative library and its informers (it's worse on the pipelines reconciler).
The only issue I had was a name conflict in the secret generation, which uses a 4-character random suffix (~10^4 combinations), so over 1800 runs there's roughly a 14% chance this occurs. We may want to increase the secret name length to avoid that...
but other than that so far so good, no crash whatsoever...
little image to illustrate this comment:
i guess we can close it
while doing stress testing with over 200 pipelineruns, I saw this crash happening after a while (there were only 2 left to process):
cc @sm43