openshift-pipelines / pipelines-as-code

Pipelines-as-Code for Tekton
https://pipelinesascode.com
Apache License 2.0
137 stars 81 forks source link

Continue when failing the concurrency #1810

Closed chmouel closed 2 weeks ago

chmouel commented 2 weeks ago

Continue when failing the concurrency

When we were failing concurrency we were not continuing to the next one and the semaphore will not be removing the lock which would end up with a deadlock.

This only happens when the underlying cluster is becoming really busy and have issues to update the pipelinerun in progress.

The only way to test this is to modify updatePipelineRunToInProgress.

I have a repo with three pipelinerun test-1, test-2 and test-3 matching a pull-request. I have a concurrency_limit of 1 in the repo spec.

I run the Pr and the test-1 should run successfully, test-2 should return an error but here test-3 should get kicked and start.

Here is the patch i have to do to test this, add this to reconciler/reconciler.go:

func randomError(prn string) error {
  if strings.HasPrefix(prn, "test-2") {
   return fmt.Errorf("DEBUG: ๐Ÿ˜ˆ randomly failing this PipelineRun: %s", prn)
  }
  /* if we want to generate more randomness on load testing
  var b [8]byte
  if _, err := rand.Read(b[:]); err != nil {
    return fmt.Errorf("failed to generate random number: %w", err)
  }
  randomNumber := binary.LittleEndian.Uint64(b[:]) % 100

  if randomNumber < 50 {
   return fmt.Errorf("DEBUG: ๐Ÿ˜ˆ randomly failing this PipelineRun: %s", prn)
  }

  */
  return nil
}

and add at the top of the updatePipelineRunToInProgress function:

if err := randomError(pr.GetName()); err != nil {
 return err
}

Jira: https://issues.redhat.com/browse/SRVKP-6740

Demo of the PR

IMAGE ALT TEXT

Changes

Submitter Checklist

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 60.19417% with 41 lines in your changes missing coverage. Please review.

Project coverage is 65.34%. Comparing base (3723cde) to head (5dfecd1). Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pkg/test/concurrency/concurrency.go 26.31% 14 Missing :warning:
pkg/reconciler/queue_pipelineruns.go 73.33% 10 Missing and 2 partials :warning:
pkg/sync/queue_manager.go 65.21% 6 Missing and 2 partials :warning:
pkg/reconciler/reconciler.go 40.00% 6 Missing :warning:
pkg/reconciler/finalizer.go 50.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1810 +/- ## ========================================== + Coverage 65.17% 65.34% +0.17% ========================================== Files 175 177 +2 Lines 13529 13587 +58 ========================================== + Hits 8817 8878 +61 + Misses 4125 4117 -8 - Partials 587 592 +5 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

chmouel commented 2 weeks ago

/test go-testing

chmouel commented 2 weeks ago

There was some defect that was exposed with the E2E test TestGiteaConcurrencyExclusivenessMultipleRuns, it is fixed in this force push:

https://github.com/openshift-pipelines/pipelines-as-code/compare/d516c30e82b6ea2e1fb9ef677b30c8d3f1648582..5dfecd19dfe03c0c9b36c975fdf9a024de3010d2

savitaashture commented 2 weeks ago

Tested manually with the given steps and with concurrency_limit as 1, 2 Screenshot from 2024-11-08 18-05-09

Working as explained

Thank you :+1:

Please merge when you are done with stress testing