grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

race condition when fetching schedule quality, just after creating an oncall-shift, leads to a "Bad" quality schedule ("Schedule is Empty") #1968

Closed by joeyorlando 1 year ago

joeyorlando commented 1 year ago

There seems to be a race condition in the API call we make to fetch a "Schedule Quality". Most of the time, when you create a brand-new schedule and add one rotation layer, the schedule quality score shows "Good" (expected). However, on rare occasions, the same schedule setup shows a "Bad" score. It appears to be related to the response from the backend; my guess is that it's caused by a race condition in the ordering of the API calls the UI makes.
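To make the suspected ordering problem concrete, here is a minimal sketch assuming the quality endpoint simply reads whatever shifts exist when the request is served. `FakeScheduleBackend`, `createShift`, `fetchQuality`, `racyFlow` and `orderedFlow` are hypothetical names for illustration, not the OnCall API, and the delays are arbitrary. It shows how issuing the quality request while the shift creation is still in flight can return "Schedule is Empty", while issuing it only after creation resolves cannot. The thread does not say what the actual fix was.

```typescript
// Hypothetical quality result shape (illustration only, not the OnCall API).
type Quality = { label: "Good" | "Bad"; comment?: string };

// Hypothetical in-memory stand-in for the backend.
class FakeScheduleBackend {
  private shifts: string[] = [];

  // Simulates a write that becomes visible only after `delayMs`.
  createShift(id: string, delayMs: number): Promise<void> {
    return new Promise<void>((resolve) =>
      setTimeout(() => {
        this.shifts.push(id);
        resolve();
      }, delayMs),
    );
  }

  // Simulates a read that snapshots the current shifts after `delayMs`.
  fetchQuality(delayMs: number): Promise<Quality> {
    return new Promise<Quality>((resolve) =>
      setTimeout(() => {
        resolve(
          this.shifts.length === 0
            ? { label: "Bad", comment: "Schedule is Empty" }
            : { label: "Good" },
        );
      }, delayMs),
    );
  }
}

// Racy: both requests are in flight at once, so the read may be served
// before the write has landed.
async function racyFlow(writeDelay: number, readDelay: number): Promise<Quality> {
  const backend = new FakeScheduleBackend();
  const [, quality] = await Promise.all([
    backend.createShift("layer-1", writeDelay),
    backend.fetchQuality(readDelay),
  ]);
  return quality;
}

// Ordered: the quality read is issued only after the shift creation resolves.
async function orderedFlow(writeDelay: number, readDelay: number): Promise<Quality> {
  const backend = new FakeScheduleBackend();
  await backend.createShift("layer-1", writeDelay);
  return backend.fetchQuality(readDelay);
}
```

With a slow write and a fast read, `racyFlow(20, 5)` resolves to `{ label: "Bad", comment: "Schedule is Empty" }`, whereas `racyFlow(5, 20)` and `orderedFlow(20, 5)` both resolve to "Good". Which outcome you get in the racy version depends only on timing, which would match a failure that shows up rarely.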

Matvey-Kuk commented 1 year ago

Can't reproduce :(

joeyorlando commented 1 year ago

To reproduce:

  1. Run the project locally (i.e. make init start)
  2. Follow the instructions here to set up the e2e tests locally
  3. cd grafana-plugin && yarn test:integration integration-tests/schedules/quality.test.ts --repeat-each=50

Since (I believe) it's a race condition, it's not easily reproducible via the UI. Running with --repeat-each should reproduce it at least once. Once you're able to reproduce it, Playwright will provide a report with a trace/video, which should be useful for debugging this.
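The reasoning behind repeating the test can be sketched as simple arithmetic, assuming each run is independent and hits the race with some fixed probability p. The thread gives no measured failure rate, so the 5% used below is purely illustrative, and `chanceOfAtLeastOneFailure` and `runsForConfidence` are hypothetical helpers, not part of Playwright. The chance of seeing at least one failure in n runs is 1 - (1 - p)^n.

```typescript
// Chance that at least one of `runs` independent attempts hits a race
// occurring with per-run probability `p`: 1 - (1 - p)^runs.
function chanceOfAtLeastOneFailure(p: number, runs: number): number {
  if (p < 0 || p > 1) throw new RangeError("p must be in [0, 1]");
  if (!Number.isInteger(runs) || runs < 0) {
    throw new RangeError("runs must be a non-negative integer");
  }
  return 1 - Math.pow(1 - p, runs);
}

// Smallest number of runs so that at least one failure is seen with
// probability >= `target`. Requires 0 < p < 1 and 0 < target < 1.
function runsForConfidence(p: number, target: number): number {
  if (!(p > 0 && p < 1)) throw new RangeError("p must be in (0, 1)");
  if (!(target > 0 && target < 1)) throw new RangeError("target must be in (0, 1)");
  return Math.ceil(Math.log(1 - target) / Math.log(1 - p));
}
```

Under that illustrative 5% rate, --repeat-each=50 would surface the failure with probability of roughly 0.923, and about 59 repetitions would be needed for 95% confidence. A lower real failure rate would need proportionally more repetitions.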

joeyorlando commented 1 year ago

Here are 3 Playwright traces that reproduce this issue: test-results.zip

To open these, first download the file, then:

unzip test-results.zip
npx playwright show-trace test-results/schedules-quality-check-schedule-quality-for-simple-1-user-schedule-chromium/trace.zip

matiasb commented 1 year ago

I think this should have been fixed by the referenced change (based on reviewing the last runs). Please re-open if that's not the case.