dianabarsan closed this 1 week ago
I had to re-run this job 6 times today: https://github.com/medic/cht-core/actions/runs/11717271778
cc @m5r
More sample runs (from other branches) where this test fails:
https://github.com/medic/cht-core/actions/runs/11720994250/job/32650105429?pr=9611
https://github.com/medic/cht-core/actions/runs/11720993712/job/32650884227?pr=9611
It looks like this is blocking PRs from getting merged. I suggest we disable this test and prioritize stabilizing it before re-enabling it.
I've rerun this test at least 10 times and it has failed every time. I will disable it pending a fix.
I can't reproduce the issue locally so far. I tried throttling Chrome with wdio's browser.throttleCPU(8), and I also tried actually loading my CPU by making CouchDB re-index views so that it used >90% of my CPU.
Are you running the whole e2e suite when trying to reproduce?
No, I tried running:
- describe('search matches telemetry', ...)
- telemetry.wdio-spec.js file
- telemetry.wdio-spec.js file + one or two additional test files

I'll run the whole suite and see if it triggers the bug more consistently than I've seen so far.
Running the whole suite doesn't seem to affect the flakiness of the telemetry tests. However, these 3 target accuracy tests failed repeatedly in most of my local runs, although they seem to pass in CI:
[chrome 130.0.6723.116 linux #0-75] » /tests/e2e/default/targets/target-accuracy.wdio-spec.js
[chrome 130.0.6723.116 linux #0-75] Target accuracy
[chrome 130.0.6723.116 linux #0-75] ✓ should save target document on first calculation
[chrome 130.0.6723.116 linux #0-75] ✓ should save target document when targets change
[chrome 130.0.6723.116 linux #0-75] ✓ should not save target document when editing counted contact
[chrome 130.0.6723.116 linux #0-75] ✓ should not save target document when adding report for counted contact
[chrome 130.0.6723.116 linux #0-75] ✖ should save target document when deleting counted contact (5 retries)
[chrome 130.0.6723.116 linux #0-75] ✖ should save target doc once when getting many changes through replication (5 retries)
[chrome 130.0.6723.116 linux #0-75] ✓ should only create one target doc
[chrome 130.0.6723.116 linux #0-75] ✖ should handle old format of the rules-state-store (5 retries)
These two failed once:
[chrome 130.0.6723.116 linux #0-28] » /tests/e2e/default/enketo/pregnancy-complete-a-delivery.wdio-spec.js
[chrome 130.0.6723.116 linux #0-28] Contact Delivery Form
[chrome 130.0.6723.116 linux #0-28] ✓ Complete a delivery: Process a delivery with a live child and facility birth, verify that the past pregnancy card is present and the report was created,verify that the child registered during birth is created and displayed the proper information,verify that the targets page is updated
[chrome 130.0.6723.116 linux #0-28] ✓ open, submit and edit (no changes) default delivery form
[chrome 130.0.6723.116 linux #0-28] ✖ open, submit and edit default delivery form (5 retries)
[chrome 130.0.6723.116 linux #0-36] » /tests/e2e/default/enketo/submit-photo-upload-form.wdio-spec.js
[chrome 130.0.6723.116 linux #0-36] Submit Photo Upload form
[chrome 130.0.6723.116 linux #0-36] ✖ "before all" hook for Submit Photo Upload form
[chrome 130.0.6723.116 linux #0-36]
[chrome 130.0.6723.116 linux #0-36] 1 failing (10m 0.1s)
[chrome 130.0.6723.116 linux #0-36]
[chrome 130.0.6723.116 linux #0-36] 1) Submit Photo Upload form "before all" hook for Submit Photo Upload form
[chrome 130.0.6723.116 linux #0-36] Timeout
[chrome 130.0.6723.116 linux #0-36] Error: Timeout
[chrome 130.0.6723.116 linux #0-36] at listOnTimeout (node:internal/timers:581:17)
[chrome 130.0.6723.116 linux #0-36] at processTimers (node:internal/timers:519:7)
But other than that, I couldn't reproduce the telemetry bug. I'll see if I can trigger it in CI using this branch.
Could it have been something on that date?
Now that's interesting: I managed to reproduce it by mocking the Date object and going back to the same point in time as the CI failures. Thanks for the idea! I'm working on a fix.
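The thread doesn't show how the Date object was mocked. A minimal sketch, assuming plain JavaScript: the hypothetical mockDate helper below replaces the global Date with a subclass so that new Date() and Date.now() report a fixed instant. To affect the app's telemetry service it would have to run inside the page under test (for example injected through wdio's browser.execute), which is an assumption here. The timestamp is illustrative; only the day comes from the thread.

```javascript
// Hypothetical helper (not from cht-core): freezes "now" by swapping the global Date.
function mockDate(fixedIso) {
  const RealDate = Date;
  const fixedMs = RealDate.parse(fixedIso);

  class MockDate extends RealDate {
    // new Date() with no arguments yields the fixed instant;
    // explicit arguments are passed through unchanged.
    constructor(...args) {
      super(...(args.length > 0 ? args : [fixedMs]));
    }

    // Date.now() also reports the fixed instant.
    static now() {
      return fixedMs;
    }
  }

  globalThis.Date = MockDate;

  // Call the returned function to put the real Date back.
  return () => {
    globalThis.Date = RealDate;
  };
}

// Illustrative instant on the day of the CI failures (time of day is arbitrary).
const restore = mockDate('2024-11-06T10:00:00Z');
console.log(new Date().toISOString()); // 2024-11-06T10:00:00.000Z
restore();
```

Two limitations of this sketch: calling Date() without new throws, because class constructors require new, and any code that saved a reference to the original Date before the swap keeps using real time.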
I got around to it, and it was a disappointingly dumb bug in how the test fetches the telemetry docs. Telemetry databases have the date in their name, and the telemetry service doesn't pad the date's digits with leading zeros, so for November 6th, 2024 it formats the date as 2024-11-6, while the test code formatted it as 2024-11-06.
🤦‍♂️
I don't know whether leaving the digits unpadded is ISO 8601 compliant, but that's another issue. CI is running with the fixed test code.
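The mismatch is easy to see in isolation. A minimal sketch, assuming the service builds the date part of the database name from getFullYear(), getMonth() + 1 and getDate() without padding; formatUnpadded and formatPadded are hypothetical names, not cht-core's code. For reference, the ISO 8601 extended calendar date format is YYYY-MM-DD, with month and day always written as two digits.

```javascript
// Hypothetical helpers illustrating the mismatch; not cht-core's actual code.

// Assumed service behaviour: month and day are not zero-padded.
function formatUnpadded(date) {
  return `${date.getFullYear()}-${date.getMonth() + 1}-${date.getDate()}`;
}

// What the test code did: zero-padded YYYY-MM-DD.
function formatPadded(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`;
}

// Local-time constructor; months are 0-based, so 10 is November.
const nov6 = new Date(2024, 10, 6);
const nov16 = new Date(2024, 10, 16);

console.log(formatUnpadded(nov6)); // 2024-11-6
console.log(formatPadded(nov6));   // 2024-11-06
console.log(formatUnpadded(nov16) === formatPadded(nov16)); // true
```

The two strings only agree when both the month and the day have two digits, so a test that builds padded names passes on November 16 and fails on November 6. The sketch uses the local-time getters rather than toISOString().slice(0, 10) on purpose: toISOString() reports UTC, which can fall on a different calendar day near midnight.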
Describe the issue
New telemetry tests appear to be flaking. Example run: https://github.com/medic/cht-core/actions/runs/11702688507/job/32592094267