ocurrent / obuilder

Experimental "docker build" alternative using btrfs/zfs snapshots
Apache License 2.0
60 stars 18 forks source link

flakey CI failures with `Expected: 'Alcotest.reject'` #191

Open shonfeder opened 3 weeks ago

shonfeder commented 3 weeks ago

https://ocaml.ci.dev/github/ocurrent/obuilder/commit/cc1d20bed098578a559b8547799f4747997f3741/variant/freebsd-4.14_opam-2.2

We have a different flaky CI failure (in addition to #186) on the build Concurrent test:

  [FAIL]        build                 2   Concurrent.
  [OK]          build                 3   Concurrent failure.
  [OK]          build                 4   Concurrent failure 2.
  [OK]          build                 5   Cancel.
  [OK]          build                 6   Cancel 2.
  [OK]          build                 7   Cancel 3.
  [OK]          build                 8   Cancel 4.
  [OK]          build                 9   Cancel 5.
  [OK]          build                10   Delete.
  [OK]          secrets               0   Simple.
  [OK]          secrets               1   No secret provided.

┌──────────────────────────────────────────────────────────────────────────────┐
│ [FAIL]        build                 2   Concurrent.                          │
└──────────────────────────────────────────────────────────────────────────────┘
test.exe: [INFO] b1: "(from base)"
test.exe: [INFO] Base image not present; importing "base"…
test.exe: [INFO] Exec "sudo" "--" "mkdir" "-m" "755" "--" "/tmp/build_3e941d_dune/mock-store-057f13/cae662172fd450bb0cd710a769079c05bfc5d8e35efa6576edc7d0377afdd4a2-tmp/rootfs"
exec: [|sudo; --; mkdir; -m; 755; --;
        /tmp/build_3e941d_dune/mock-store-057f13/cae662172fd450bb0cd710a769079c05bfc5d8e35efa6576edc7d0377afdd4a2-tmp/rootfs|]
test.exe: [INFO] Exec "docker" "create" "--" "base"
exec: [|docker; create; --; base|]
test.exe: [INFO] Exec "docker" "export" "--" "base-2"
exec: [|docker; export; --; base-2|]
test.exe: [INFO] Exec "sudo" "--" "tar" "-C" "/tmp/build_3e941d_dune/mock-store-057f13/cae662172fd450bb0cd710a769079c05bfc5d8e35efa6576edc7d0377afdd4a2-tmp/rootfs" "-xf" "-"
exec: [|sudo; --; tar; -C;
        /tmp/build_3e941d_dune/mock-store-057f13/cae662172fd450bb0cd710a769079c05bfc5d8e35efa6576edc7d0377afdd4a2-tmp/rootfs;
        -xf; -|]
test.exe: [INFO] Exec "docker" "rm" "--force" "--" "base-2"
exec: [|docker; rm; --force; --; base-2|]
docker rm --force "base-2"
test.exe: [INFO] Exec "docker" "image" "inspect" "--format" "{{range .Config.Env}}{{print . "\x00"}}{{end}}" "--" "base"
exec: [|docker; image; inspect; --format;
        {{range .Config.Env}}{{print . "\x00"}}{{end}}; --; base|]
test.exe: [INFO] b1: "---> saved as \"cae662172fd450bb0cd710a769079c05bfc5d8e35efa6576edc7d0377afdd4a2\""
test.exe: [INFO] b1: "/: (run (shell A))"
test.exe: [INFO] b1: "A\n"
test.exe: [INFO] b2: "(from base)"
test.exe: [INFO] b2: "---> using \"cae662172fd450bb0cd710a769079c05bfc5d8e35efa6576edc7d0377afdd4a2\" from cache"
test.exe: [INFO] b2: "/: (run (shell A))"
test.exe: [INFO] b2: "A\n"
test.exe: [INFO] b2: "---> saved as \"565f16d718a4939be1275726ddfdce38a7e0fa7d29c762ccf04ed5990a72b79e\""
test.exe: [INFO] b2: "/: (run (shell C))"
test.exe: [INFO] b1: "---> saved as \"565f16d718a4939be1275726ddfdce38a7e0fa7d29c762ccf04ed5990a72b79e\""
test.exe: [INFO] b1: "/: (run (shell B))"
ASSERT timeout 10.000000s
File "src/core/lwt.ml", line 1967, character 26:
FAIL timeout 10.000000s

   Expected: `Alcotest.reject'
   Received: `Alcotest.reject'

Raised at Alcotest_engine__Test.check in file "src/alcotest-engine/test.ml", line 203, characters 4-261
Called from Lwt.Sequential_composition.map.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1967, characters 26-31

I'm slightly suspicious that this may be an lwt or alcotest bug... but will need to figure out more to validate.

talex5 commented 3 weeks ago

I think the message is just saying that it timed out. That was added in https://github.com/ocurrent/obuilder/pull/130 ("Windows prereqs tests"). It doesn't seem surprising to me that a test might take more than 10s to run on a busy CI machine.

shonfeder commented 3 weeks ago

Ah, that is helpful, thanks @talex5! Just logged the failure and did not the take the time to examine it carefully to note the timeout issue.

I think we should just be able to increase the timeout (if we cannot reliably get the test to complete quicker).