mit-plv / fiat-crypto

Cryptographic Primitive Code Generation by Fiat
http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP19.pdf
Other
705 stars 147 forks source link

Github CI Anomaly Zoo #1394

Open andres-erbsen opened 1 year ago

andres-erbsen commented 1 year ago
JasonGross commented 1 year ago
To shed a bit more light on this If you go to the PR at https://github.com/mit-plv/bedrock2/pull/280, we see ![image](https://user-images.githubusercontent.com/396076/193449960-c6d52f5f-dbf8-488c-abd3-ad56abbd203f.png) and the [actions log](https://github.com/mit-plv/bedrock2/actions/workflows/coq.yml) shows ![image](https://user-images.githubusercontent.com/396076/193450049-a4da5e95-adbe-4ee1-a585-330c764b5be7.png) so it looks like the overall status of the commit includes runs of that commit across all branches, but the displayed checks are only for master or something?
JasonGross commented 1 year ago

I'm thinking this might actually be an OOM kill, especially given that we're running the checker while replacing vm_compute with the lazy reduction machine.

JasonGross commented 1 year ago

Plausibly related: https://github.com/actions/runner-images/issues/736#issuecomment-647069731 (but this is only about mac)

Maybe worth asking about at https://github.com/orgs/community/discussions/

andres-erbsen commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3242704485/jobs/5316358457

andres-erbsen commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312644/jobs/5322679386

Cloning into 'boringssl'...
  error: 44224 bytes of body are still expected
  fetch-pack: unexpected disconnect while reading sideband packet
  fatal: early EOF
  fatal: fetch-pack: invalid index-pack output
  + exit 128
andres-erbsen commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312653/jobs/5322679307

JasonGross commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312653/jobs/5322679307

This is new

# ocamlfind: Package `findlib' not found
# Error while running 'D:\a\fiat-crypto\fiat-crypto\_opam\bin/ocamlfind.exe query findlib -format %v' (exit code 2)
# Configuration script failed!

I've seen a bunch of errors about dynlink failing, but not this one

JasonGross commented 1 year ago

We should include snippets of the logs in the comments; GH deletes logs after a couple months

JasonGross commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3242704485/jobs/5316358457

COQFLAGS="-Q src/bedrock2 bedrock2 -Q src/bedrock2Examples bedrock2Examples -Q /github/workspace/rupicola/bedrock2/deps/coqutil/src/coqutil coqutil " ../etc/bytedump.sh bedrock2.PrintListByte.allBytes > special/BytedumpTest.out.tmp
make[2]: *** [Makefile:64: special/BytedumpTest.out] Error 1
make[1]: *** [Makefile:105: bedrock2_ex] Error 2

How do we debug this? Maybe we shouldn't suppress stderr in bytedump?

JasonGross commented 1 year ago

validate failed on bytedump after previous steps succeeded: https://github.com/mit-plv/fiat-crypto/actions/runs/3247910630/jobs/5328487093#step:17:3485

JasonGross commented 1 year ago

In fact bytedump is failing very frequently now, can we do something about this? https://github.com/mit-plv/fiat-crypto/actions/runs/3247892148/jobs/5328446887 https://github.com/mit-plv/fiat-crypto/actions/runs/3247910630/jobs/5328487093 https://github.com/mit-plv/fiat-crypto/actions/runs/3247916390/jobs/5328499981

JasonGross commented 1 year ago

This suspiciously started happening when I started not trying to rebuild on Mac OS

JasonGross commented 1 year ago

Can anyone reproduce the (seemingly consistent?) issue that make -j2 fails on Mac OS the first time it's run after a clean checkout? c.f. https://github.com/mit-plv/fiat-crypto/pull/1458

andres-erbsen commented 1 year ago
# Usage: coqworker.opt --kind=[proof|query|tactic] $args
# got -main-channel
# File ".\src/Curves/Edwards/XYZT/Basic.v", line 124, characters 6-336:
# Error:
# Anomaly
# "Uncaught exception Failure("The spawned process did not connect back in 10.0s")."
# Please report at [http://coq.inria.fr/bugs/.](http://coq.inria.fr/bugs/)
# 
# make: *** [Makefile.coq:793: src/Curves/Edwards/XYZT/Basic.vo] Error 129

https://github.com/mit-plv/fiat-crypto/actions/runs/3281473086/jobs/5403532417#step:17:1812

andres-erbsen commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3292916290/jobs/5428839023 self-cancelled on start

JasonGross commented 1 year ago

Sometimes the version of Coq changes between one step and the next in the docker action https://github.com/coq-community/docker-coq-action/issues/80 https://github.com/mit-plv/fiat-crypto/actions/runs/3287380197/jobs/5416490558#step:17:501

This results in a bytedump failure because the bedrock2 files are compiled with an outdated version of Coq and so we get inconsistent assumptions

andres-erbsen commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3305594557/jobs/5455825957 self-cancelled on submit

andres-erbsen commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3337004947/jobs/5522821413 self-cancelled on submit https://github.com/mit-plv/bedrock2/actions/runs/3337008636/jobs/5522829175 self-cancelled on submit

andres-erbsen commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3352143850/jobs/5554084845 self-cancelled on submit

andres-erbsen commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3390987808/jobs/5635703710 self-cancelled on submit

andres-erbsen commented 1 year ago

gcc: fatal error: no input files https://github.com/mit-plv/fiat-crypto/actions/runs/3420869283/jobs/5696244304#step:17:453

JasonGross commented 1 year ago

gcc: fatal error: no input files https://github.com/mit-plv/fiat-crypto/actions/runs/3420869283/jobs/5696244304#step:17:453

That's probably just me messing up how to get the version of gcc; I guess it's not gcc -v? The line to be fixed is https://github.com/mit-plv/fiat-crypto/blob/5047e39e77c2124eadb5b46c7d870e88baa16b39/etc/ci/describe-system-config.sh#L26

The actual failure here is

COQFLAGS="-Q src/bedrock2 bedrock2 -Q src/bedrock2Examples bedrock2Examples -Q /github/workspace/rupicola/bedrock2/deps/coqutil/src/coqutil coqutil " ../etc/bytedump.py bedrock2.PrintListByte.allBytes > special/BytedumpTest.out.tmp

Coq < Coq < Toplevel input, characters 1-55:
> Require bedrock2.PrintListByte bedrock2.PrintListByte.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error:
Compiled library bedrock2.PrintListByte (in file /github/workspace/rupicola/bedrock2/bedrock2/src/bedrock2/PrintListByte.vo) makes inconsistent assumptions over library Coq.Init.Ltac

Coq < 
Coq < 
Unnamed_thm < 
Unnamed_thm < 
Unnamed_thm < Toplevel input, characters 32-63:
>   PrintListByte.print_list_byte bedrock2.PrintListByte.allBytes.
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: The reference bedrock2.PrintListByte.allBytes was not found in the current environment.

Unnamed_thm < 
Coq < 
Error: make[2]: *** [Makefile:64: special/BytedumpTest.out] Error 2
Error: make[1]: *** [Makefile:105: bedrock2_ex] Error 2
make: *** No rule to make target 'bedrock2', needed by '.Makefile.coq.d'.  Stop.

which presumably is https://github.com/mit-plv/fiat-crypto/issues/1394#issuecomment-1286231004

JasonGross commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3446654504/jobs/5751826046 Windows: Retrieve the Cygwin cache Error: getaddrinfo ENOTFOUND www.cygwin.com

JasonGross commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/3457159795/jobs/5770405148#step:6:116

** Fatal error: Cannot run cygpath -m "libshell32" "libshell32.lib" "libshell32.dll.a" "libshell32.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.lib" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.dll.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.a" "lib\libshell32" "lib\libshell32.lib" "lib\libshell32.dll.a" "lib\libshell32.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.lib" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.dll.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.a" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.lib" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.lib" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/lib/libshell32" "/usr/x86_64-w64-mingw32/lib/libshell32.lib" "/usr/x86_64-w64-mingw32/lib/libshell32.dll.a" "/usr/x86_64-w64-mingw32/lib/libshell32.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.lib" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.lib" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.dll.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.a"
# File "caml_startup", line 1:
# Error: Error during linking (exit code 2)
# make[1]: *** [Makefile.common:184: _build/default/coq-core.install] Error 1
# make[1]: *** Waiting for unfinished jobs....
# make[1]: Leaving directory '/cygdrive/d/a/fiat-crypto/fiat-crypto/_opam/.opam-switch/build/coq.8.15.2'
# make: *** [Makefile.make:122: submake] Error 2
andres-erbsen commented 1 year ago
   2 |  opam exec -- cat ~/.opam/log/*
     |  ~~~~
     | The term 'opam' is not recognized as a name of a cmdlet, function, script file, or executable program. Check the
     | spelling of the name, or if a path was included, verify that the path is correct and try again.

https://github.com/mit-plv/fiat-crypto/actions/runs/3470364042/jobs/5798475008

JasonGross commented 1 year ago
   2 |  opam exec -- cat ~/.opam/log/*
     |  ~~~~
     | The term 'opam' is not recognized as a name of a cmdlet, function, script file, or executable program. Check the
     | spelling of the name, or if a path was included, verify that the path is correct and try again.

https://github.com/mit-plv/fiat-crypto/actions/runs/3470364042/jobs/5798475008

This is because I have the cat step set to run even if earlier steps failed. The actual issue here is that opam failed to install in an earlier step with

connect ETIMEDOUT 185.199.110.133:443
  Waiting 20 seconds before trying again
  connect ETIMEDOUT 185.199.110.133:443
  Waiting 14 seconds before trying again
  Error: connect ETIMEDOUT 185.199.110.133:443
JasonGross commented 1 year ago

validate times out in https://github.com/mit-plv/fiat-crypto/actions/runs/3470357503/jobs/5798461079 and https://github.com/mit-plv/fiat-crypto/actions/runs/3470364029/jobs/5798476434 because changed Coq version between steps results in project rebuild. We should really have a better way of dealing with Coq changing versions under us... (no progress on https://github.com/coq-community/docker-coq-action/issues/80 yet...)

JasonGross commented 1 year ago

boringssl test failure at https://github.com/mit-plv/fiat-crypto/actions/runs/3524875493/jobs/5910863477#step:7:111 on a PR changing exact _ to refine _ in a Qed'd proof (aka a PR that cannot have changed anything impacting BoringSSL)

 FAILED: crypto_test_data.cc /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/crypto_test_data.cc 
  cd /home/runner/work/fiat-crypto/fiat-crypto/boringssl && /opt/hostedtoolcache/go/1.13.15/x64/bin/go run util/embed_test_data.go -file-list /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/embed_test_data_args.txt > /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/crypto_test_data.cc
  # command-line-arguments
  Error: util/embed_test_data.go:79:16: undefined: os.ReadFile
  Error: util/embed_test_data.go:129:16: undefined: os.ReadFile
JasonGross commented 1 year ago

Package g++-7 is not available, but is referred to by another package. This may mean that the package is missing, has been obsoleted, or is only available from another source

E: Package 'g++-7' has no installation candidate

https://github.com/mit-plv/fiat-crypto/actions/runs/3524875493/jobs/5936926857#step:3:67

andres-erbsen commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3653687403/jobs/6176913824 timed out for no reason I can see

JasonGross commented 1 year ago

https://github.com/mit-plv/bedrock2/actions/runs/3653687403/jobs/6176913824 timed out for no reason I can see

Windows does this sometimes, I think: it fails to quit terminal after all scripts complete

JasonGross commented 1 year ago

https://github.com/mit-plv/fiat-crypto/actions/runs/4151994561/jobs/7182649901 connection error: 12037 fetching https://cygwin.mirror.constant.com/x86_64/setup.bz2.sig

andres-erbsen commented 7 months ago

https://github.com/mit-plv/fiat-crypto/actions/runs/7700820211/job/20985462543?pr=1811

error: failed retrieving file 'ocaml-compiler-libs-5.1.0-1-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
 coq-8.18.0-2-x86_64 downloading...
error: failed retrieving file 'ghc-9.0.2-3-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
warning: failed to retrieve some files
error: failed to commit transaction (unexpected error)
andres-erbsen commented 7 months ago
Connecting to gitlab.alpinelinux.org (172.105.69.85:443)
wget: server returned error: HTTP/1.1 404 Not Found
ssl_client: write: Broken pipe
Error: Process completed with exit code 1.

https://github.com/mit-plv/rewriter/actions/runs/7705023867/job/20998315736?pr=148

andres-erbsen commented 4 months ago

https://launchpad.net/api/devel/~jgross-h/+archive/ubuntu/coq-master-daily?ws.op=getSigningKeyData returns HTTPError

https://github.com/mit-plv/bedrock2/actions/runs/8742444427/job/23990710611

andres-erbsen commented 4 months ago

error: failed retrieving file 'perl-5.38.2-1-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds https://github.com/mit-plv/fiat-crypto/actions/runs/8750960028/job/24015572653?pr=1885

JasonGross commented 3 months ago

https://github.com/mit-plv/fiat-crypto/actions/runs/9056914029/job/24884198262#step:7:29

Run jirutka/setup-alpine@v1
Run sudo -E ./setup-alpine.sh
Prepare rootfs directory
Download static apk-tools
  ▷ Downloading https://gitlab.alpinelinux.org/api/v4/projects/5/packages/generic/v2.14.0/x86_64/apk.static
  curl: (7) Failed to connect to gitlab.alpinelinux.org port 443 after 9945 ms: Connection timed out

  Error occurred at line 176:
    173 | APK="$RUNNER_TEMP/apk"
    174 | 
    175 | info "Downloading ${INPUT_APK_TOOLS_URL%\#*}"
  > 176 | download_file "$INPUT_APK_TOOLS_URL" "$APK"
    177 | chmod +x "$APK"
    178 | 
    179 | 
  Error: Error occurred at line 176: download_file "$INPUT_APK_TOOLS_URL" "$APK" (see the job log for more information)
  Error: Process completed with exit code 1.
JasonGross commented 3 months ago
[1 of 2] Compiling Main             ( src/ExtractionHaskell/bedrock2_fiat_crypto.hs, src/ExtractionHaskell/bedrock2_fiat_crypto.o )
panic! (the 'impossible' happened)
  GHC version 9.10.1:
    heap overflow

Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

https://github.com/mit-plv/fiat-crypto/actions/runs/9138561025/job/25130899016?pr=1913#step:4:4949

Maybe we need to bump -M7G to -M8G? (And maybe we should upstream a bug report to not call heap overflows impossible?)

andres-erbsen commented 2 months ago

E: Tried to extract package, but file already exists. Exit...

JasonGross commented 2 months ago

And maybe we should upstream a bug report to not call heap overflows impossible?

I've reported https://gitlab.haskell.org/ghc/ghc/-/issues/24927 and https://gitlab.haskell.org/ghc/ghc/-/issues/20683 about the memory issues

JasonGross commented 1 month ago
[intern /home/coq/.opam/4.13.1+flambda/lib/coq/theories/btauto/Reflect.vo ...
Context: vmlibrary/fld=1/vm_to_patch/fld=3/vm_non_subst_reloc/tag=8/fld=0/vm_caml_prim/tag=0
#0(#0(#0(1,1),#0(0,0),#0(1,0),#0(2,2),#0(3,2),#0(4,1),#0(5,2),#0(6,3)),1,9)
 failed!]

Fatal Error: Failure: Validation failed: sum: unexpected tag (in tag=0).
             Please report at [http://coq.inria.fr/bugs/.](http://coq.inria.fr/bugs/)
Command exited with non-zero status 129

https://github.com/mit-plv/fiat-crypto/actions/runs/10189997626/job/28191187326#step:8:3829 logs_26665157425.zip https://github.com/coq/coq/issues/19455

JasonGross commented 3 weeks ago

The Debian CI has started dying with Error: Process completed with exit code 137. And no other info https://github.com/mit-plv/fiat-crypto/actions/runs/10401520403/job/28804231157

OCAMLC src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.ml -o src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte
Warning: Stack size (16384) may be too small.
Warning: To avoid stack overflows in OCaml compilation, setting stack size to the recommended minimum of 66520 kbytes
src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte (real: 25.09, user: 22.67, sys: 2.15, mem: 3301724 ko)
OCAMLC src/ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.ml -o src/ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte
Warning: Stack size (16384) may be too small.
Warning: To avoid stack overflows in OCaml compilation, setting stack size to the recommended minimum of 66520 kbytes
src/ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte (real: 25.49, user: 22.96, sys: 2.17, mem: 3301512 ko)
Error: Process completed with exit code 137.