hylo-lang / hylo

The Hylo programming language
https://www.hylo-lang.org
Apache License 2.0

Nondeterministic test failure #1130

Open dabrahams opened 8 months ago

dabrahams commented 8 months ago

I've seen this happen several times (without parallel testing enabled FWIW):

Test Case '-[LibraryTests.LibraryTests test_compileAndRun_OptionalTests]' started.
/Users/dave/src/hylo/Tests/LibraryTests/TestCases/OptionalTests.hylo:1: error: -[LibraryTests.LibraryTests test_compileAndRun_OptionalTests] : success was expected, but processing failed with thrown error: NonzeroExit(
  terminationStatus: 6,
  standardOutput: "/Users/dave/src/hylo/Tests/LibraryTests/TestCases/OptionalTests.hylo:10: precondition failure\n",
  standardError: "",
  commandLine: ["/var/folders/f3/48cjbvxd0sx81ld0qm__zqrc0000gn/T/83CF26FA-3963-4623-AE36-1256AC0165C9"])
Test Case '-[LibraryTests.LibraryTests test_compileAndRun_OptionalTests]' failed (6.646 seconds).

It's just this one test, as far as I can tell.

dabrahams commented 8 months ago

This seems to happen a bit more reliably in CI when parallel testing is enabled, FWIW.

dabrahams commented 8 months ago

Here's another example: https://github.com/hylo-lang/hylo/actions/runs/6802288428/job/18495069171#step:4:3081

Parallel testing happens to be off in this configuration.

kyouko-taiga commented 8 months ago

Looks like a very nasty bug. I've been completely unable to reproduce it on my machine so far. The thing about parallel testing sounds like a red herring, as I suspect the real culprit is UB.

I'll have to get my hands on Hylo IR and LLVM IR to understand what's going on.

dabrahams commented 8 months ago

I suggest reducing the test case first if possible.

What architecture is your machine?

kyouko-taiga commented 8 months ago

> I'll have to get my hands on Hylo IR and LLVM IR to understand what's going on.

LLVM IR looks good to me if I compile on my machine. I am not 100% positive because there's a lot of noise due to unnecessary pointers to Void allocas, as mentioned yesterday.

I worry that we may have to insert debug info into the executable to investigate further with lldb.

> I suggest reducing the test case first if possible.

Good idea. I tried on my machine but not on CI.

> What architecture is your machine?

I use an Apple M1. Am I correct to assume the bug only occurs on Ubuntu? I think CI uses x86 for this OS, but I am not sure.