quantumlib / Stim

A fast stabilizer circuit library.
Apache License 2.0

Figure out why `test (256)` keeps failing #717

Closed: Strilanc closed this issue 4 months ago

Strilanc commented 4 months ago

Sometimes (only sometimes!) it gets stuck in an infinite loop printing out AddressSanitizer errors. Very concerning, but so far it only seems to happen for the 256-bit AVX case. That build is currently not allowed into the Python wheels due to a previous mysterious crash, so this isn't urgent, but it sure would be nice to know why these things are happening.


Strilanc commented 4 months ago

Never mind, it has now also happened for `test (64)`, so this is now high priority.


Strilanc commented 4 months ago

Based on https://github.com/quantumlib/Stim/pull/718, this is a bug in gtest rather than a bug in Stim. I reported it in https://github.com/google/googletest/issues/4491.

kimwalisch commented 4 months ago

I have the same bug (AddressSanitizer:DEADLYSIGNAL) in my primecount project, in the code below, which uses only math functions from the C++ standard library (my project does not use googletest):

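```cpp
// Newton's method for solving Li(t) = x: since Li'(t) = 1/ln(t),
// each step is t -= (Li(t) - x) * ln(t).
for (int i = 0; i < 100; i++)
{
  T term = (Li(t) - x) * std::log(t);

  // Not converging anymore
  if (std::abs(term) >= std::abs(old_term))
    break;

  t -= term;
  old_term = term;
}
```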

My bug only occurs on Ubuntu 22.04 and 23.10 (x64) when running in a virtual machine with the GCC/Clang sanitizers enabled. When I switched my CI tests to ubuntu-20.04, the bug disappeared. (When I tested Ubuntu 22.04 with the GCC sanitizers on a real server, with no VM or Docker container, it also worked without any issues.)

After more than 2 hours of debugging I couldn't pin down the exact cause, but it looks like it is either a bug in Ubuntu >= 22.04 or a compiler/sanitizer bug.

UPDATE 18/03/2024: Today I also tested on a Fedora 36 x64 VM using GCC with the same compiler options, but I was not able to reproduce the issue. So the issue seems to occur only on Ubuntu x64 VMs (and possibly also on Debian VMs).

Strilanc commented 4 months ago

@kimwalisch Thanks, it's very helpful to know that I can work around it by pinning the version of Ubuntu used by CI.

Strilanc commented 4 months ago

This seems to have been resolved externally. It hasn't happened in a PR for about a week now, whereas before it was happening multiple times per PR.