llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.26k stars 11.67k forks source link

Nondeterminism in llvm-libc++-static.cfg.in :: std/time/time.zone/time.zone.timezone/time.zone.members/sys_info.zdump.pass.cpp #89629

Open ilovepi opened 5 months ago

ilovepi commented 5 months ago

We're seeing some non-deterministic failures in llvm-libc++-static.cfg.in :: std/time/time.zone/time.zone.timezone/time.zone.members/sys_info.zdump.pass.cpp in our CI (https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8750473375533143361/overview).

It's unclear why this is happening, and why sometimes this test passes w/o any (known) changes to our CI configuration/infrastructure or to libcxx.

This appears related to https://github.com/llvm/llvm-project/commit/1fda1776e32b5582bfcfcbd8094f3c280d936cec, so I'm CCing @mordante.

Below is the failing diff, which are only different in 2 lines.

# | libcxx
# | ...
# | America/Ciudad_Juarez  Sun Feb  5 06:59:59 1939 UT = Sat Feb  4 23:59:59 1939 MST isdst=0 gmtoff=-25200
# | America/Ciudad_Juarez  Sun Feb  5 07:00:00 1939 UT = Sun Feb  5 01:00:00 1939 CST isdst=0 gmtoff=-21600
# | ...

vs.

# | zdump
# | ...
# | America/Ciudad_Juarez  Fri Apr  1 06:59:59 1932 UT = Thu Mar 31 23:59:59 1932 MST isdst=0 gmtoff=-25200
# | America/Ciudad_Juarez  Fri Apr  1 07:00:00 1932 UT = Fri Apr  1 01:00:00 1932 CST isdst=0 gmtoff=-21600
# | ...
llvmbot commented 5 months ago

@llvm/issue-subscribers-bug

Author: Paul Kirth (ilovepi)

We're seeing some non-deterministic failures in llvm-libc++-static.cfg.in :: std/time/time.zone/time.zone.timezone/time.zone.members/sys_info.zdump.pass.cpp in our CI (https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8750473375533143361/overview). It's unclear why this is happening, and why sometimes this test passes w/o any (known) changes to our CI configuration/infrastructure or to libcxx. This appears related to https://github.com/llvm/llvm-project/commit/1fda1776e32b5582bfcfcbd8094f3c280d936cec, so I'm CCing @mordante. Below is the failing diff, which are only different in 2 lines. ``` # | libcxx # | ... # | America/Ciudad_Juarez Sun Feb 5 06:59:59 1939 UT = Sat Feb 4 23:59:59 1939 MST isdst=0 gmtoff=-25200 # | America/Ciudad_Juarez Sun Feb 5 07:00:00 1939 UT = Sun Feb 5 01:00:00 1939 CST isdst=0 gmtoff=-21600 # | ... ``` vs. ``` # | zdump # | ... # | America/Ciudad_Juarez Fri Apr 1 06:59:59 1932 UT = Thu Mar 31 23:59:59 1932 MST isdst=0 gmtoff=-25200 # | America/Ciudad_Juarez Fri Apr 1 07:00:00 1932 UT = Fri Apr 1 01:00:00 1932 CST isdst=0 gmtoff=-21600 # | ... ```
mordante commented 5 months ago

Thanks for the report. This looks really odd. These tests should be deterministic. The log displayed on the website is truncated and I didn't see a link to download the entire log.

Did I miss the link? Or can you share the log?

ilovepi commented 5 months ago

https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8750473375533143361/+/u/clang/test/stdout

The full logs are listed below under the test step.

ilovepi commented 5 months ago

Also, it seems like its possible this was some transient thing in our infrastructure, since the test hasn't failed in a while. Why don't we leave this open for another day, and I can close it tomorrow if we don't see more issues? I d close it now, but we've had a few periods where the tests passed 2-4 times in a row an started failing again, so I think I'll give it a bit more time.

For your investigation, I'd think you're likely fine to ignore this for the time being. I'll be sure to update you if this crops up again.

mordante commented 5 months ago

I had a quick look. The zdump information on my system matches the output of your zdump. I manually verified the contents of the database and that too matches what zdump does. So if this is a real issue; libc++ is wrong.

Based on your comment above I'll ignore the issue for now. Please let me know if the problem persists.

ilovepi commented 5 months ago

Well, it still seems to be happening. I've asked our infra folks and the bots in this pool should be completely stable (same OS versions, packages, VM images, etc.). We suspected this was an issue w/ the region, but have also ruled that out, since passes and failures happen randomly across regions. The diffs from the latest failure (https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8749770631515774145/overview) also look the same as before, so at least the failure mode is consistent.

Do you have any ideas about what would be wrong here?

ilovepi commented 5 months ago

Oh, just an FYI, our bots use the just built compiler to build libc++. Is it possible there is some nondeterminism in clang itself that was introduced recently? It seems odd that it would only affect this one test though... but that's about the only thing I can think of ATM.

ilovepi commented 5 months ago

@mordante We're still seeing this happen with some regularity. I took a deeper look at the patch, and have a suspicion that there may be something subtly wrong w/ some of the merging logic. I'm not that sure though, since it seems most of that logic is driven by errata in the spec.

Have you had any success reproducing this? With nondeterminism, I've had success using LD_PRELOAD to change the default allocator, since in the compiler its usually related to something ordered by pointer value being iterated over. That doesn't seem to be the case here, from a quick look, but I'd be remiss if I didn't share the only thing I've found to make non-determinism bugs reproduce more reliably.

ilovepi commented 5 months ago

@mordante We've had to disable this test in our CI, since it's been flaky. We'd like to remove that workaround in our toolchain build ASAP. Have you had success reproducing the nondeterminism? our CI uses a stock debian 11.8 image as the base, so I wouldn't expect there to be issues w/in the image itself, and other than using the runtimes build and therefore the latest compiler, I'm not aware of anything in our build/test environment that is significant. Further, other than some kind of pointer ordering making merge decisions nondeterministic, I can't see how we'd consistently get a single pair of Date/Times wrong.

Do you have time to look into this? I feel like this is going to become a larger issue once more people start using the latest libc++ revisions.

mordante commented 5 months ago

I have been quite busy last week, so I had little time to look into this. I did some tests, but I can't reproduce the issue locally (using Debian 12). Also during development I executed this test quite often without "random" issue. (Obviously I ran into issues, but they were deterministic and were errors in the implementation.) Did you happen to investigate whether all failures are with the same time zone? If you didn't investigate can you provide links to additional failures.

ilovepi commented 5 months ago

Yes, I've gone through all the logs for both the Aarch64 and x86_64 builds and we always see the same diff(the one posted above).

We had something similar happen w/ debug info a while back. In that case two entries were reordered based on a iteration order of some container. The debug info wasn't wrong in that case though, just ordered differently. Several maintainers ran our test for hours w/o reproduction, but our bots would reliably repro it, almost every time. Ultimately, I got it to start failing deterministically by running once w/ the system allocator, and then LD_PRELOADing a different allocator, like mimalloc or tcmalloc. That brought the number of iterations down from 1000's to under 10 (usually just 1).

Another thought is that maybe its possible to swap one of the internal containers w/ SmallVector or equivalent and use the expensive checking mechanisms (reverse iteration or randomizing the order) to see if the results come out the same. It may be just as good to run the libc++ part of the test in a loop, until it doesn't match anymore...

I'm not that familiar with your patch, but from what I can tell, just about the only place this could crop up is in the merging logic, since I think that's the only way you'd interpret a value differently. It's entirely possible I'm wrong on that detail though.

I've set aside some time tomorrow to try and reproduce the issue w/ the above approaches.