smol-rs / concurrent-queue

Concurrent multi-producer multi-consumer queue
Apache License 2.0

weird test failures on s390x-unknown-linux-gnu with rustc + llvm 13 only #21

Closed decathorpe closed 2 years ago

decathorpe commented 2 years ago

For context: I'm the primary package maintainer for Rust crates in Fedora Linux. We're running test suites for all crates, where possible, in an effort to improve the quality of the packages that we ship.

I came across a very weird test failure in concurrent-queue 1.2.3 and 1.2.4:

running 10 tests
test close ... ok
test capacity ... ok
test len ... ok
test len_empty_full ... ok
test linearizable ... ok
test mpmc ... FAILED
test smoke ... ok
test drops ... ok
test zero_capacity - should panic ... ok
test spsc has been running for over 60 seconds

And then the test runner is stuck forever.

This problem with the mpmc and spsc tests is limited to s390x, and only when the crate is compiled with Rust on LLVM 13 (such as on Fedora 35, where LLVM 14 is not available).
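To illustrate what a test of this kind checks, and how a hang can be reported as a failure rather than stalling the runner, here is a minimal std-only sketch. It is not the crate's actual test: the mutex-protected `VecDeque`, the `mpmc_stress` and `run_with_timeout` helpers, and the thread and item counts are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the crate's queue: a mutex-protected VecDeque.
type Queue = Arc<Mutex<VecDeque<usize>>>;

/// Spawns `threads` producers and `threads` consumers. Each producer pushes
/// the values 0..count; each consumer pops exactly `count` values.
/// Returns how many times each value was popped.
fn mpmc_stress(threads: usize, count: usize) -> Vec<usize> {
    let queue: Queue = Arc::new(Mutex::new(VecDeque::new()));
    let seen: Arc<Vec<AtomicUsize>> =
        Arc::new((0..count).map(|_| AtomicUsize::new(0)).collect());
    let mut handles = Vec::new();

    for _ in 0..threads {
        let q = Arc::clone(&queue);
        handles.push(thread::spawn(move || {
            for i in 0..count {
                q.lock().unwrap().push_back(i);
            }
        }));
    }

    for _ in 0..threads {
        let q = Arc::clone(&queue);
        let seen = Arc::clone(&seen);
        handles.push(thread::spawn(move || {
            for _ in 0..count {
                // Retry until an item is available.
                let value = loop {
                    if let Some(v) = q.lock().unwrap().pop_front() {
                        break v;
                    }
                    thread::yield_now();
                };
                seen[value].fetch_add(1, Ordering::SeqCst);
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }
    seen.iter().map(|c| c.load(Ordering::SeqCst)).collect()
}

/// Runs `f` on a helper thread and gives up after `limit`.
/// Returns None on timeout; the helper thread is then left running (leaked).
fn run_with_timeout<T, F>(limit: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(limit).ok()
}

fn main() {
    match run_with_timeout(Duration::from_secs(60), || mpmc_stress(4, 1_000)) {
        Some(counts) => {
            // Every value must have been popped once per producer.
            assert!(counts.iter().all(|&c| c == 4));
            println!("mpmc stand-in: ok");
        }
        None => println!("mpmc stand-in: timed out (possible hang)"),
    }
}
```

A wrong count corresponds to the kind of failure reported for mpmc above, and the timeout branch corresponds to the hang seen with spsc. The watchdog only detects a stuck worker; it does not stop it.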

I cannot determine for certain which change might have started to cause this problem, because we don't include s390x in our CI, due to limited builder capacity, but I can provide these data points:

Last known successful build:

First known unsuccessful build:

But this is also successful:

taiki-e commented 2 years ago

concurrent-queue 1.2.3

1.2.3 has been yanked due to a bug (see https://github.com/smol-rs/concurrent-queue/blob/master/CHANGELOG.md#version-123). Please try with 1.2.4.

taiki-e commented 2 years ago

Oh, sorry, you already said 1.2.4 has the same issue.

I came across a very weird test failure in concurrent-queue 1.2.3 and 1.2.4:

taiki-e commented 2 years ago

I cannot find any changes between 1.2.2 and 1.2.4 that might affect s390x or big-endian targets: https://github.com/smol-rs/concurrent-queue/compare/v1.2.2...v1.2.4
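When comparing builders, it can help to confirm from inside the failing binary which architecture and byte order it was compiled for. The following is a small sketch using only the standard library; the `target_summary` helper is a hypothetical name introduced here, not part of the crate.

```rust
/// Describes the compilation target of the running binary.
/// `target_summary` is a hypothetical helper, not part of concurrent-queue.
fn target_summary() -> String {
    // cfg! is evaluated at compile time for the target being built.
    let endian = if cfg!(target_endian = "big") { "big" } else { "little" };
    format!(
        "arch={} os={} endian={}",
        std::env::consts::ARCH,
        std::env::consts::OS,
        endian
    )
}

fn main() {
    println!("{}", target_summary());
}
```

This does not report the LLVM version. That is shown by the compiler itself, in the verbose version output of `rustc --version --verbose`.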

So the rustc version or the distro version may be relevant. In fact, rustc 1.60+ changes its behavior on s390x depending on the LLVM version: https://github.com/rust-lang/rust/pull/94764/files#diff-9f3bfd2fdc228dcb1b352082a7ebc39b19c8bea15b73141f72d9f5021aa3c66c

rust version: 1.58.1 built against llvm 13; build host: Fedora 35 running Linux 5.15.6

rust version: 1.62.0 built against llvm 13; build host: Fedora 36 running Linux 5.18.9

decathorpe commented 2 years ago

So, the rustc version or the distro version may be relevant. Actually, the rustc 1.60+ changes its behavior on s390x depending on the LLVM version

Right, that could be a possible culprit :( I'll try to follow up with Red Hat Rust / LLVM maintainers.

taiki-e commented 2 years ago

Closing as an upstream bug per https://github.com/smol-rs/concurrent-queue/issues/21#issuecomment-1197570359.