orhun opened 1 year ago
Hmm, interesting. I can't tell offhand what is causing those, but it's probably something to do with threading; the panic_guard tests should not run in parallel (they modify the global panic hook), so they use a mutex.
About the first failure, is it somehow possible that files in Alpine are not case-sensitive? Although that sounds like something so fundamental to Unix that it shouldn't be different.
ping @nekopsykose (Alpine Linux maintainer), maybe they know something about it :/
About the first failure, is it somehow possible that files in Alpine are not case-sensitive?
that's a filesystem thing and nothing in this case has casefolding
running them locally they all pass, funny..
ah, it's flaky. it toggles between:

failures:
    panic_guard::tests::test_nested_callback_hook_restored
    panic_guard::tests::test_nested_callback_with_panic

and:

failures:
    panic_guard::tests::test_callback_called_before_panic_hook
    panic_guard::tests::test_callback_called_on_drop
    panic_guard::tests::test_callback_called_once_only_panic
    panic_guard::tests::test_nested_callback
    panic_guard::tests::test_nested_callback_hook_restored
    panic_guard::tests::test_nested_callback_with_panic
this just sounds like some ordering that is broken
ah. the reason they cascade like that is that your mutex doesn't unlock on failure: whatever test fails first poisons the mutex, and all the rest then fail on lock() (poisoned dead mutex)
the first failure is always
---- panic_guard::tests::test_callback_called_before_panic_hook stdout ----
thread 'panic_guard::tests::test_callback_called_before_panic_hook' panicked at 'assertion failed: `(left == right)`
left: `["cleanup", "hook", "hook", "hook"]`,
right: `["cleanup", "hook"]`', src/panic_guard.rs:135:9
---- panic_guard::tests::test_nested_callback_hook_restored stdout ----
thread 'panic_guard::tests::test_nested_callback_hook_restored' panicked at 'assertion failed: `(left == right)`
left: `["inner inner cleanup", "inner cleanup", "outer hook", "outer hook"]`,
right: `["inner inner cleanup", "inner cleanup", "outer hook"]`', src/panic_guard.rs:214:9
etc. seems that something races when appending.
i don't know how to fix that, but maybe you have a clue.
your mutex doesn't unlock on fail
the default Mutex
cannot unlock on a thread panic (which is what your failing tests do); it always gets poisoned. you should see if there's another implementation :)
Thanks! Yeah, the poisoning is kind of on purpose, if it happens then something is wrong anyway so I didn't bother cleaning it up.
I don't really know what's happening there that causes the callbacks to be called multiple times. Maybe I'm doing something stupid with the locking / calling the mutex inside the callbacks.
Yeah, the poisoning is kind of on purpose, if it happens then something is wrong anyway so I didn't bother cleaning it up.
that makes sense. just a little noisy, but the first failure is all that mattered anyway.
Could this be resolved now? I believe the panic_guard tests shouldn't fail anymore after an update to rustc, and app_state::tests::test_case_sensitive_mode_change should have been fixed in 8cadf56d5a835da844c70c54f026a22f6c640c8f.
Are you planning to create a release soon? It would be nice to activate the tests again along with it 😊
Ah yes, I didn't realize that I would need to make a new release. However, these panic guard test failures still seem to appear with rustc version 1.77, both on develop and master :weary: I will really need to try and address this.
I can only reproduce this on my work computer (ubuntu 22) but not on my laptop (fedora 39), and it only happens occasionally. It happens a little bit more often if I force a re-link by removing the unit test binary, like
rm target/debug/deps/tere-25a48097920d5416 && cargo test
(the name of the binary is shown in the first few lines of the output of cargo test)
This suggests that it can have something to do with the CPU being busy, which would make sense as this seems to be due to a race condition.
@timon-schelling re: https://github.com/mgunyho/tere/pull/104#issuecomment-2343019105
Do you know what the environment (distro / rustc version) in the codespace is? I would like to investigate this if it can be reliably reproduced. I tried now on Debian 11 (in a container) and there at least the issue doesn't appear.
I tried this also with an Alpine Linux container (using Distrobox) and I can't reproduce it there either (on my laptop, which I mentioned above that I couldn't reproduce anyway).
Okay, I was able to confirm that it's a data race issue by running the thread sanitizer. I installed the nightly toolchain, and then ran with
RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test --target=x86_64-unknown-linux-gnu
and it reports data races in those tests. That shouldn't happen in safe Rust...
I'm unable to reproduce this; I tried on multiple codespaces and my Nix machine. But the codespace I had the issue on had rustc 1.83.0-nightly (c2f74c3f9 2024-09-09).
and it reports data races in those tests. That shouldn't happen in safe Rust...
I guess a library that is using some unsafe code does not have a safe API?
RUSTFLAGS="-Zsanitizer=thread" cargo +nightly test --target=x86_64-unknown-linux-gnu
But I don't see any non-std or non-tere line in the warning. 🤔
The panic_guard module only uses std::sync::Arc and Mutex. I don't exactly know how the tests are run, but I suppose messing with the panic hook in multiple tests running in parallel is pretty unusual, and maybe during a panic (where I'm doing stuff) some of the safety guarantees don't apply.
But I will try to investigate with an MWE to see if this is truly a bug in std.
Hello! 🐻
I've been trying to package tere for Alpine Linux and I created a merge request here: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/49128

As you can see from the failed pipelines, some tests are failing:
Also, I get the following failures even though I skip that one test:
I am skipping these tests for now since I am not sure what is happening there.
Weirdly enough, all the tests (including those ones) pass on Arch Linux.
Let me know if I can provide more info to debug this.
Thanks for this great tool! 💯