Open jschwe opened 4 years ago
I found the race from your log and fixed it with 29452bb, but there is still a race in our code.
I'm bumping this issue, with an update.
When running rusty-demo via qemu-system-x86_64 -cpu qemu64,apic,fsgsbase,rdtscp,xsave,fxsr -display none -smp 1 -m 1G -serial stdio -kernel loader/target/x86_64-unknown-hermit-loader/debug/rusty-loader -initrd target/x86_64-unknown-hermit/debug/rusty_demo > log.log with -smp values 1, 2, 3 and 4, and HERMIT_LOG_LEVEL_FILTER=Debug, the log files get increasingly larger, with the main culprit being the laplace test (which uses rayon internally):

| # Cores | # lines | laplace time |
|---|---|---|
| 1 | 15k | 14.2 s |
| 2 | 37k | 18.3 s |
| 3 | 86k | 31.5 s |
| 4 | 135k | 51.0 s |

It looks like the tasks on the additional cpus get blocked very often. I'm not exactly sure in which part of rusty-demo this is, but this looks like it could be some sort of a synchronization problem.
log_single_core.log
log_2_core.log
log_3_core.log
log_4_core.log
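The measurement above could be scripted roughly as follows. This is only a sketch: the QEMU arguments are copied from the command in this comment, while the helper names `qemu_command` and `count_lines` and the per-run log file names are hypothetical. It prints the commands instead of launching QEMU, and it does not deal with how HERMIT_LOG_LEVEL_FILTER is set.

```python
import shlex
from pathlib import Path

# Paths copied from the command in this thread; adjust them for your checkout.
LOADER = "loader/target/x86_64-unknown-hermit-loader/debug/rusty-loader"
DEMO = "target/x86_64-unknown-hermit/debug/rusty_demo"


def qemu_command(cores):
    """Build the QEMU argument list for the given number of cores."""
    return [
        "qemu-system-x86_64",
        "-cpu", "qemu64,apic,fsgsbase,rdtscp,xsave,fxsr",
        "-display", "none",
        "-smp", str(cores),
        "-m", "1G",
        "-serial", "stdio",
        "-kernel", LOADER,
        "-initrd", DEMO,
    ]


def count_lines(log_path):
    """Count the lines of a log file, tolerating undecodable bytes."""
    with Path(log_path).open(errors="replace") as log:
        return sum(1 for _ in log)


# Print one shell command per core count (hypothetical log file names).
for cores in (1, 2, 3, 4):
    print(shlex.join(qemu_command(cores)) + f" > log_{cores}_core.log")
```

Once the logs exist, `count_lines("log_2_core.log")` gives the "# lines" column for that run.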
Edit 1: Formatted table and added laplace time, since the output in the logfiles mostly originates from the laplace test.
Edit 2: Regarding the performance of the laplace test: on Linux in a VM it runs in 0.24 s (4 cores) to 0.84 s (1 core), so the test itself is okay. (Tested with taskset and cargo run --target=x86_64-unknown-linux-gnu in the examples/demo folder.)
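To make the scaling difference explicit, the timings reported in this comment can be normalised to the single-core run. The numbers below are copied from the comment; the helper name `relative_to_single_core` is made up for this sketch.

```python
# Laplace times in seconds, copied from the table and from Edit 2.
qemu_times = {1: 14.2, 2: 18.3, 3: 31.5, 4: 51.0}   # RustyHermit under QEMU
linux_times = {1: 0.84, 4: 0.24}                     # Linux in a VM


def relative_to_single_core(times):
    """Return each run time as a multiple of the 1-core run time."""
    base = times[1]
    return {cores: t / base for cores, t in times.items()}


for label, times in (("QEMU", qemu_times), ("Linux", linux_times)):
    for cores, ratio in relative_to_single_core(times).items():
        print(f"{label:5} {cores} core(s): {ratio:.2f}x the single-core time")
```

With these inputs the 4-core run takes about 3.59x the single-core time under QEMU, but about 0.29x on Linux, so adding cores slows the test down in one setup and speeds it up in the other.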
I just stumbled upon cross, which is maintained by the Tools team of the rust-embedded WG. In the Supported targets section they mention the following:
Also, testing is very slow. cross test runs units tests sequentially because QEMU gets upset when you spawn multiple threads. This means that, if one of your unit tests spawns threads, then it's more likely to fail or, worst, never terminate.
Has anyone spotted SMP-related issues when using uhyve (or running bare-metal)? Otherwise, it might be worth investigating whether our issue is caused by QEMU.
I never saw the issue when we use KVM... Should we disable the SMP tests on GitHub? The GitLab pipeline is nearly working; it is based on KVM and supports SMP tests. Bors is able to trigger these tests.
I moved all SMP tests to the GitLab pipeline. This pipeline also tests the SMP support with QEMU, but it uses KVM to accelerate the tests. We will see if we still have an issue on this platform. The pipeline only runs if we use bors to test our kernel.
@jschwe Do you have an idea why your integration tests aren't working on this pipeline?
@stlankes The output from the log seems really strange to me. In hermit_test_runner.py one of the first actions is to print the passed executable argument directly after parsing the args. This does not happen for the second test, and I see no reason why this should be the case, unless there is some bug in Python's argparse (which seems unlikely).
The output worked fine for the first test (unit-tests), which were skipped as expected.
Another thing I noticed is that the total duration according to GitLab is 2h, while the single steps only add up to 1h, which I find strange. Could you maybe rerun the tests? We should probably move this into a separate issue though, since it's barely related.
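One way to narrow down the missing line is to rule out output buffering on the CI side. The sketch below is not the actual hermit_test_runner.py; it assumes a single positional argument named `executable`, and the function names are hypothetical. It prints the argument right after parsing and flushes immediately, so the line cannot stay in a buffer when stdout is a pipe.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; `executable` is an assumed positional argument."""
    parser = argparse.ArgumentParser(description="Hypothetical test-runner front end")
    parser.add_argument("executable", help="path to the test executable")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # flush=True writes the line out immediately instead of leaving it buffered.
    print(f"Executable: {args.executable}", flush=True)
    return args.executable


# Example call with an explicit argument list instead of sys.argv.
main(["target/x86_64-unknown-hermit/debug/rusty_demo"])
```

If the line still does not show up in the job log with an explicit flush, buffering is unlikely to be the cause.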
I think that we fixed this issue. Should we close it?
Maybe we should add a note to the README before closing, saying that there are issues with multiple threads and QEMU.
I don't think we have an issue for this yet, so I'm opening one now to track this issue. Currently there are issues running rusty-demo with multiple cores, where sometimes it takes a long time to complete the demo.
Jobs failed due to SMP timeout (May 20 - June 06):
Possibly related Job logs:
thread 'main' panicked at 'Trying to get the scheduler for core 1, but it isn't available', src/scheduler/mod.rs:530:9
for https://github.com/hermitcore/rusty-hermit/commit/64ec710f8a683eeb0a5a6d3baa591fad794e4fe5