Closed ibc closed 3 months ago
OMG Rust tests fail because of this:
mediasoup-worker::mediasoup_worker_run() | settings error: unknown long option given as argument
I hate that getopt.h
ancient thing...
And BTW it fails due to this obviously wrong code that I'm fixing now, but it doesn't justify the problem:
if enable_liburing {
spawn_args.push("--disable_liburing".to_string());
}
Ok, this is done.
Why is this happening in CI????
https://github.com/versatica/mediasoup/actions/runs/10323061408/job/28579657794?pr=1442
cargo clippy
works in my machine and we force Rust version!!!!
OMG I don't want to spend all the weekend with this.
Ok, cargo clippy --all-targets -- -D warnings
fails in local too.
I suspect you might be forcing a different version locally. It wants you to write this idiomatic Rust instead:
let settings = WorkerSettings {
enable_liburing: false,
..WorkerSettings::default()
};
Why does it complain about this?
error: field assignment outside of initializer for an instance created with Default::default()
--> rust/src/router/tests.rs:21:13
|
21 | settings.enable_liburing = false;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: consider initializing the variable with `worker::WorkerSettings { enable_liburing: false, ..Default::default() }` and removing relevant reassignments
--> rust/src/router/tests.rs:20:13
|
20 | let mut settings = WorkerSettings::default();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the code:
worker_manager
.create_worker({
let mut settings = WorkerSettings::default();
settings.enable_liburing = false;
settings
})
.await
.expect("Failed to create worker")
We do THE SAME in many other files such as here:
let worker = worker_manager
.create_worker({
let mut settings = WorkerSettings::default();
settings.log_level = WorkerLogLevel::Debug;
settings.log_tags = vec![WorkerLogTag::Info];
settings.rtc_port_range = 0..=9999;
settings.dtls_files = Some(WorkerDtlsFiles {
certificate: "tests/integration/data/dtls-cert.pem".into(),
private_key: "tests/integration/data/dtls-key.pem".into(),
});
settings.libwebrtc_field_trials =
Some("WebRTC-Bwe-AlrLimitedBackoff/Disabled/".to_string());
settings.app_data = AppData::new(CustomAppData { bar: 456 });
settings
})
.await
.expect("Failed to create worker with custom settings");
I suspect it has a heuristic about number of fields or just didn't learn to handle more complex cases, not sure which one of those two
Pushed, should be green now.
Generally what it suggests does look a bit nicer and even shorter, but not always
OMG now tests failing...
WTH???
https://github.com/versatica/mediasoup/actions/runs/10323816619/job/28581996102?pr=1442
process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Memory issues... again :confused:
Memory corruption... again 😕
When did this happen in a test recently? I don't remember.
It fails consistently, this is depressing, no changes have been done that should affect this AT ALL:
No idea why the issue happens consistently but indeed it's the already know issue in Rust due to the separate usrsctp thread. I worked on this some months ago and there is an unfinished and very complex PR for which honestly I don't have time to revive now. I think I will disable these failing tests because we do know that the underlying code can fail. Well, all this assuming that the failing test is using two workers with pipe transports. If not, ignore all I said here.
Those are the ways Rust tests systematically fail. Since they run in parallel and taking into account that memory issues could make tests fail at any time once
running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::direct_transport::tests::router_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::pipe_transport::tests::data_producer_close_is_transmitted_to_pipe_data_consumer ... ok
test router::plain_transport::tests::router_close_event ... ok
test router::producer::tests::transport_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`
Caused by:
process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Error: Process completed with exit code 101.
##[debug]Finishing: cargo test
running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
test router::direct_transport::tests::router_close_event ... ok
test router::plain_transport::tests::router_close_event ... ok
test router::pipe_transport::tests::data_producer_close_is_transmitted_to_pipe_data_consumer ... ok
test router::producer::tests::transport_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`
Caused by:
process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Error: Process completed with exit code 101.
running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::direct_transport::tests::router_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::plain_transport::tests::router_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`
Caused by:
process didn't exit successfully: `/home/runner/work/mediasoup/mediasoup/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
Error: Process completed with exit code 101.
I would like to say that this is pipe transport related but the third test attempt doesn't even execute any "pipe" test.
This is completely amazing. Rust tests in v3 branch don't fail. But Rust tests in this PR branch fial consistently. Changes in this PR cannot be the culprit at all, they do not affect anything in Rust.
@nazar-pc help please? I doubt that Rust tests are even failing due to this well know issue:
https://github.com/versatica/mediasoup/issues/1352
In a comment above you can see that sometimes pipe transports do not even run and tests still fail. Maybe I'm wrong but AFAIK we only run more than 1 workers in Rust tests in those pipe transport related tests, so I'm completely lost.
Changes in this PR cannot be the culprit at all, they do not affect anything in Rust.
Either changes in this PR are the culprit (however unlikely that seems) or they trigger an issue that existed before as well, but was hard to reproduce.
Maybe I'm wrong but AFAIK we only run more than 1 workers in Rust tests in those pipe transport related tests, so I'm completely lost.
There are as many tests running concurrently as CPU cores on the machine by default, so we have many workers with potentially different settings running at the same time and order is not deterministic.
Maybe I'm wrong but AFAIK we only run more than 1 workers in Rust tests in those pipe transport related tests, so I'm completely lost.
There are as many tests running concurrently as CPU cores on the machine by default, so we have many workers with potentially different settings running at the same time and order is not deterministic.
Do you mean that different tests run in parallel in the same Rust process? I mean, of course! Ok, I assume there is no way in Rust to create a separate process for each test file, right?
Do you mean that different tests run in parallel in the same Rust process? I mean, of course! Ok, I assume there is no way in Rust to create a separate process for each test file, right?
Yes, it does run in the same process by default. You could probably make it run in separate processes somehow, but we're dealing with memory safe language, so why, right? That is not a solution to the problem we have here anyway :slightly_smiling_face:
You could probably make it run in separate processes somehow
No, trust me, I couldn't.
but we're dealing with memory safe language, so why, right?
Because there is also C code running its own separate thread: https://github.com/versatica/mediasoup/issues/1352
Ok, some news:
mediasoup-worker::mediasoup_worker_run() | settings error: missing value in command line
argument: (null)
test router::data_producer::tests::transport_close_event ... ok
test router::tests::worker_close_event ... FAILED
So it seems that we are sending some command line arg without value. I will print things to debug.
Ok, some news:
- I can reproduce the Rust test failure in a real Linux host, no need to wait for CI.
- I've made command line argument mandatory here: 669ddca
- Now Rust tests fail even in macOS with this error:
mediasoup-worker::mediasoup_worker_run() | settings error: missing value in command line argument: (null) test router::data_producer::tests::transport_close_event ... ok test router::tests::worker_close_event ... FAILED
So it seems that we are sending some command line arg without value. I will print things to debug.
Ignore, I forgot to add =true
in Rust. Now things should work.
This is strongly depressing. Now WHAT????
running 27 tests
test data_structures::tests::dtls_fingerprint ... ok
test ortc::tests::generate_router_rtp_capabilities_succeeds ... ok
test ortc::tests::generate_router_rtp_capabilities_too_many_codecs ... ok
test ortc::tests::generate_router_rtp_capabilities_unsupported ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_get_consumable_rtp_parameters_get_consumer_rtp_parameters_get_pipe_consumer_rtp_parameters_succeeds ... ok
test ortc::tests::get_producer_rtp_parameters_mapping_unsupported ... ok
test router::audio_level_observer::tests::router_close_event ... ok
test router::active_speaker_observer::tests::router_close_event ... ok
test router::data_consumer::tests::transport_close_event ... ok
test router::consumer::tests::producer_close_event ... ok
test router::consumer::tests::transport_close_event ... ok
test router::data_consumer::tests::data_producer_close_event ... ok
test router::data_producer::tests::transport_close_event ... ok
error: test failed, to rerun pass `-p mediasoup --lib`
Caused by:
process didn't exit successfully: `/home/deploy/src/mediasoup-v3-deploy/target/debug/deps/mediasoup-8f5b5278ff2f31a1` (signal: 11, SIGSEGV: invalid memory reference)
I am 96% sure that the memory issue is here:
Ohhh, so we are considering that liburing is enabled or disabled for all workers, but Rust tests run in parallel in same process and in just one of those tests we were disabling liburing. In the other ones liburing is enabled. So here the problem:
void DepLibUring::ClassInit()
{
const auto mayor = io_uring_major_version();
const auto minor = io_uring_minor_version();
MS_DEBUG_TAG(info, "liburing version: \"%i.%i\"", mayor, minor);
if (Settings::configuration.liburingDisabled)
{
MS_DEBUG_TAG(info, "liburing disabled by user settings");
return;
}
// This must be called first.
DepLibUring::CheckRuntimeSupport();
if (DepLibUring::IsEnabled())
{
DepLibUring::liburing = new LibUring();
MS_DEBUG_TAG(info, "liburing enabled");
}
else
{
MS_DEBUG_TAG(info, "liburing not enabled");
}
}
DepLibUring::liburing
singleton is thread_local static
.static bool enabled
.Finally green. Merging!
Details
createWorker({ disableLiburing: true })
disables liburing usage despite it's supported by the worker and current host.create_worker({ enable_liburing: false })
disables liburing usage despite it's supported by the worker and current host.