Allow suppressing unsupported FFI calls

landaire commented 3 years ago

In the project I'm working on we have many tests that we'd potentially like to run with Miri in our CI/CD. These tests may invoke syscalls or other unsupported FFI operations that we would like to suppress and treat as successful tests for a few reasons:

Reduce the developer burden of applying cfg(miri) to ignore specific tests.
Tests invoking unsupported foreign functions will still be emulated up until they perform the foreign function call. Testing to this point may still yield valuable results.
Opting out of specific tests requires conscious thought of whether a test should be re-enabled after a refactoring to remove a foreign function call that would otherwise raise an error.

I believe this could be handled with the introduction of a new -Z variable, -Zmiri-ignore-ffi-failure, that would simply treat these failures as an early exit with status code 0. Here's the proposed change to Miri:

Click to view diff

```patch diff --git a/src/bin/miri.rs b/src/bin/miri.rs index 4a1ea3a5..78f9b196 100644 --- a/src/bin/miri.rs +++ b/src/bin/miri.rs @@ -230,6 +230,9 @@ fn main() { "-Zmiri-disable-isolation" => { miri_config.communicate = true; } + "-Zmiri-ignore-ffi-failures" => { + miri_config.ignore_unsupported_ffi_failures = true; + } "-Zmiri-ignore-leaks" => { miri_config.ignore_leaks = true; } diff --git a/src/eval.rs b/src/eval.rs index 1e46015f..8acc3020 100644 --- a/src/eval.rs +++ b/src/eval.rs @@ -35,6 +35,8 @@ pub struct MiriConfig { pub communicate: bool, /// Determines if memory leaks should be ignored. pub ignore_leaks: bool, + /// Determines if failures caused by unsupported FFI failures should be ignored. + pub ignore_unsupported_ffi_failures: bool, /// Environment variables that should always be isolated from the host. pub excluded_env_vars: Vec, /// Command-line arguments passed to the interpreted program. @@ -206,6 +208,7 @@ pub fn create_ecx<'mir, 'tcx: 'mir>( pub fn eval_main<'tcx>(tcx: TyCtxt<'tcx>, main_id: DefId, config: MiriConfig) -> Option { // Copy setting before we move `config`. let ignore_leaks = config.ignore_leaks; + let ignore_unsupported_ffi_failures= config.ignore_unsupported_ffi_failures; let (mut ecx, ret_place) = match create_ecx(tcx, main_id, config) { Ok(v) => v, @@ -267,6 +270,14 @@ pub fn eval_main<'tcx>(tcx: TyCtxt<'tcx>, main_id: DefId, config: MiriConfig) -> } Some(return_code) } - Err(e) => report_error(&ecx, e), + Err(e) => { + if let InterpError::Unsupported(UnsupportedOpInfo::UnsupportedForeignFunction(_name)) = e.kind() { + if ignore_unsupported_ffi_failures { + // clean exit + return Some(0); + } + } + report_error(&ecx, e) + } } } diff --git a/src/shims/posix/linux/foreign_items.rs b/src/shims/posix/linux/foreign_items.rs index f7d7706e..d78e6da3 100644 --- a/src/shims/posix/linux/foreign_items.rs +++ b/src/shims/posix/linux/foreign_items.rs @@ -214,7 +214,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx this.write_null(dest)?; } - _ => throw_unsup_format!("can't call foreign function: {}", link_name), + _ => throw_unsup_foreign_function(link_name), }; Ok(true) diff --git a/src/shims/posix/macos/foreign_items.rs b/src/shims/posix/macos/foreign_items.rs index 9a7d3be1..03656395 100644 --- a/src/shims/posix/macos/foreign_items.rs +++ b/src/shims/posix/macos/foreign_items.rs @@ -156,7 +156,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx this.write_scalar(addr, dest)?; } - _ => throw_unsup_format!("can't call foreign function: {}", link_name), + _ => throw_unsup_foreign_function(link_name), }; Ok(true) diff --git a/src/shims/windows/foreign_items.rs b/src/shims/windows/foreign_items.rs index b246ccc3..e4211669 100644 --- a/src/shims/windows/foreign_items.rs +++ b/src/shims/windows/foreign_items.rs @@ -415,7 +415,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx this.write_scalar(Scalar::from_i32(1), dest)?; } - _ => throw_unsup_format!("can't call foreign function: {}", link_name), + _ => throw_unsup_foreign_function(link_name), } Ok(true) ```

This would require changes to rustc_middle as well for a new macro, throw_unsup_foreign_function, and a new UnsupportedOpInfo variant, UnsupportedOpInfo::UnsupportedForeignFunction(String).

landaire commented 3 years ago

If there is a better way of going about this please let me know, otherwise I would be happy to tackle this issue as proposed.

RalfJung commented 3 years ago

I believe this could be handled with the introduction of a new -Z variable, -Zmiri-ignore-ffi-failure, that would simply treat these failures as an early exit with status code 0.

That would exit the binary, and no more tests would be run... how is that helpful?

landaire commented 3 years ago

Is each test not a separate binary already? You can run cargo miri test --no-fail-fast today and it will run all tests but will still raise unsupported foreign function errors. The proposed behavior is helpful to prevent blocking a PR on features that are simply unsupported in Miri.

bjorn3 commented 3 years ago

No, all tests of a crate run within the same process on different threads. Rustc produces a single executable that is then run as a regular program by cargo. Only doctests are an exception in that they are each separately compiled and executed directly by rustdoc.

bjorn3 commented 3 years ago

--no-fail-fast causes cargo to run the test executable for each crate even if a single fails. If you get an unsupported foreign function error from miri, the rest of the tests in the current crate will not run, but the tests for other crates will still run with --no-fail-fast.

landaire commented 3 years ago

I see now. Sorry for the misunderstanding -- that definitely changes things. The general idea here is to just fail the individual test gracefully, but exiting early and hiding the fact other tests were not run is not desirable.

I'm clearly not super familiar with Miri/cargo-test internals, but help me understand a bit better: the changes I proposed above in eval::eval_main would return the status for the test executable and not the individual test, is that correct?

oli-obk commented 3 years ago

Maybe we can raise a panic instead of an unknown FFI error. Then the regular panic handler will just mark the test as failed instead of stopping the world

RalfJung commented 3 years ago

I'm clearly not super familiar with Miri/cargo-test internals, but help me understand a bit better: the changes I proposed above in eval::eval_main would return the status for the test executable and not the individual test, is that correct?

Yes. A test is just a single binary per crate where you don't write main, but instead a main is supplied by the testing harness and that iterates over all #[test] functions and calls them in a big loop. Miri then just interprets that binary as regular Rust code (i.e., Miri is not even aware of the concept of a "test").

Maybe we can raise a panic instead of an unknown FFI error. Then the regular panic handler will just mark the test as failed instead of stopping the world

Most FFI functions are nounwind, so I am not sure how much sense this makes and how well it'll work...

@landaire If you have so many tests that need FFI support, I am not sure how much Miri can really help. Maybe you can group the tests into different files? #![cfg(not(miri))] at the top of a file will disable all tests in that file for Miri.

landaire commented 3 years ago

@oli-obk that may work. The way we run tests in Azure Pipelines is to consume output from cargo test -- -Z unstable-options --format json. A test failing from a panic looks like this:

{ "type": "suite", "event": "started", "test_count": 1 }
{ "type": "test", "event": "started", "name": "tests::panic_test" }
{ "type": "test", "name": "tests::panic_test", "event": "failed", "stdout": "thread 'tests::panic_test' panicked at 'yo', src/main.rs:7:9\nnote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace\n" }
{ "type": "suite", "event": "failed", "passed": 0, "failed": 1, "allowed_fail": 0, "ignored": 0, "measured": 0, "filtered_out": 0, "exec_time": 0.000117601 }

We could then enlighten the tool that parses test output to ignore panic-related failures as a success when passed a special flag or something.

@RalfJung

If you have so many tests that need FFI support, I am not sure how much Miri can really help.

Miri has already provided value in finding multiple instances of UB in our unsafe code.

Maybe you can group the tests into different files? #![cfg(not(miri))] at the top of a file will disable all tests in that file for Miri.

This is certainly possible but I'd like to go back to my original points:

Reduce the developer burden of applying cfg(miri) to ignore specific tests.

Tests invoking unsupported foreign functions will still be emulated up until they perform the foreign function call. Testing to this point may still yield valuable results.

Opting out of specific tests requires conscious thought of whether a test should be re-enabled after a refactoring to remove a foreign function call that would otherwise raise an error.

Additionally, it's a rather large project split across many crates with inter-dependencies. A massive refactoring for Miri is possible but is not an ideal solution IMO.

An alternative solution I hadn't thought of I guess is to run each test separately via parsing the output of cargo test -- --list and invoking cargo miri test $TEST_NAME, but I still have the issue of distinguishing a "legitimate violation" from unsupported features. String matching on errors can be done in our pipeline but is a bit of a hack. It's better than nothing though if this feature is not desired within Miri itself.

RalfJung commented 3 years ago

Miri has already provided value in finding multiple instances of UB in our unsafe code.

Interesting, so you have a mix of code that needs FFI and code that doesn't? This may be naive, but I figured it would usually be the case that this would be somewhat separated so that tests are, to some extend, already grouped by whether they need FFI or not. It seems like that is not the case for you. (Btw, we're always happy for submissions to our "trophy case" :D )

that may work. The way we run tests in Azure Pipelines is to consume output from cargo test -- -Z unstable-options --format json. A test failing from a panic looks like this:

So, this seems like an experiment worth doing. The feature would be off-by-default so its docs could mention caveats (e.g. there might be code relying on certain code not panicking, and things can go very wrong there when enabling this option).

Triggering a panic in Miri is pretty easy, you just call start_panic. You should be able to call that instead of doing throw_unsup when an unsupported FFI function is encountered. Can you run some of your tests with such a patched Miri to see if it helps? (Let me know if you need more help developing this patch.)

landaire commented 3 years ago

Interesting, so you have a mix of code that needs FFI and code that doesn't?

We're doing some work that at the core of the application relies on Windows syscalls or other FFI boundaries. So for simple data structures and pure rust functionality we can run them cleanly, but more complicated integration tests that need to cross the FFI boundary we of course fail when this happens. Unfortunately with abstraction it may not necessarily be obvious if the API needs to make an FFI call.

Even for FFI wrappers we may still have unsafe pure Rust setup that contains bugs, so exercising those tests as far as possible is still desired.

(Btw, we're always happy for submissions to our "trophy case" :D )

This is unfortunately the only public finding: https://github.com/icedland/iced/issues/168

I can outline a couple interesting patterns that I may write a future blog post about:

We had code similar to this, the intent was to copy an enum to a new location. The enum represents fixed-size (small) or dynamically-allocated data and the data_ptr() -> *const FirstFieldStructType implementation returned a pointer to the first field which was then being used to read the whole data structure (which no proper borrow existed for). Conversation in Zulip here
It's found multiple occasions where pointer aliasing rules were not being obeyed and a *const T was being converted to a *mut T. This appears to be mostly from an as_ptr(&self) -> *const T call being converted to *mut T when as_mut_ptr(&mut self) -> *mut T existed -- this is a good code smell and gave me a decent grep: self\..+ as \*mut. So far I've only seen one case where this pattern "valid" and the refactoring solution was to use an UnsafeCell<T>.
Right now I'm battling a case very similar to #1800 where a Box<T> gets aliased and stored somewhere. When the Box gets dropped, Miri raises the same trying to reborrow for SharedReadOnly violation from std::ptr::drop_in_place.

I understand it's still experimental but the stacked borrows work has been the majority of issues so far.

Systems engineering can be hard 😅.

So, this seems like an experiment worth doing.

I will give it a go and report back here/in Zulip. I appreciate the feedback from everyone so far.

landaire commented 3 years ago

Panicking at the point of an FFI call I think unfortunately introduces UB:

   --> C:\Users\lander\.rustup\toolchains\miri\lib\rustlib\src\rust\library\core\src\panicking.rs:42:6
    |
42  |     }
    |      ^ accessing a dead local variable
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

    = note: inside `core::panicking::panic` at C:\Users\lander\.rustup\toolchains\miri\lib\rustlib\src\rust\library\core\src\panicking.rs:42:6

Here's my patch: https://gist.github.com/landaire/d1351ba8cf2f5dd23db0d0c7dc553f77

RalfJung commented 3 years ago

Panicking at the point of an FFI call I think unfortunately introduces UB:

Let's discuss on Zulip.

rust-lang / miri

Allow suppressing unsupported FFI calls #1807