Experiment: run Crater with strict provenance lints

seharris commented 1 year ago

I'd like to do a Crater run with the strict provenance fuzzy_provenance_casts lint enabled. This follows from a conversation on Zulip (https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Pre-RFC.20discussion.3A.20usize.20semantics/near/398681981). Hopefully doing this will give us some more robust data about the impact of provenance on the Rust ecosystem to inform ongoing conversations about usize and support for CHERI. I've chosen this lint (of the two added by strict provenance) because it should pick up cases that will crash on CHERI. The strict provenance lints are listed here: https://doc.rust-lang.org/nightly/unstable-book/language-features/strict-provenance.html

I think the easiest way to do this is to set RUSTFLAGS='-Zcrate-attr=feature(strict_provenance) -Dfuzzy_provenance_casts'. This should detect provenance issues in crates and their dependencies, although using -D means only the first case will be detected. It looks like Crater only reports errors, otherwise I would have suggested -W, which would detect more issues that just the first one the compiler finds in that crate. I haven't found any information about whether Crater Bot supports any escaping, so I'm not sure how to include the spaces. Otherwise, I think the command needed will be something like run start=nightly end=nightly+rustflags=<something> mode=build-only crates=full.

(Paging @saethlin who offered to queue the Crater run, as it sounds like I won't have permission to use Crater Bot directly)

saethlin commented 1 year ago

Will a check run suffice? You seem to have read over the docs, so I'm not sure why you asked for a build run. (there are other jobs already in the queue so I'm not about to yolo it)

It looks like Crater only reports errors

Crater uploads all the output from running the compiler (it does truncate in some cases) so you can always do your own log analysis. I always do, some crates just don't build at all in crater because they require unpackaged files or extra state, but will fail differently in experiments like this and that is of interest. Even though to crater, the crate didn't build before and it didn't build after so it's not a regression.

seharris commented 1 year ago

Will a check run suffice?

Oh, yup, should do. Not sure why I discounted doing a check run when I was thinking about this, it makes a lot more sense.

Crater uploads all the output from running the compiler

That's good to know, I'd just assumed the downloads would be the same information as the full report it generates. In that case, I think it makes more sense to run with -W instead of -D so we get a more complete picture of what fails. I'll just have to think a little bit more about how to process the results into a meaningful conclusion.

saethlin commented 1 year ago

@craterbot run mode=check-only start=nightly end=nightly+rustflags=-Zcrate-attr=feature(strict_provenance)\ -Wfuzzy_provenance_casts crates=full

craterbot commented 1 year ago

:ok_hand: Experiment pr-117752 created and queued. :mag: You can check out the queue and this experiment's details.

:information_source: Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

craterbot commented 1 year ago

:construction: Experiment pr-117752 is now running

:information_source: Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

seharris commented 1 year ago

@saethlin Thanks for queuing that for me!

craterbot commented 1 year ago

:tada: Experiment pr-117752 is completed! :bar_chart: 685 regressed and 5 fixed (387252 total) :newspaper: Open the full report.

:warning: If you notice any spurious failure please add them to the blacklist! :information_source: Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

saethlin commented 1 year ago

Looks like the regressions are from crates that have already enabled the strict_provenance feature. Ouch. Best to just ignore the regressions then, they're not meaningful in any direction.

seharris commented 1 year ago

Yeah, though it does tell us something about how many crates are explicitly aware of strict provenance, which could be interesting.

I'm looking at how best to get a count of crates that tripped the lint at the moment (but excluding warnings from dependencies), plus hopefully some idea of the sort of use cases that cause problems. I haven't figured out how best to approach it yet, so it'll probably be at least a few days before I come up with anything worth sharing.

seharris commented 11 months ago

Ok, I've gone through the logs and put together some numbers.

The story so far, for anyone coming from Zulip and so on: I asked for a Crater run some weeks ago. The run used the nightly compiler at that time with the strict provenance fuzzy_provenance_casts lint configured to produce warnings. This is a lint meant to detect casts from integer to pointer that are incompatible with the strict provenance model, which also means they're incompatible with CHERI. The results seem to be like this...

Crater tried to build 387,225 crates.

0.49% of the crates perform casts likely to be incompatible with the strict provenance model (and CHERI). Another 0.36% of crates emitted errors about the lint, so they may contain provenance errors that won't have been detected (making for a worst case of 0.85% of crates breaking provenance). Errors about the lint were: error[E0636]: the feature 'strict_provenance' has already been declared, warning: unknown lint: 'fuzzy_provenance_casts', and error: unstable feature. The first is interesting, because this gives us a count of crates already using strict provenance: there were 38 (<0.01%). 37.58% of crates failed to compile but also did not produce any provenance warnings. Compilation failure could mean that some of these crates have provenance errors but did not reach the stage where the lint would run. This summary assumes that crates which are already broken can be ignored.

Two classes of provenance warning are excluded from the count of problem crates: NULL casts and signaling/MMIO.

NULL casts mean code which casts zero to a pointer type to a create a deliberately invalid pointer. This appears in 0.13% of crates, and is ignored because zero is commonly used to signal an invalid pointer that should never be dereferenced. Using std::ptr::null() might be tidier, but otherwise this seems a reasonable thing to do, and this should behave as intended on CHERI.

Signaling/MMIO means code which casts a numeric literal (that is not zero) to a pointer type. This appears in 0.18% of crates. Random sampling of logs showed two cases where this is done: using a numeric value to give status information (signaling), and accessing memory-mapped devices (MMIO). In signaling, the pointer will never be valid, and like NULL should never be dereferenced. In MMIO, hardware registers appear as memory at a known address, and are read/written to interact with hardware. MMIO is highly platform specific, so won't work on a CHERI target unless it has been specifically designed to, so either it will not work regardless of provenance, or the author already knows they have to allow for provenance.

To get an idea of what provenance issues look like, here is a random sample of issues that caused 20 warnings:

describing data layouts to OpenGL (very similar to signaling and not likely to be a problem), appeared 3 times
FFI for another language's VM
reading executable layout and computing addresses for direct access, appears 2 times
web assembly internals
Windows-specific custom dynamic library loading
JIT compiling to x86 instructions for a virtual machine
strange redundant round trip cast?
custom allocators, appear 3 times
implementing breakpoints in a debugger
setting up heap in a custom operating system
signaling not using numeric literals directly (hard to detect automatically)
MMIO not using numeric literals directly (hard to detect automatically), appears 3 times
NULL cast using an alias for a pointer type (hard to detect automatically)

The statistics here come from running a Python script over the logs from the Crater run to look for patterns in output from the compiler. If anyone wants to see what the script could have missed, the code is here: https://gist.github.com/seharris/fb7606e0dbddcabbc9702c644372a95b

There are some problems with this approach to gathering statistics that are worth bearing in mind: As can be seen above, processing logs by searching for patterns is limited. Some cases cannot be reliably detected without access to additional information, like variable and type names. Using the fuzzy_provenance_casts lint to find problems is also limited. It cannot detect cases where address calculations rely on size_of::<usize>() == size_of::<*const T>(), and (to my knowledge) ignores casts made with std::mem::transmute().

Hopefully putting some more concrete numbers to how many crates use integer types to store pointer values (this experiment detects the conversion back) will help move discussions about usize, CHERI, and strict provenance forward a little. I guess I'll close this issue as I can't see any further action needed.

rust-lang / rust

Experiment: run Crater with strict provenance lints #117752