rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
649 stars 57 forks source link

Safety/soundness of macros #278

Open RalfJung opened 3 years ago

RalfJung commented 3 years ago

It is not entirely clear what it means for a macro to be "safe", and by extension, what it means for a macro to be "sound". This is what underlies the old discussion at https://github.com/RustSec/advisory-db/issues/275, and also has some up in other discussions:

We should figure out some proper definition for when a macro is considered to be "sound".

danielhenrymantilla commented 3 years ago

The obvious definition would be similar to fns and traits, etc.: if a user of the macro (let's start with a downstream one) can cause UB by using that macro and some extra code without having to use unsafe blocks or to implement an unsafe trait, then the macro / the API can be considered unsound.

That being said, the reality of macros is way more brittle than what this definition expects. Indeed, until we get more advanced hygiene (privacy-wise, I mean), global paths are resolved from the call-site environment, rather than the macro environment.

On top of that, procedural macros don't have the luxury of having a $crate (technically, since Span::mixed_site(), they can expand to a $crate to refer to their own crate, but such a crate is a proc-macro one, and can thus only export other proc-macros); ultimately, it is impossible for an attribute-macro or a derive macro to refer to a non-proc-macro item (such as a type, or a function; be it from ::std or from their frontend crate) in a fully namespace-agnostic way.

From there, one could relax a bit the rules of soundness to take this reality into account, with the following "amendment":

Suggested rule

A macro is considered unsound if a user can cause UB using it, provided they didn't:


From there, here comes the "sound macro checklist":


Quid of #[doc(hidden)] pub macros?

Note I am not talking of private macros since that's a less clear-cut area; and there is also the question of "private but technically pub" internal helper macros. Imho, since those are technically pub, they should follow the rules of pub macros and require they be give unsafe code context or an unsafe keyword when used (which the actual public macro can do if it needs to).


Quid of include!-like macros?

Macros such as include!, or proc-macro equivalents (including the caesar! example) are just "forwarding code" from some place, and any resulting unsoundness stems from the unsoundness of that original code. With my simplistic definitions, this would label these macros as "unsound".

I think these are a special category of macros, and so the above rule cannot be applied directly; instead, common judgement should estimate how hard it is to audit the "input code" that the macro is forwarding. Taking include!, for instance, if one can find the actual source file pulled by include!, then one can audit that other file. If it is procedurally generated (e.g., by a build.rs script), then one needs to audit the code there, etc. The harder it becomes to audit the input source, the less auditable that macro call becomes. At an extreme point, having: decipher! { #![decipher::i_trust_this_input(unsafe)] <actual encoded input> } could be deemed necessary, so as to express that auditing beyond that very call site is necessary.

The fact that defining strict and clear-cut soundness boundaries for macros is that hard should not come as a surprise: one can define soundness from within a set of rules, such as the rules of a language, as is the case with Rust and unsafe. But there are things beyond a language that play with the "safety" of an execution, such as a filesystem or all the meta-build shenanigans such as linking and symbol shadowing. Some of these are dangerous but not memory-unsafe stricto sensu, but others, such as using mmap backed by a file that is mutated in parallel, are just beyond Rust's power to control (memory) safety. In such instances, the convention is to mark those as unsafe(-to-call), and document the "safety preconditions" required to call it.

In that regard, requiring that caesar!, include!("procedurally_generated_file.rs"), be called from within an unsafe block or be given an unsafe keyword "token" for it could make sense; although in the case of include!, this hasn't been historically the case. Again, I think in these cases we cannot really have rules, but just some sort of jurisprudence 🤷

RalfJung commented 3 years ago

The obvious definition would be similar to fns and traits, etc.: if a user of the macro (let's start with a downstream one) can cause UB by using that macro and some extra code without having to use unsafe blocks or to implement an unsafe trait, then the macro / the API can be considered unsound.

This is the definition for safe fns. So to use it, we first have to define when a macro is considered "safe". For fn this is unambiguous; for macros much less so.

But once we have such a definition, then I agree that this is the obvious definition for soundness. Thanks for assembling the list of consequences of that choice, this is very helpful!