Open RalfJung opened 3 years ago
The obvious definition would be similar to fn
s and traits, etc.: if a user of the macro (let's start with a downstream one) can cause UB by using that macro and some extra code without having to use unsafe
blocks or to implement an unsafe
trait, then the macro / the API can be considered unsound.
That being said, the reality of macros is way more brittle than what this definition expects. Indeed, until we get more advanced hygiene (privacy-wise, I mean), global paths are resolved from the call-site environment, rather than the macro environment.
macro_rules!
macros can circumvent this by always using $crate::
-prefixed fully-qualified paths, or inherent methods with no autoref-nor-autoderef indirection on known types. This, in turn, in order to work, requires that the author of the macro re-export from the root of their crate all the things that the macro may need to access:
#[doc(hidden)] /** Not part of the public API */ pub
mod __ {
pub use ::std;
// ...
}
#[macro_export]
macro_rules! m {( … ) => ({
let o: $crate::__::std::option::Option<_> = …;
if (&o).is_some() { // OK; careful with `o.is_some()`, it could be hijacked/shadowed by a trait in scope.
…
}
})}
Needless to say, rare are the macro authors that are that diligent.
On top of that, procedural macros don't have the luxury of having a $crate
(technically, since Span::mixed_site()
, they can expand to a $crate
to refer to their own crate, but such a crate is a proc-macro one, and can thus only export other proc-macros); ultimately, it is impossible for an attribute-macro or a derive macro to refer to a non-proc-macro item (such as a type, or a function; be it from ::std
or from their frontend crate) in a fully namespace-agnostic way.
From there, one could relax a bit the rules of soundness to take this reality into account, with the following "amendment":
A macro is considered unsound if a user can cause UB using it, provided they didn't:
use unsafe
code (unsafe
blocks or an unsafe
impl); or if they did use unsafe
, they did so following the # Safety
preconditions expressed in the macro or crate's documentation.
and did not rename any of the three stdlib crates (::core
, ::alloc
, ::std
), or the macro's frontend crate (Otherwise it is trivial to break the safety invariants of those libraries)
From there, here comes the "sound macro checklist":
unsafe
in its name.[ ] Otherwise make it impossible to cause UB with it with only non-unsafe
code (and not renaming yadda yadda).
Extra checks:
[ ] if the macro receives $arg:expr
inputs from the caller, it needs to make sure not to emit / refer to those from within an unsafe
block (one that would be justified for correct but unsafe
internal operations of the macro).
[ ] If the macro needs to expand to stdlib items, it needs to use a fully-qualified path with a leading disambiguating ::
(≥ 2018 edition).
Performing a direct assert!
to guard against follow-up unsafe
ops ought to be considered unsound, since the caller of macro may have their own assert!
macro in scope which does not necessarily diverge if the given condition is false
. Thus:
- assert!( … );
+ ::core::assert!( … );
unsafe { … }
Similarly:
- let n: usize = $index; // a caller could provide an `Evil` instance with a `impl PartialOrd<usize> for Evil`
+ let n: ::core::usize = $index;
::core::assert!(… > n);
unsafe { … }
[ ] A macro must also use fully qualified paths for trait methods: it should not assume the presence of traits in scope or lack thereof! It should also avoid bringing traits in scope itself when expanding caller-provided code:
- use ::std::io::Write; // Expansion of `$out` would be polluted with `Write` in scope, and potentially shadow method resolution!
- ::std::write!( $out, … )…
+ match $out { mut out => { use ::std::io::Write; ::std::write!(out, …)… }}
#[doc(hidden)] pub
macros?Note I am not talking of private macros since that's a less clear-cut area; and there is also the question of "private but technically pub
" internal helper macros. Imho, since those are technically pub
, they should follow the rules of pub
macros and require they be give unsafe
code context or an unsafe
keyword when used (which the actual public macro can do if it needs to).
include!
-like macros?Macros such as include!
, or proc-macro equivalents (including the caesar!
example) are just "forwarding code" from some place, and any resulting unsoundness stems from the unsoundness of that original code. With my simplistic definitions, this would label these macros as "unsound".
I think these are a special category of macros, and so the above rule cannot be applied directly; instead, common judgement should estimate how hard it is to audit the "input code" that the macro is forwarding. Taking include!
, for instance, if one can find the actual source file pulled by include!
, then one can audit that other file. If it is procedurally generated (e.g., by a build.rs
script), then one needs to audit the code there, etc. The harder it becomes to audit the input source, the less auditable that macro call becomes. At an extreme point, having: decipher! { #![decipher::i_trust_this_input(unsafe)] <actual encoded input> }
could be deemed necessary, so as to express that auditing beyond that very call site is necessary.
The fact that defining strict and clear-cut soundness boundaries for macros is that hard should not come as a surprise: one can define soundness from within a set of rules, such as the rules of a language, as is the case with Rust and unsafe
. But there are things beyond a language that play with the "safety" of an execution, such as a filesystem or all the meta-build shenanigans such as linking and symbol shadowing. Some of these are dangerous but not memory-unsafe
stricto sensu, but others, such as using mmap
backed by a file that is mutated in parallel, are just beyond Rust's power to control (memory) safety. In such instances, the convention is to mark those as unsafe
(-to-call), and document the "safety preconditions" required to call it.
In that regard, requiring that caesar!
, include!("procedurally_generated_file.rs")
, be called from within an unsafe
block or be given an unsafe
keyword "token" for it could make sense; although in the case of include!
, this hasn't been historically the case. Again, I think in these cases we cannot really have rules, but just some sort of jurisprudence 🤷
The obvious definition would be similar to fns and traits, etc.: if a user of the macro (let's start with a downstream one) can cause UB by using that macro and some extra code without having to use unsafe blocks or to implement an unsafe trait, then the macro / the API can be considered unsound.
This is the definition for safe fn
s. So to use it, we first have to define when a macro is considered "safe". For fn
this is unambiguous; for macros much less so.
But once we have such a definition, then I agree that this is the obvious definition for soundness. Thanks for assembling the list of consequences of that choice, this is very helpful!
It is not entirely clear what it means for a macro to be "safe", and by extension, what it means for a macro to be "sound". This is what underlies the old discussion at https://github.com/RustSec/advisory-db/issues/275, and also has some up in other discussions:
We should figure out some proper definition for when a macro is considered to be "sound".