Open perlindgren opened 2 years ago
Status update:
There is now a corresponding POC:
https://github.com/rtic-rs/cortex-m-rtic/tree/lockall
There is still an outstanding issue regarding the resource proxy for the lock-all API: it currently exposes the priority (the Priority type). At some point we deemed that to be risky; I don't recall exactly how it could be exploited. In any case, that problem should be solvable if we decide to include the lockall API in the RTIC 0.6 release. Please go ahead and try it. The examples/lockall*.rs files should give you an idea of how it can be used.
Status update:
The Priority leakage has been fixed in:
Status update:
Added a comparison: examples/multilock_vs_lockall. The correct field types are inferred by current rust-analyzer.
Status update:
The current multi-lock moves the resource proxies into the lock. While this is sound, it blocks further locks: even outside/after the lock closure there is no way to recover the proxies. See, e.g., examples/multilock_problem.rs.
Status update:
A workaround to the above-mentioned problem is to pass mutable references to the proxies instead of moving the proxies. See, e.g., examples/multilock_problem_workaround.rs.
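To make the difference concrete, here is a minimal, RTIC-free sketch (the `Mutex` trait and `Proxy` type below are simplified stand-ins for the generated proxies, with the actual priority-ceiling locking elided): moving a proxy into a multi-lock helper consumes it, while passing `&mut` lets the caller lock again afterwards.

```rust
// Stand-in for RTIC's resource proxy and Mutex trait (no real
// ceiling-based locking; `lock` just hands out a mutable reference).
trait Mutex {
    type T;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::T) -> R) -> R;
}

struct Proxy(i32);

impl Mutex for Proxy {
    type T = i32;
    fn lock<R>(&mut self, f: impl FnOnce(&mut i32) -> R) -> R {
        // real codegen would raise BASEPRI here ...
        f(&mut self.0)
        // ... and restore it here
    }
}

// Moving the proxies in consumes them: the caller cannot lock again.
fn multilock_by_move(mut a: Proxy, mut b: Proxy) -> i32 {
    a.lock(|a| b.lock(|b| *a + *b))
    // `a` and `b` are dropped here; the caller has lost them
}

// Workaround: borrow the proxies instead, so the caller keeps them.
fn multilock_by_ref(a: &mut Proxy, b: &mut Proxy) -> i32 {
    a.lock(|a| b.lock(|b| *a + *b))
}

fn main() {
    let (mut a, mut b) = (Proxy(1), Proxy(2));
    let sum = multilock_by_ref(&mut a, &mut b);
    // `a` is still usable after the &mut-based multi-lock
    let again = a.lock(|a| *a);
    println!("{} {}", sum, again);
    // the by-move variant works too, but `a` and `b` are gone afterwards
    println!("{}", multilock_by_move(a, b));
}
```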
@perlindgren the internal use of &'static mut _ references strikes me as 'Undefined-Behavior wrong' / 'semantically wrong'. Not that I can show you how to make it explode today, but I would put it under the category 'some future change in rustc/LLVM codegen may make this Undefined Behavior in the future'.
I would suggest creating / using a separate trait to achieve the right lifetime constraints:
NOTE: this is a sketch & syntax/API from memory; I didn't try to compile it so it may be wrong in a few places
```rust
/* input RTIC code */
struct Shared {
    x: i32,
    y: i64,
    z: bool,
}

#[task(shared = [x, y])]
fn f(cx: f::Context) {
    // (the move is NOT needed; this is just to show the type name)
    let mut shared: f::SharedResources = cx.shared;

    shared.lock(|shared: f::SharedMut| { // <- SharedMutex trait
        *shared.x *= 1;
        *shared.y *= 1;
    });

    // resource proxies can still be moved out and used independently
    // (again, the move is just to show the type name)
    let mut x: resources::X = shared.x;
    x.lock(|x| { // <- Mutex trait
        *x += 1;
    });
}
```
```rust
/* generated code */

// this is the existing codegen (simplified for brevity)
mod resources {
    // proxies
    struct X;
    struct Y;

    impl Mutex for X { /* .. */ }
    impl Mutex for Y { /* .. */ }
}

mod f {
    struct SharedResources {
        x: crate::resources::X,
        y: crate::resources::Y,
    }

    // this is the new codegen
    // NO `'static` lifetimes
    struct SharedMut<'a> {
        x: &'a mut i32,
        y: &'a mut i64,
    }

    impl<'a> SharedMutex for &'a mut SharedResources {
        //   ^^ != 'static    ^^^^^^^ reference, not plain type
        type T = SharedMut<'a>;
        //                 ^^ connect lifetime to closure argument

        fn lock<R>(self, f: impl FnOnce(SharedMut<'a>) -> R) -> R {
            //     ^^^^ Self is &'a mut SharedResources
            // do locking here
        }
    }
}
```
```rust
// this is the new trait
trait SharedMutex {
    type T;

    fn lock<R>(self, f: impl FnOnce(Self::T) -> R) -> R;
    //         ^^^^ NOTE by value, not by reference (`&mut self`)
}
```
@japaric, I tried something similar before resorting to &'static mut in the current API, but ran into a problem with lifetimes. I did not use a separate trait with move semantics though. That would also make it a bit asymmetric to the current lock, where you would not be able to re-lock resources unless you move the struct back, and I'm not sure how that would work, as we are not giving the proxy as a parameter to the closure (just the locked struct).
I'm not 100% sure what in Rust's semantics actually saves us from unsoundness in the current proposal, but it seems that when lending out an &'a mut Shared { a: &'static mut T }, the inner &'static mut will be demoted (re-borrowed) to 'a. If this is defined somewhere, the current proposal should be fine (and it is the behavior of the current compiler).
https://doc.rust-lang.org/reference/subtyping.html
is not conclusive on this point (under what lifetime the &'static field will be accessible), but the only thing that makes sense is to assume the stricter lifetime.
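The demotion behavior in question can be checked with a small example (names here are made up for illustration): through an `&'a mut` borrow of the struct, the `&'static mut` field is only reachable as a re-borrow at `'a`; trying to move the `&'static mut` out through the shorter borrow is rejected by the compiler.

```rust
struct Holder {
    a: &'static mut i32,
}

// The returned lifetime is tied to the borrow of `Holder`, not 'static:
// through `&'a mut Holder`, `h.a` can only be re-borrowed at 'a.
fn borrow_inner<'a>(h: &'a mut Holder) -> &'a mut i32 {
    &mut *h.a
    // by contrast, `fn steal(h: &mut Holder) -> &'static mut i32 { h.a }`
    // is rejected: cannot move out of `h.a`, which is behind a mutable ref
}

fn demo() -> i32 {
    // leaking a Box is a safe way to obtain a &'static mut for the demo
    let s: &'static mut i32 = Box::leak(Box::new(41));
    let mut holder = Holder { a: s };
    *borrow_inner(&mut holder) += 1;
    *holder.a
}

fn main() {
    println!("{}", demo());
}
```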
By the way, I verified that at least for trivial cases the abstraction is zero cost and the lock would be optimized out. We rely on Rust+LLVM to evaluate the priority at compile time. In the general case this might become too complex for Rust+LLVM to figure out, and the lock would not be optimized out, but I haven't seen that in practice in my small experiments.
After some additional testing I haven't found a way to break the proposed API.
In examples/lockall_soundness3.rs an attempt to move out the locked struct is rejected (due to a lifetime error). Essentially, together with the prior soundness tests (1/2), this should cover the possibilities to break the API.
Regarding performance, two more examples were added (examples/lock_cost.rs and examples/lockall_cost.rs). The features:
```toml
[features]
inline-asm = ["cortex-m/inline-asm"]
linker-plugin-lto = ["cortex-m/linker-plugin-lto"]
```
were added to Cargo.toml (not sure we want them there finally; maybe there is some possibility to have these features available only for building the examples...?).
I added two examples, lock_cost and lockall_cost, for comparison. Code for further analysis is generated by:
```sh
cargo objdump --example lock_cost --target thumbv7m-none-eabi --release --features inline-asm -- --disassemble > lock_cost.objdump
```
When comparing lock and lockall, they generate exactly the same code, so performance should be OK (zero cost). In both cases a taken lock will generate:
```rust
// ...
#[task(binds = GPIOA, shared = [shared])]
fn low(mut cx: low::Context) {
    cx.shared.shared.lock(|shared| *shared += 1);
}

#[task(binds = GPIOB, priority = 2, shared = [shared])]
fn high(mut cx: high::Context) {
    cx.shared.shared.lock(|shared| *shared += 1);
}
```
```
// 0000016c <GPIOA>:
// 16c: 80 b5          push {r7, lr}
// 16e: 6f 46          mov r7, sp
// 170: c0 20          movs r0, #192
// 172: 80 f3 11 88    msr basepri, r0
// 176: 40 f2 00 00    movw r0, #0
// 17a: c2 f2 00 00    movt r0, #8192
// 17e: 01 68          ldr r1, [r0]
// 180: 01 31          adds r1, #1
// 182: 01 60          str r1, [r0]
// 184: e0 20          movs r0, #224
// 186: 80 f3 11 88    msr basepri, r0
// 18a: 00 20          movs r0, #0
// 18c: 80 f3 11 88    msr basepri, r0
// 190: 80 bd          pop {r7, pc}
```
In the above case the lock is taken (low prio).
```
// 00000192 <GPIOB>:
// 192: 80 b5          push {r7, lr}
// 194: 6f 46          mov r7, sp
// 196: 40 f2 00 01    movw r1, #0
// 19a: ef f3 11 80    mrs r0, basepri
// 19e: c2 f2 00 01    movt r1, #8192
// 1a2: 0a 68          ldr r2, [r1]
// 1a4: 01 32          adds r2, #1
// 1a6: 0a 60          str r2, [r1]
// 1a8: 80 f3 11 88    msr basepri, r0
// 1ac: 80 bd          pop {r7, pc}
```
In the above the lock is optimized out.
The very same code is generated for lock and lock_all. In both cases the code could be further optimized according to 19, but the improvement is minor (we are already "zero cost" in the big picture). In any case, the proposed API does not introduce any additional overhead, at least not for simple examples.
(revisiting this thread after some vacation away from the computer)
> I'm not 100% sure what in Rust's semantics actually saves us from unsoundness in the current proposal, but it seems that when lending out an &'a mut Shared { a: &'static mut T } the inner &'static mut will be demoted to 'a
I'd be more concerned about (LLVM) codegen than the borrow checker or some other type check not working adequately. A future version of rustc could produce LLVM IR in such a way that the creation of a static reference (&'static mut T) in more than one execution path during the lifetime of the program is Undefined Behavior. Limiting the creation of a &'static mut T to at most once in the lifetime of the program strikes me as a valid optimization. Your use of static lifetimes creates a &'static mut T on each lock, which could happen more than once in the lifetime of the program.
Yes, you create a static reference and then reborrow it with a non-static lifetime before you hand it to the user, but the intermediate step is hypothetically/potentially problematic. This is a bit like how you are not allowed to create a reference to an uninitialized struct field and then create a raw pointer from it (&mut x as *mut T), because that's Undefined Behavior; instead you should use ptr::addr_of_mut! to directly create a raw pointer (*mut T) without the intermediate reference (&mut T).
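The analogy refers to the initialization pattern below (a generic illustration, not RTIC code): with `MaybeUninit`, a field of a not-yet-initialized struct is written through a raw pointer obtained with `ptr::addr_of_mut!`, avoiding the intermediate `&mut` to uninitialized memory.

```rust
use std::mem::MaybeUninit;
use std::ptr;

struct Pair {
    a: i32,
    b: i32,
}

fn init_pair() -> Pair {
    let mut p = MaybeUninit::<Pair>::uninit();
    let pp = p.as_mut_ptr();
    unsafe {
        // OK: raw pointers straight to the fields, no intermediate
        // &mut to uninitialized memory
        ptr::addr_of_mut!((*pp).a).write(1);
        ptr::addr_of_mut!((*pp).b).write(2);
        // NOT OK: `&mut (*pp).a` before the field is initialized would
        // create a reference to uninitialized memory
        p.assume_init()
    }
}

fn main() {
    let p = init_pair();
    println!("{} {}", p.a, p.b);
}
```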
In any case, this is all speculation because I don't think the creation of static references (&'static mut) is well specified in the language or covered by the unsafe code guidelines at the moment.
> I did not use a separate trait with move semantics though.
Note that the trait uses self but does not consume the caller, because it's implemented on a mutable reference, so the usual re-borrowing rules apply (in non-generic contexts) when calling the trait method. If you want to use a method with a &mut self receiver instead of a self receiver, you would need the GAT (Generic Associated Types) feature (which has not been implemented yet) to adequately constrain the lifetimes without using the intermediate 'static lifetime.
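The re-borrow behavior described here can be seen in a minimal example (names invented for illustration): the trait method takes `self` by value, but because the trait is implemented for `&mut Counter`, a non-generic call site only re-borrows the reference, so it can be called repeatedly.

```rust
trait ByValue {
    fn bump(self) -> i32;
}

struct Counter(i32);

impl ByValue for &mut Counter {
    // `self` here is `&mut Counter`, taken by value
    fn bump(self) -> i32 {
        self.0 += 1;
        self.0
    }
}

fn demo() -> i32 {
    let mut c = Counter(0);
    let r = &mut c;
    r.bump(); // implicit re-borrow: `r` is NOT moved ...
    r.bump(); // ... so it can be used again
    c.0
}

fn main() {
    println!("{}", demo());
}
```

In a generic context (where the callee only sees `T: ByValue`) this implicit re-borrow does not apply, which is the caveat "(in non-generic contexts)" above.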
Thanks for your input @japaric. I made some further experiments; the current version does not use &'static mut, but rather &'a mut. However, it may still suffer from unsoundness due to the points you mentioned.
Creating a structure reflecting the accessible shared resources as fields is not the only way, though.
Maybe it would be better to create a tuple (like done now with the macro). The advantage over the current macro-based multi-lock would be ergonomics (we can lock all without splitting) and, secondly, performance (guaranteed single critical section). This would also allow us to use the current way of dealing out the resources to the closure (so symmetric). It should not be too complex to implement either.
Thinking twice about the problem: tuples are treated as structs in Rust, so I'm not sure it will make any difference to soundness. However, the idea is not "bad": instead of giving an array of shared resources (for a task), the syntax could be changed to a tuple (with the order of elements the same for the declaration and the lockall API). This way the struct name would not need to be defined, so a bit less "magic" overall.
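A toy sketch of the tuple idea (all names invented; the ceiling handling is elided): the lock-all trait is implemented on a tuple of proxies, so no struct name needs to be generated, and the closure receives the resources in declaration order.

```rust
struct Proxy<T>(T);

// Hypothetical lock-all over a 2-tuple of proxies: one conceptual
// critical section handing out all resources at once.
trait LockAll {
    type A;
    type B;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::A, &mut Self::B) -> R) -> R;
}

impl<A, B> LockAll for (Proxy<A>, Proxy<B>) {
    type A = A;
    type B = B;
    fn lock<R>(&mut self, f: impl FnOnce(&mut A, &mut B) -> R) -> R {
        // a real implementation would raise the system ceiling to the
        // max ceiling of all resources here, and restore it afterwards
        f(&mut (self.0).0, &mut (self.1).0)
    }
}

fn demo() -> (i32, bool) {
    let mut shared = (Proxy(1), Proxy(false));
    shared.lock(|x, flag| {
        *x += 1;
        *flag = true;
    });
    ((shared.0).0, (shared.1).0)
}

fn main() {
    println!("{:?}", demo());
}
```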
Problem
The current lock API allows locking only a single resource. Thanks to the Mutex design, it is possible to lock multiple resources, but this requires some "boilerplate" code. RTIC ensures at compile time (and at zero cost) that each resource can only be locked once. (Allowing a resource to be locked numerous times would break the soundness invariant for mutable aliasing.) This is done by moving the resource proxy into the lock (the lock consumes the proxy). While elegant, this requires the resource proxies to be split before the lock.
Multi-lock conceptually leads to nested critical sections, something like s1.lock(|s1| s2.lock(|s2| s3.lock(|s3| ...))). In case the resources have different ceilings, the rendered code may (depending on lock order) perform numerous system ceiling updates.
Currently, multi-lock is limited to a fixed set of nesting levels. While this is typically not an issue in practice, it feels a bit unsatisfying from a design point of view.
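The nesting described above looks roughly as follows in a minimal, RTIC-free sketch (the `Mutex` trait and `Proxy` type are simplified stand-ins; each `lock` would be a separate critical section in real codegen):

```rust
// Simplified stand-in for RTIC's Mutex trait and resource proxies.
trait Mutex {
    type T;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::T) -> R) -> R;
}

struct Proxy(i32);

impl Mutex for Proxy {
    type T = i32;
    fn lock<R>(&mut self, f: impl FnOnce(&mut i32) -> R) -> R {
        // each level would update the system ceiling here if needed
        f(&mut self.0)
    }
}

fn demo() -> i32 {
    let (mut s1, mut s2, mut s3) = (Proxy(1), Proxy(2), Proxy(3));
    // three nested critical sections; with different ceilings each
    // level may update the system ceiling again
    s1.lock(|s1| s2.lock(|s2| s3.lock(|s3| *s1 + *s2 + *s3)))
}

fn main() {
    println!("{}", demo());
}
```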
Suggested alternative approach
The complete set of shared resources for the task can be locked at once, e.g.:
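A sketch of the intended usage, as a toy, RTIC-free mock of the proposed API surface (all names are assumed): `cx.shared.lock` locks the complete set of shared resources of the task in one go.

```rust
// Mock of the generated proxy for a task with `shared = [a, b]`.
struct SharedProxy {
    a: i32,
    b: bool,
}

// Mock of the generated "all resources locked" view.
struct SharedMut<'s> {
    a: &'s mut i32,
    b: &'s mut bool,
}

impl SharedProxy {
    fn lock<R>(&mut self, f: impl FnOnce(SharedMut) -> R) -> R {
        // a single critical section at the joint ceiling (elided here)
        f(SharedMut { a: &mut self.a, b: &mut self.b })
    }
}

struct Context {
    shared: SharedProxy,
}

// body of the task: lock everything at once, then lock again later
fn task(mut cx: Context) -> (i32, bool) {
    cx.shared.lock(|s| {
        *s.a += 1;
        *s.b = true;
    });
    cx.shared.lock(|s| (*s.a, *s.b))
}

fn main() {
    let cx = Context { shared: SharedProxy { a: 0, b: false } };
    println!("{:?}", task(cx));
}
```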
and it should also allow an explicitly typed closure argument:
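A mock of this second form (names assumed; Shared is the name suggested under Design decisions below), where the closure argument is annotated with the generated type:

```rust
// Generated view over the task's shared resources (name assumed).
struct Shared<'s> {
    a: &'s mut i32,
}

// Mock of the generated proxy for a task with `shared = [a]`.
struct SharedProxy {
    a: i32,
}

impl SharedProxy {
    fn lock<R>(&mut self, f: impl FnOnce(Shared) -> R) -> R {
        // single critical section (ceiling handling elided)
        f(Shared { a: &mut self.a })
    }
}

fn demo() -> i32 {
    let mut proxy = SharedProxy { a: 41 };
    // the closure argument carries an explicit type annotation
    proxy.lock(|s: Shared| {
        *s.a += 1;
        *s.a
    })
}

fn main() {
    println!("{}", demo());
}
```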
Soundness
The lock will consume the entire proxy structure, so the "locked once" invariant is preserved, and thus soundness is maintained.
Performance
This approach leads to a single critical section (and thus reduces the number of system ceiling updates compared to the current multi-lock).
Implementation
While seemingly straightforward, the devil is always in the details. Under the hood a resource proxy for the complete set of shared resources needs to be generated for each context (task). Additional ceiling analysis is required. In release mode the compiler should be able to optimize out overhead leading to a zero-cost abstraction (but it needs to be implemented and verified).
Design decisions include: where the additional types should be created, and what name should be adopted for the passed structure (in the second example above, Shared was suggested).
Pros and cons
Foreseen advantages: better ergonomics; guaranteed performance (as good as or better than the current multi-lock); the artificial limitation on the number of lockable resources is lifted (though in practice this is likely not a problem).
Foreseen disadvantages: added complexity to RTIC code generation; the new API may lead users to unnecessary locking (as a lock-all is so easy and "inviting"); lock-all will lock immediately at the highest ceiling, thus restricting concurrency in some cases.
Conclusion
An easy-to-use and easy-to-understand API extension. As long as the programmer understands the implications, they can make deliberate design decisions, taking the impact on concurrency into consideration.
Status
POC in branch lockall.