Open perlindgren opened 2 years ago
Status update:
There is now a corresponding POC:
https://github.com/rtic-rs/cortex-m-rtic/tree/lockall
There is still an outstanding issue regarding the resource proxy for the lock-all API: it currently exposes the priority (the Priority type). At some point we deemed that to be risky; I don't recall exactly how it could be exploited. In any case, that problem should be solvable if we decide to include the lockall API in the RTIC 0.6 release. Please go ahead and try it. The examples/lockall*.rs files should give you an idea of how it can be used.
Status update:
The Priority leakage has been fixed in:
Status update:
Added a comparison: examples/multilock_vs_lockall. The correct field types are inferred by current rust-analyzer.
Status update:
The current multi-lock moves the resource proxies into the lock. While this is sound, it blocks further locks: even outside/after the lock closure there is no way to recover the proxies. See, e.g., examples/multilock_problem.rs.
Status update:
A workaround to the above-mentioned problem is to pass mutable references to the proxies instead of moving the proxies. See, e.g., examples/multilock_problem_workaround.rs.
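To make the difference concrete, here is a minimal, RTIC-free sketch (the `Mutex` trait and `Proxy` type below are simplified stand-ins for the generated proxies, with the actual priority-ceiling locking elided): moving a proxy into a multi-lock helper consumes it, while passing `&mut` lets the caller lock again afterwards.

```rust
// Stand-in for RTIC's resource proxy and Mutex trait (no real
// ceiling-based locking; `lock` just hands out a mutable reference).
trait Mutex {
    type T;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::T) -> R) -> R;
}

struct Proxy(i32);

impl Mutex for Proxy {
    type T = i32;
    fn lock<R>(&mut self, f: impl FnOnce(&mut i32) -> R) -> R {
        // real codegen would raise BASEPRI here ...
        f(&mut self.0)
        // ... and restore it here
    }
}

// Moving the proxies in consumes them: the caller cannot lock again.
fn multilock_by_move(mut a: Proxy, mut b: Proxy) -> i32 {
    a.lock(|a| b.lock(|b| *a + *b))
    // `a` and `b` are dropped here; the caller has lost them
}

// Workaround: borrow the proxies instead, so the caller keeps them.
fn multilock_by_ref(a: &mut Proxy, b: &mut Proxy) -> i32 {
    a.lock(|a| b.lock(|b| *a + *b))
}

fn main() {
    let (mut a, mut b) = (Proxy(1), Proxy(2));
    let sum = multilock_by_ref(&mut a, &mut b);
    // `a` is still usable after the &mut-based multi-lock
    let again = a.lock(|a| *a);
    println!("{} {}", sum, again);
    // the by-move variant works too, but `a` and `b` are gone afterwards
    println!("{}", multilock_by_move(a, b));
}
```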
@perlindgren the internal use of &'static mut _ references strikes me as 'Undefined-Behavior wrong' / 'semantically wrong'. Not that I can show you how to make it explode today, but I would put it under the category 'some future change in rustc/LLVM codegen may make this Undefined Behavior in the future'.
I would suggest creating / using a separate trait to achieve the right lifetime constraints:
NOTE: this is a sketch & syntax/API from memory; I didn't try to compile it so it may be wrong in a few places
```rust
/* input RTIC code */
struct Shared {
    x: i32,
    y: i64,
    z: bool,
}

#[task(shared = [x, y])]
fn f(cx: f::Context) {
    // (the move is NOT needed; this is just to show the type name)
    let mut shared: f::SharedResources = cx.shared;

    shared.lock(|shared: f::SharedMut| { // <- SharedMutex trait
        *shared.x *= 1;
        *shared.y *= 1;
    });

    // resource proxies can still be moved out and used independently
    // (again, the move is just to show the type name)
    let mut x: resources::X = shared.x;
    x.lock(|x| { // <- Mutex trait
        *x += 1;
    });
}
```
```rust
/* generated code */

// this is the existing codegen (simplified for brevity)
mod resources {
    // proxies
    struct X;
    struct Y;

    impl Mutex for X { /* .. */ }
    impl Mutex for Y { /* .. */ }
}

mod f {
    struct SharedResources {
        x: crate::resources::X,
        y: crate::resources::Y,
    }

    // this is the new codegen
    // NO `'static` lifetimes
    struct SharedMut<'a> {
        x: &'a mut i32,
        y: &'a mut i64,
    }

    impl<'a> SharedMutex for &'a mut SharedResources {
        //   ^^ != 'static    ^^^^^^^ reference, not plain type
        type T = SharedMut<'a>;
        //                 ^^ connect lifetime to closure argument

        fn lock<R>(self, f: impl FnOnce(SharedMut<'a>) -> R) -> R {
            //     ^^^^ Self is &'a mut SharedResources
            // do locking here
        }
    }
}
```
```rust
// this is the new trait
trait SharedMutex {
    type T;

    fn lock<R>(self, f: impl FnOnce(Self::T) -> R) -> R;
    //         ^^^^ NOTE by value, not by reference (`&mut self`)
}
```
@japaric, I tried something similar before resorting to &'static mut in the current API, but ran into a problem with lifetimes. I did not use a separate trait with move semantics though. That would also make it a bit asymmetric to the current lock, where you would not be able to re-lock resources unless you move the struct back, and I'm not sure how that would work, as we are not giving the proxy as a parameter to the closure (just the locked struct).
I'm not 100% sure what in Rust's semantics actually saves us from unsoundness in the current proposal, but it seems that when lending out an &'a mut Shared { a: &'static mut T }, the inner &'static mut will be demoted (re-borrowed) to 'a. If this is defined somewhere, the current proposal should be fine (and it is the behavior of the current compiler).
https://doc.rust-lang.org/reference/subtyping.html
is not conclusive on this point (under what lifetime the &'static field will be accessible), but the only thing that makes sense is to assume the stricter lifetime.
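The demotion behavior in question can be checked with a small example (names here are made up for illustration): through an `&'a mut` borrow of the struct, the `&'static mut` field is only reachable as a re-borrow at `'a`; trying to move the `&'static mut` out through the shorter borrow is rejected by the compiler.

```rust
struct Holder {
    a: &'static mut i32,
}

// The returned lifetime is tied to the borrow of `Holder`, not 'static:
// through `&'a mut Holder`, `h.a` can only be re-borrowed at 'a.
fn borrow_inner<'a>(h: &'a mut Holder) -> &'a mut i32 {
    &mut *h.a
    // by contrast, `fn steal(h: &mut Holder) -> &'static mut i32 { h.a }`
    // is rejected: cannot move out of `h.a`, which is behind a mutable ref
}

fn demo() -> i32 {
    // leaking a Box is a safe way to obtain a &'static mut for the demo
    let s: &'static mut i32 = Box::leak(Box::new(41));
    let mut holder = Holder { a: s };
    *borrow_inner(&mut holder) += 1;
    *holder.a
}

fn main() {
    println!("{}", demo());
}
```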
By the way, I verified that at least for trivial cases the abstraction is zero cost and the lock would be optimized out. We rely on Rust+LLVM to evaluate the priority at compile time. In the general case this might become too complex for Rust+LLVM to figure out, and the lock would not be optimized out, but I haven't seen that in practice in my small experiments.
After some additional testing I haven't found a way to break the proposed API.
In examples/lockall_soundness3.rs an attempt to move out the locked struct is rejected (due to a lifetime error). Essentially, together with the prior soundness tests (1/2), this should cover the possibilities to break the API.
Regarding performance, two more examples were added (examples/lock_cost.rs and examples/lockall_cost.rs). The features:
```toml
[features]
inline-asm = ["cortex-m/inline-asm"]
linker-plugin-lto = ["cortex-m/linker-plugin-lto"]
```
were added to Cargo.toml (not sure we want them there finally; maybe there is some possibility to have these features available only for building the examples...?).
I added two examples, lock_cost and lockall_cost, for comparison. Code for further analysis is generated by:
```sh
cargo objdump --example lock_cost --target thumbv7m-none-eabi --release --features inline-asm -- --disassemble > lock_cost.objdump
```
When comparing lock and lockall, they generate exactly the same code, so performance should be OK (zero cost). In both cases a taken lock will generate:
```rust
// ...
#[task(binds = GPIOA, shared = [shared])]
fn low(mut cx: low::Context) {
    cx.shared.shared.lock(|shared| *shared += 1);
}

#[task(binds = GPIOB, priority = 2, shared = [shared])]
fn high(mut cx: high::Context) {
    cx.shared.shared.lock(|shared| *shared += 1);
}
```
```
// 0000016c <GPIOA>:
// 16c: 80 b5          push {r7, lr}
// 16e: 6f 46          mov r7, sp
// 170: c0 20          movs r0, #192
// 172: 80 f3 11 88    msr basepri, r0
// 176: 40 f2 00 00    movw r0, #0
// 17a: c2 f2 00 00    movt r0, #8192
// 17e: 01 68          ldr r1, [r0]
// 180: 01 31          adds r1, #1
// 182: 01 60          str r1, [r0]
// 184: e0 20          movs r0, #224
// 186: 80 f3 11 88    msr basepri, r0
// 18a: 00 20          movs r0, #0
// 18c: 80 f3 11 88    msr basepri, r0
// 190: 80 bd          pop {r7, pc}
```
In the above case the lock is taken (low prio).
```
// 00000192 <GPIOB>:
// 192: 80 b5          push {r7, lr}
// 194: 6f 46          mov r7, sp
// 196: 40 f2 00 01    movw r1, #0
// 19a: ef f3 11 80    mrs r0, basepri
// 19e: c2 f2 00 01    movt r1, #8192
// 1a2: 0a 68          ldr r2, [r1]
// 1a4: 01 32          adds r2, #1
// 1a6: 0a 60          str r2, [r1]
// 1a8: 80 f3 11 88    msr basepri, r0
// 1ac: 80 bd          pop {r7, pc}
```
In the above the lock is optimized out.
The very same code is generated for lock and lock_all. In both cases the code could be further optimized according to 19, but the improvement is minor (we are already "zero cost" in the big picture). In any case, the proposed API does not introduce any additional overhead, at least not for simple examples.
(revisiting this thread after some vacation away from the computer)
> I'm not 100% sure what in Rust's semantics actually saves us from unsoundness in the current proposal, but it seems that when lending out an &'a mut Shared { a: &'static mut T } the inner &'static mut will be demoted to 'a
I'd be more concerned about (LLVM) codegen than the borrow checker or some other type check not working adequately. A future version of rustc could produce LLVM IR in such a way that the creation of a static reference (&'static mut T) in more than one execution path during the lifetime of the program is Undefined Behavior. Limiting the creation of a &'static mut T to at most once in the lifetime of the program strikes me as a valid optimization. Your use of static lifetimes creates a &'static mut T on each lock, which could happen more than once in the lifetime of the program.
Yes, you create a static reference and then reborrow it with a non-static lifetime before you hand it to the user, but the intermediate step is hypothetically/potentially problematic. This is a bit like how you are not allowed to create a reference to an uninitialized struct field and then create a raw pointer from it (&mut x as *mut T), because that's Undefined Behavior; instead you should use ptr::addr_of_mut! to directly create a raw pointer (*mut T) without the intermediate reference (&mut T).
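The analogy refers to the initialization pattern below (a generic illustration, not RTIC code): with `MaybeUninit`, a field of a not-yet-initialized struct is written through a raw pointer obtained with `ptr::addr_of_mut!`, avoiding the intermediate `&mut` to uninitialized memory.

```rust
use std::mem::MaybeUninit;
use std::ptr;

struct Pair {
    a: i32,
    b: i32,
}

fn init_pair() -> Pair {
    let mut p = MaybeUninit::<Pair>::uninit();
    let pp = p.as_mut_ptr();
    unsafe {
        // OK: raw pointers straight to the fields, no intermediate
        // &mut to uninitialized memory
        ptr::addr_of_mut!((*pp).a).write(1);
        ptr::addr_of_mut!((*pp).b).write(2);
        // NOT OK: `&mut (*pp).a` before the field is initialized would
        // create a reference to uninitialized memory
        p.assume_init()
    }
}

fn main() {
    let p = init_pair();
    println!("{} {}", p.a, p.b);
}
```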
In any case, this is all speculation because I don't think the creation of static references (&'static mut) is well specified in the language or covered by the unsafe code guidelines at the moment.
> I did not use a separate trait with move semantics though.
Note that the trait uses self but does not consume the caller, because it's implemented on a mutable reference, so the usual re-borrowing rules apply (in non-generic contexts) when calling the trait method. If you want to use a method with a &mut self receiver instead of a self receiver, you would need the GAT (Generic Associated Types) feature (which has not been implemented yet) to adequately constrain the lifetimes without using the intermediate 'static lifetime.
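The re-borrow behavior described here can be seen in a minimal example (names invented for illustration): the trait method takes `self` by value, but because the trait is implemented for `&mut Counter`, a non-generic call site only re-borrows the reference, so it can be called repeatedly.

```rust
trait ByValue {
    fn bump(self) -> i32;
}

struct Counter(i32);

impl ByValue for &mut Counter {
    // `self` here is `&mut Counter`, taken by value
    fn bump(self) -> i32 {
        self.0 += 1;
        self.0
    }
}

fn demo() -> i32 {
    let mut c = Counter(0);
    let r = &mut c;
    r.bump(); // implicit re-borrow: `r` is NOT moved ...
    r.bump(); // ... so it can be used again
    c.0
}

fn main() {
    println!("{}", demo());
}
```

In a generic context (where the callee only sees `T: ByValue`) this implicit re-borrow does not apply, which is the caveat "(in non-generic contexts)" above.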
Thanks for your input @japaric. I made some further experiments; the current version does not use &'static mut, but rather &'a mut. However, it may still suffer from unsoundness due to the points you mentioned.
Creating a structure reflecting the accessible shared resources as fields is not the only way, though.
Maybe it would be better to create a tuple (like done now with the macro). The advantage over the current macro-based multi-lock would be ergonomics (we can lock all without splitting) and, secondly, performance (guaranteed single critical section). This would also allow us to use the current way of dealing out the resources to the closure (so symmetric). It should not be too complex to implement either.
Thinking twice about the problem: tuples are treated as structs in Rust, so I'm not sure it will make any difference to soundness. However, the idea is not "bad": instead of giving an array of shared resources (for a task), the syntax could be changed to a tuple (with the order of elements the same for the declaration and the lockall API). This way the struct name would not need to be defined, so a bit less "magic" overall.
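A toy sketch of the tuple idea (all names invented; the ceiling handling is elided): the lock-all trait is implemented on a tuple of proxies, so no struct name needs to be generated, and the closure receives the resources in declaration order.

```rust
struct Proxy<T>(T);

// Hypothetical lock-all over a 2-tuple of proxies: one conceptual
// critical section handing out all resources at once.
trait LockAll {
    type A;
    type B;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::A, &mut Self::B) -> R) -> R;
}

impl<A, B> LockAll for (Proxy<A>, Proxy<B>) {
    type A = A;
    type B = B;
    fn lock<R>(&mut self, f: impl FnOnce(&mut A, &mut B) -> R) -> R {
        // a real implementation would raise the system ceiling to the
        // max ceiling of all resources here, and restore it afterwards
        f(&mut (self.0).0, &mut (self.1).0)
    }
}

fn demo() -> (i32, bool) {
    let mut shared = (Proxy(1), Proxy(false));
    shared.lock(|x, flag| {
        *x += 1;
        *flag = true;
    });
    ((shared.0).0, (shared.1).0)
}

fn main() {
    println!("{:?}", demo());
}
```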
Problem
The current lock API allows locking only a single resource. Thanks to the Mutex design, it is possible to lock multiple resources, but this requires some "boilerplate" code. RTIC ensures at compile time (and at zero cost) that each resource can only be locked once. (Allowing a resource to be locked numerous times would break the soundness invariant for mutable aliasing.) This is done by moving the resource proxy into the lock (the lock consumes the proxy). While elegant, this requires the resource proxies to be split before the lock.
Multi-lock conceptually leads to nested critical sections, something like s1.lock(|s1| s2.lock(|s2| s3.lock(|s3| ...))). In case the resources have different ceilings, the rendered code may (depending on lock order) perform numerous system ceiling updates.
Currently, multi-lock is limited to a fixed set of nesting levels. While this is typically not an issue in practice, it feels a bit unsatisfying from a design point of view.
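The nesting described above looks roughly as follows in a minimal, RTIC-free sketch (the `Mutex` trait and `Proxy` type are simplified stand-ins; each `lock` would be a separate critical section in real codegen):

```rust
// Simplified stand-in for RTIC's Mutex trait and resource proxies.
trait Mutex {
    type T;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::T) -> R) -> R;
}

struct Proxy(i32);

impl Mutex for Proxy {
    type T = i32;
    fn lock<R>(&mut self, f: impl FnOnce(&mut i32) -> R) -> R {
        // each level would update the system ceiling here if needed
        f(&mut self.0)
    }
}

fn demo() -> i32 {
    let (mut s1, mut s2, mut s3) = (Proxy(1), Proxy(2), Proxy(3));
    // three nested critical sections; with different ceilings each
    // level may update the system ceiling again
    s1.lock(|s1| s2.lock(|s2| s3.lock(|s3| *s1 + *s2 + *s3)))
}

fn main() {
    println!("{}", demo());
}
```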
Suggested alternative approach
The complete set of shared resources for the task can be locked at once, e.g.:
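A sketch of the intended usage, as a toy, RTIC-free mock of the proposed API surface (all names are assumed): `cx.shared.lock` locks the complete set of shared resources of the task in one go.

```rust
// Mock of the generated proxy for a task with `shared = [a, b]`.
struct SharedProxy {
    a: i32,
    b: bool,
}

// Mock of the generated "all resources locked" view.
struct SharedMut<'s> {
    a: &'s mut i32,
    b: &'s mut bool,
}

impl SharedProxy {
    fn lock<R>(&mut self, f: impl FnOnce(SharedMut) -> R) -> R {
        // a single critical section at the joint ceiling (elided here)
        f(SharedMut { a: &mut self.a, b: &mut self.b })
    }
}

struct Context {
    shared: SharedProxy,
}

// body of the task: lock everything at once, then lock again later
fn task(mut cx: Context) -> (i32, bool) {
    cx.shared.lock(|s| {
        *s.a += 1;
        *s.b = true;
    });
    cx.shared.lock(|s| (*s.a, *s.b))
}

fn main() {
    let cx = Context { shared: SharedProxy { a: 0, b: false } };
    println!("{:?}", task(cx));
}
```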
and it should also allow an explicitly typed closure argument:
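A mock of this second form (names assumed; Shared is the name suggested under Design decisions below), where the closure argument is annotated with the generated type:

```rust
// Generated view over the task's shared resources (name assumed).
struct Shared<'s> {
    a: &'s mut i32,
}

// Mock of the generated proxy for a task with `shared = [a]`.
struct SharedProxy {
    a: i32,
}

impl SharedProxy {
    fn lock<R>(&mut self, f: impl FnOnce(Shared) -> R) -> R {
        // single critical section (ceiling handling elided)
        f(Shared { a: &mut self.a })
    }
}

fn demo() -> i32 {
    let mut proxy = SharedProxy { a: 41 };
    // the closure argument carries an explicit type annotation
    proxy.lock(|s: Shared| {
        *s.a += 1;
        *s.a
    })
}

fn main() {
    println!("{}", demo());
}
```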
Soundness
The lock will consume the entire proxy structure, so the "locked once" invariant is preserved, and thus soundness is maintained.
Performance
This approach leads to a single critical section (and thus reduces the number of system ceiling updates compared to the current multi-lock).
Implementation
While seemingly straightforward, the devil is always in the details. Under the hood a resource proxy for the complete set of shared resources needs to be generated for each context (task). Additional ceiling analysis is required. In release mode the compiler should be able to optimize out overhead leading to a zero-cost abstraction (but it needs to be implemented and verified).
Design decisions include: where the additional types should be created, and what name should be adopted for the passed structure (in the second example above, Shared was suggested).
Pros and cons
Foreseen advantages: better ergonomics; guaranteed performance (as good as or better than the current multi-lock); the artificial limitation on the number of lockable resources is lifted (though in practice this is likely not a problem).
Foreseen disadvantages: added complexity to RTIC code generation; the new API may lead users to unnecessary locking (as a lock-all is so easy and "inviting"); lock-all will lock immediately at the highest ceiling, thus restricting concurrency in some cases.
Conclusion
An easy-to-use and easy-to-understand API extension. As long as the programmer understands the implications, they can make deliberate design decisions, taking the impact on concurrency into consideration.
Status
POC in branch lockall.