Support dual-core operation

leo60228 commented 3 years ago

This is important to get the full power of the RP2040. However, there's a major issue, which is that symmetric multiprocessing is explicitly unsound in embedded Rust.

jannic commented 2 years ago

Do we really have the unsoundness issue mentioned above, on RP2040? If I understand it correctly, it is caused by memory locations mapped differently on different cores. On RP2040, RAM is mapped identically, so we don't have an issue with references to regular memory.

And while some registers locations are mapped to core-specific peripherals (SIO, cortex-m internal registers), those are not directly accessible from safe rust.

So wouldn't the unsoundness issue be solved if we made sure that those core-specific peripherals were only be accessible from !Send + !Sync types?

leo60228 commented 2 years ago

Not on bare-metal, but cortex_m assumes that disabling interrupts means no code is executing concurrently, which isn't true with multiple cores.

jannic commented 2 years ago

That's unfortunate, as we could quite easily define a proper multi-core critical section on RP2040. Most of the work is already done by atomic-polyfill:

use atomic_polyfill::*;

struct RpCriticalSection;
critical_section::custom_impl!(RpCriticalSection);
unsafe impl critical_section::Impl for RpCriticalSection { 
    unsafe fn acquire() -> u8 { 
        let primask = cortex_m::register::primask::read();
        cortex_m::interrupt::disable();
        while (*pac::SIO::ptr()).spinlock0.read().bits() == 0 {};
        primask.is_active() as _ 
    } 

    unsafe fn release(token: u8) { 
        (*pac::SIO::ptr()).spinlock0.write(|w| w.bits(0));
        if token != 0 { 
            cortex_m::interrupt::enable()
        } 
    } 
}

This is just a proof of concept, blindly using spinlock0. A proper implementation should make sure the spinlock is not used by other code at the same time.

jannic commented 2 years ago

~~Not sure if a critical section must be reentrant, though.~~ Looking at the critical-section implementation for non-embedded targets, it's obvious that those sections are not meant to be re-entrant: https://github.com/embassy-rs/critical-section/blob/main/src/lib.rs#L121

9names commented 2 years ago

A proper implementation should make sure the spinlock is not used by other code at the same time.

How do you proposed doing that without using a spinlock or having atomic instructions?

Not sure if a critical section must be reentrant, though.

The point is that they are not rentrancy safe, but they will be called reentrantly, and in our case, concurrently. So you need to ensure that that only one has access to the section, and it cannot be fallible. That is the point to the critical_section token+closure.

jannic commented 2 years ago

A proper implementation should make sure the spinlock is not used by other code at the same time.

How do you proposed doing that without using a spinlock or having atomic instructions?

Before starting the second core, disabling interrupts is sufficient to avoid race conditions. So the code starting the second core could take ownership of the HW spinlock the same way as the HAL takes ownership of resources now. Once the second core is started, the spinlock can be used to to implement a multicore-capable critical section.

Not sure if a critical section must be reentrant, though.

The point is that they are not rentrancy safe, but they will be called reentrantly, and in our case, concurrently. So you need to ensure that that only one has access to the section, and it cannot be fallible. That is the point to the critical_section token+closure.

The example code above does ensure that only one execution context has access to the critical section, locking out both interrupts and the other core. If re-entrancy is necessary, it can be added by storing the number of the core which successfully acquired the critical section in some static variable. It's not implemented in the simple example, but it's not a fundamental limitation. Also, the critical section is not fallible. Worst case it wastes some cycles spinning while the other core holds the lock. Or am I missing something? (Honest question, I'm not an expert and it's quite likely that my reasoning contains mistakes.)

My current guess is that it wouldn't be difficult to implement safe SMP on the RP2040. However it's not compatible to the way the cortex-m crate decided to use Send+Sync, as the core-specific peripherals must not be Send.

9names commented 2 years ago

Sorry, I thought earlier in the thread you were referring to my PR that implemented critical section as you described. https://github.com/rp-rs/rp-hal/pull/151

So the code starting the second core could take ownership of the HW spinlock the same way as the HAL takes ownership of resources now

The way the HAL takes ownership of resources is through compile-time-only checks - none of that exists at runtime. That restricts you to a single compilation, as a second program has no knowledge of these checks. I'm sure there are multiple ways of doing this, but the simplest solution is to have a convention that some number of spinlocks are reserved for HAL locking. Then your critical_section can be static, no-one needs to pass anything around, and it will work from within interrupt context or on either ore. Once you have critical_section, you can implement another API for getting/returning a spinlocks safely.

9names commented 2 years ago

Just to be clear about what a critical section is: it's a "block of code that executes atomically (from start to finish without interruption)" By the above definition, that block of code is not allowed to be entered by multiple execution contexts - that is the point of critical_section!

jannic commented 2 years ago

Just to be clear about what a critical section is: it's a "block of code that executes atomically (from start to finish without interruption)"

Yes, obviously. Did I suggest something else?

Still, that block of code executing uninterruptedly could call some function which again wants to start a critical section (because it's not aware that it's called from inside one). The existing implementation allows that, as it only disabled interrupts, but doesn't take a lock. The suggested new ones, both your implementation and my example code would deadlock. In that sense, they are no longer reentrant. (I'm not sure if that is common terminology, but I googled it and at least I'm not the first one using it that way. https://stackoverflow.com/a/1312282 )

jannic commented 2 years ago

Sorry, I thought earlier in the thread you were referring to my PR that implemented critical section as you described. #151

Sorry as well - I was not aware of that PR when starting to comment on this issue.

To my knowledge, there are two main points to be solved to get dual-core operation:

Initialize core1 to run some code (https://github.com/rp-rs/rp-hal/pull/89)
Dual-core capable critical sections (https://github.com/rp-rs/rp-hal/pull/151)

Also, with multi-core it would be nice to have full atomics support. As you already based https://github.com/rp-rs/rp-hal/pull/151 on critical-section, it should be easy to add atomic-polyfill later.

Anything else?

9names commented 2 years ago

Thanks for explaining, I understand where you are coming from now. Yes, our existing versions will not handle recursion - I was not even considering indirect recursion while implementing mine. I'm still not sure the best way to handle it.

On top of what you've state above:

FIFOs for talking between cores (handled in #89 but it could be a separate PR)
A way to safely hand out spinlocks, ensuring they aren't already taken (need a way to copy to share them though)
A fallible mutex abstraction using spinlocks, so we can safely share pinned and static data between cores. The standard mutex will block if you didn't get the mutex, which means it has to be protected by a critical section to avoid deadlock. A "classic" lock that returns Some() if you got it or None() if you didn't would allow interrupts to keep firing.

Things we don't need, but I would like:

A completely separate program running on core1, sharing no source with the main program. this probably implies having a different boot2 that does not halt core1, instead doing static init and jump to main.
A https://github.com/rtic-rs/microamp project, building 2 executables out of the same project.

rp-rs / rp-hal

Support dual-core operation #57