raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License
3.26k stars 838 forks source link

Add C11 standard atomic support #1645

Open sgstreet opened 4 months ago

sgstreet commented 4 months ago

This PR Fixes #1642 Runtime for C11 atomics missing.

The file pico_atomic.c implements the required functions for the read/modify/write function of atomic variables with sizes of 1, 2, 4 and 8 bytes. Atomic load and store for 8 bytes quantities is also supported. Support for atomic_flags is also present.

To eliminate interference with existing hardware spinlock usage while reducing multicore contention on unique atomic variables, we use one of the watchdog scratch registers (WATCHDOG_SCRATCH3) to implement 16, 2 bit, multicore locks, via a variation of Dekker's Algorithm. The lock is selected as a function of the variable address and the stripe width which hashes variables addresses to locks numbers and leverages the RP2040 Atomic Register Access functionality of the WATCHDOG scratch registers.

Clearly the current approach (See __atomic_lock) is slower than using the RP2040 Hardware Spinlocks directly, but I was not sure how to assign spin locks from the reserved range (IDs 0-13) and was concerned about multicore lock contention when the number of hash buckets (hardware spinlocks) is minimized. Any recommendations concerning the number of spins lock to allocated to this library along with the stripping width (number of addresses hashed to the same lock) would greatly appreciated? My thoughts are four to eight locks with a stripe width of four bytes. Using the hardware spinlocks is an easy update.

sgstreet commented 4 months ago

As motivation to add this PR, below is a sample ticket based spin lock implementation similar to the Linux kernel. This type of implementation removes the limitation of 32 hardware spin locks at the minimal cost of potential hardware spin lock contention.

typedef atomic_ulong spinlock_t;

void spin_lock(spinlock_t *spinlock)
{
    uint16_t ticket = atomic_fetch_add(spinlock, 1UL << 16) >> 16;
    while ((*spinlock & 0xffff) != ticket);
}

unsigned int spin_lock_irqsave(spinlock_t *spinlock)
{
    uint32_t state = save_and_disable_interrupts();
    spin_lock(spinlock);
    return state;
}

bool spin_try_lock(spinlock_t *spinlock)
{
    uint32_t value = *spinlock;
    if ((value >> 16) != (value & 0xffff))
        return false;

    return atomic_compare_exchange_strong(spinlock, &value, value + (1UL << 16));
}

bool spin_try_lock_irqsave(spinlock_t *spinlock, unsigned int *state)
{
    uint32_t irq_state = save_and_disable_interrupts();
    bool locked = spin_try_lock(spinlock);
    if (!locked) {
        restore_interrupts(irq_state);
        return false;
    }

    *state = irq_state;
    return true;
}

void spin_unlock(spinlock_t *spinlock)
{
    atomic_fetch_add((uint16_t *)spinlock, 1);
}

void spin_unlock_irqrestore(spinlock_t *spinlock, unsigned int state)
{
    spin_unlock(spinlock);
    restore_interrupts(state);
}
kilograham commented 4 days ago

@sgstreet ; I'm guessing you may have closed this because we hadn't gotten to it yet. It is a useful PR, if you are still OK with us merging it; we generally add our own copyright to yours

sgstreet commented 4 days ago

Sure, let me update the PR there was a bug that needs to be fixed.