DMA based API proposal - Githubissues

rust-embedded / embedded-hal

A Hardware Abstraction Layer (HAL) for embedded systems

Apache License 2.0

1.96k stars 198 forks source link

DMA based API proposal #14

Closed japaric closed 6 years ago

japaric commented 7 years ago

In this issue I'm going to present the DMA API I have implemented for the blue-pill and that I have used in my recent robotic application with the goal of using it to start a discussion about DMA based API.

Pre-requisites

You should understand how RefCell works and how it achieves dynamic borrowing.

Detailed design

Core idea

The DMA peripheral behaves like an "external mutator". I like to think of it as an independent processor / core / thread that can mutate / access data in parallel to the main processor. From that POV a DMA transfer looks like a fork-join operation like the ones you can do in std-land with threads.

With that mindset: in a DMA transfer you want to hand out "ownership" of the data / buffer from the processor to the DMA for the span of the transfer and then claim it back when the transfer finishes. Except that "ownership" in the full sense ("the owner is in charge of calling the destructor when the value is no longer needed") is not applicable here because the DMA can't destroy the buffer is using for the transfer. So instead of ownership what the proposed API will transfer is temporary read or write access to the data.

Read or write access sounds like &- and &mut- references but the proposed API won't use those. Instead it will use buffers with RefCell-style dynamic borrowing.

`Buffer`

The main component of the proposed API is the Buffer abstraction:

struct Buffer<B, CHANNEL> {
    _marker: PhantomData<CHANNEL>,
    data: B,
    flag: Cell<BorrowFlag>, // exactly like `RefCell`'s
    status: Cell<Status>, // "Lock status"
}

enum Status {
    Unlocked,
    Locked,
    MutLocked,
}

The data and flag fields effectively form a RefCell<B>. Although not explicitly bounded B can only be an array of bytes ([u8; N]). The CHANNEL type parameter indicates which DMA channel this buffer can work with; possible values are phantom types like Dma1Channel1, Dma1Channel2, etc. Finally the status field indicates if the DMA is currently in possession of the buffer.

impl<B, CHANNEL> Buffer<B, CHANNEL> {
    pub fn borrow(&self) -> Ref<'a, B> { .. }
    pub fn borrow_mut(&self) -> RefMut<'a, B> { .. }

    fn lock(&self) -> &B { .. }
    fn lock_mut(&self) -> &mut B { .. }
    unsafe unlock(&self) { .. }
    unsafe unlock_mut(&self) { .. }
}

Buffer exposes public borrow and borrow_mut methods that behave exactly like the ones RefCell has. Buffer also exposes private locks and unlocks methods that are meant to be used only to implement DMA based APIs.

The lock and unlock pair behaves like a split borrow operation. A borrow call will check that no mutable references exist and then hand out a shared reference to data wrapped in a Ref type while increasing the Buffer shared reference counter by one. When that Ref value goes out of scope (it's destroyed) the Buffer shared reference counter goes down by one. A lock call does the first part: it checks for mutable references, increases the counter and hands out a shared reference to the data. However the return type is a plain shared reference &T so when that value goes out of scope the shared reference counter won't be decremented. To decrement that counter unlock must be called. Likewise the lock_mut and unlock_mut pair behaves like a split borrow_mut operation.

You can see why it's called lock and unlock: once locked the Buffer can't no longer hand out mutable references (borrow_mut) to its inner data until it gets unlocked. Similarly once a Buffer has been lock_muted it can't hand out any reference to its inner data until it gets unlock_muted.

DMA based API

Now let's see how to build a DMA based API using the Buffer abstraction. As an example we'll build a Serial.write_all method that asynchronously serializes a buffer using a DMA transfer.

impl Serial {
    pub fn write_all<'a>(
        &self,
        buffer: Ref<'a, Buffer<B, Dma1Channel4>>,
    ) -> Result<(), Error>
    where
        B: AsRef<[u8]>,
    {
        let usart1 = self.0;
        let dma1 = self.1;

        // There's a transfer in progress
        if dma1.ccr4.read().en().bit_is_set() {
            return Err(Error::InUse)
        }

        let buffer: &[u8] = buffer.lock().as_ref();

        // number of bytes to write
        dma1.cndtr4.write(|w| w.ndt().bits(buffer.len() as u16));
        // from address
        dma1.cmar4.write(|w| w.bits(buffer.as_ptr() as u32));
        // to address
        dma1.cpar4.write(|w| w.bits(&usart1.dr as *const _ as u32));

        // start transfer
        dma1.ccr4.modify(|_, w| w.en().set());

        Ok(())
    }
}

Aside from the Ref in the function signature, which is not a cell::Ref (it's a static_ref::Ref), this should look fairly straightforward. For now read Ref<'a, T> as &'a T; they are semantically equivalent. I'll cover what the Ref newtype is for in the next section.

Let's see how to use this method:

static BUFFER: Mutex<Buffer<[u8; 14], Dma1Channel4>> =
    Mutex::new(Buffer::new([0; 14]));

let buffer = BUFFER.lock();

// mutable access
buffer.borrow_mut().copy_from_slice("Hello, world!\n");

serial.write_all(buffer).unwrap();

// immutable access
let n = buffer.borrow().len(); // OK

// mutable access
// let would_panic = &mut buffer.borrow_mut()[0];

At this point the transfer is ongoing and we can't mutably access the buffer while the transfer is in progress. How do we get the buffer back from the DMA?

impl<B> Buffer<B, Dma1Channel4> {
    /// Waits until the DMA transfer finishes and releases the buffer
    pub fn release(&self) -> nb::Result<(), Error> {
        let status = self.status.get();

        // buffer already unlocked: no-op
        if status == Status::Unlocked {
            return Ok(());
        }

        if dma1.isr.read().teif4().is_set() {
            return Err(nb::Error::Other(Error::Transfer))
        } else if dma1.isr.read().tcif4().is_set() {
            if status == Status::Locked {
                unsafe { self.unlock() }
            } else if status == Status::MutLocked {
                unsafe { self.unlock_mut() }
            }

            // clear flag
            dma1.ifcr.write(|w| w.ctcif5().set());
        } else {
            // transfer not over
            Err(nb::Error::WouldBlock)
        }
    }
}

The Buffer.release is a potentially blocking operation that checks if the DMA transfer is over and unlocks the Buffer if it is. Note that the above implementation is specifically for CHANNEL == Dma1Channel4. Other similar implementations can cover the other DMA channels.

Continuing the example:

serial.write_all(buffer).unwrap();

// immutable access
let n = buffer.borrow().len(); // OK

// .. do stuff ..

// wait for the transfer to finish
block!(buffer.release()).unwrap();

// can mutably access the buffer again
buffer.borrow_mut()[12] = b'?';

serial.write_all(buffer).unwrap();

Alternatively, using callbacks / tasks:

fn tx() {
    // ..

    SERIAL.write_all(BUFFER.lock()).unwrap();

    // ..
}

// DMA1_CHANNEL4 callback
fn transfer_done() {
    BUFFER.lock().release().unwrap();
}

`static_ref::Ref`

The Ref<'a, T> abstraction is a newtype over &'a T that encodes the invariant that the T to which the Ref is pointing to is actually stored in a static variable and thus can't never be deallocated.

The reason Ref is used instead of just &- in the write_all method is to prevent using stack allocated Buffers with that method. Stack allocated Buffers are rather dangerous as there's no mechanism to prevent them from being deallocated. See below what can happen if a Serial.read_exact method used &- instead of Ref:

fn main() {
    foo();
    bar();
}

#[inline(never)]
fn foo() {
    let buffer = Buffer::new([0; 256]);

    SERIAL.read_exact(&buffer);

    // returns, DMA transfer may not have finished, `buffer` is
    // destroyed / deallocated
}

#[inline(never)]
fn bar() {
    // DMA transfer ongoing; these *immutable* values allocated on the stack
    // will get written to by the DMA
    let x = 0u32;
    let y = 0u32;

    // ..
}

Unresolved questions

Is there an equally flexible (note that with this approach the DMA transfer start and finish operations can live in different tasks / interrupts / contexts) alternative that involves no runtime checks?
Should we extend this to also work with Buffers allocated on the stack? My first idea to allow that was to add a "drop bomb" to Buffer. As in the destructor will panic if there's an outstanding lock. See below. (But this probably a bad idea since panicking destructor is equal to abort (?))

{
    let buffer = Buffer::new(..);
    buffer.lock();
    // panic!s
}

{
    let buffer = Buffer::new(..);
    buffer.lock();
    unsafe { buffer.lock() }
    // OK
}

{
    let buffer = Buffer::new(..);
    let x = buffer.borrow();
    // OK
}

Do note that an unsafe Ref::new exists so you can create a Buffer on the stack and wrap a reference to it in a Ref value. This will be like asserting that the buffer will never get deallocated. Of course it's up to you to ensure that's actually the case.

In the blue-pill implementation Buffer is defined in the blue-pill crate itself but to turn DMA based APIs into traits we would have to move that Buffer abstraction into the embedded-hal crate. That complicates things because (a) lock et al. would have to become public and (b) e.g. Dma1Channel1 wants to be defined in the stm32f103xx-hal-impl crate but that means one can't write impl Buffer<B, Dma1Channel1> due to coherence. I have no idea how to sort out these implementation details.
This API doesn't support using a single Buffer with more than one DMA channel (neither serially or concurrently). Is that something we want to support?
Since we have the chance: should we reduce the size of the flag field? The core implementation uses flag: usize but 4294967295 simultaneous cell::Ref sounds like a very unlikely scenario in a microcontroller.

japaric commented 7 years ago

cc @whitequark @adamgreig @vitiral @Kixunil

japaric commented 7 years ago

Read or write access sounds like &- and &mut- references but the proposed API won't use those.

To elaborate why it doesn't: I think plain references are not enough for memory safety. Let me reimplement Serial.write_all using references and #[async] (you can translate this to futures by writing more boilerplate)

impl Serial {
    // this is a mixture of the original Serial.write_all and Buffer.release
    #[async]
    fn write_all(&self, buffer: &[u8]) -> Result<(), Error> {
        let usart1 = self.0;
        let dma1 = self.1;

        // There's a transfer in progress
        if dma1.ccr4.read().en().bit_is_set() {
            return Err(Error::InUse)
        }

        let buffer: &[u8] = buffer.lock().as_ref();

        // number of bytes to write
        dma1.cndtr4.write(|w| w.ndt().bits(buffer.len() as u16));
        // from address
        dma1.cmar4.write(|w| w.bits(buffer.as_ptr() as u32));
        // to address
        dma1.cpar4.write(|w| w.bits(&usart1.dr as *const _ as u32));

        // start transfer
        dma1.ccr4.modify(|_, w| w.en().set());

        loop {
            if dma1.isr.read().teif4().is_set() {
                return Err(Error::Transfer)
            } else if dma1.isr.read().tcif4().is_set() {
                if status == Status::Locked {
                    unsafe { self.unlock() }
                } else if status == Status::MutLocked {
                    unsafe { self.unlock_mut() }
                }

                // clear flag
                dma1.ifcr.write(|w| w.ctcif5().set());

                return Ok(())
            } else {
                // transfer not over
                yield
            }
        }
    }
}

Here's an example of using the above API to cause memory unsafety without using the unsafe keyword.

fn main() {
    foo();
    bar();
}

fn foo() {
    let mut buffer = [0; 256];

    let gen =  SERIAL.read_exact(&mut buffer);

    // the right thing to do is: drive the transfer to completion
    // loop { gen.resume() }

    // but nothing forces me to do that so I can return here
    // deallocating `buffer` while the DMA transfer is ongoing
}

fn bar() {
    // the DMA transfer continues and corrupts this stack frame
    let x = 0u32;
    let y = 0u32;
}

vitiral commented 7 years ago

I might be missing something, but what is the point of the lock* methods? They seem like a really unsafe API (if you forget to unlock you have deadlock!).

Conversely, the Buffer -> Ref api looks pretty much exactly like a Mutex -> MutexGuard to me except that you can have multiple Refs. Can you explain the differences? I'm wondering if having multiple Refs is really valuable for us.

adamgreig commented 7 years ago

Overall this seems really nice though I'd strongly prefer just &/&mut if we can find a way to make it work. It would be easier to use and very flexible. Opportunity for accidental memory unsafety seems very high.

Although not explicitly bounded B can only be an array of bytes ([u8; N]).

Why? A lot of the time (SPI, ADC, DAC, etc) the DMA transfers u16 or u32, and ndtr refers to the number of items to transfer rather than the number of bytes.

Should we extend this to also work with Buffers allocated on the stack?

I think this is less common than static buffers but not entirely uncommon; you might well want to allocate something on the stack because you're only going to need it inside this scope, but then still want to use it with DMA before you're done with this function. I think this is much more true of multithreading where you have a stack for each thread/coroutine and so the buffer would live in that thread's stack. On a separate note, what about heap allocated Buffers for systems using dynamic allocation?

I think plain references are not enough for memory safety.

I'm aware "you can't rely on Drop happening" but I wonder if we could have the DMA stop the stream on drop, and then you could perhaps allow &/&mut directly?

This API doesn't support using a single Buffer with more than one DMA channel (neither serially or concurrently).

I've never needed to, but consider most DMA peripherals treat TX and RX as separate streams: you might well want to DMA from a UART, modify the data somehow, then DMA it back out. Or DMA from an ADC, modify the buffer in-place, then replay it out the DAC. That would all require using the Buffer in different channels serially. Streaming the same data out of two independent DACs would require concurrent access (though on the STM32 specifically this isn't an issue as you can use the DACs together by writing a single register). It would be a shame for this use case to be totally impossible...

In the blue-pill implementation Buffer is defined in the blue-pill crate itself but to turn DMA based APIs into traits we would have to move that Buffer abstraction into the embedded-hal crate.

Can we have a DMAChannel trait in embedded-hal and then devices implement this against their specific DMA peripherals, for each channel?

japaric commented 7 years ago

@vitiral

I might be missing something, but what is the point of the lock* methods?

They are implementation details (non user facing APIs) used to implement the safe user facing API (Serial.write_all / Buffer.release)

if you forget to unlock you have deadlock!

You can't deadlock at least not in the Mutex sense. If you try to access a Buffer that has not been released then you get a panic!. If you forget to do Buffer.release when the transfer finished you would also run in this situtation; it's not inherent to the lock / unlock API.

Conversely, the Buffer -> Ref api looks pretty much exactly like a Mutex -> MutexGuard to me except that you can have multiple Refs.

Actually it's modelled exactly the same as a RefCell's Ref. But if you want to use the Mutex analogy then it would be equivalent to a RwLock.

Can you explain the differences? I'm wondering if having multiple Refs is really valuable for us.

You can read the buffer while it's also being read by the DMA in parallel. Not that I can give you an example about why that's useful off the top of my head. But it's a memory safe possibility and I don't see a reason to not allow it; someone may find it useful.

If the overhead of the reference count worries you we can add a Buffer variant that only hands out a single RefMut (&mut-) and uses a bool, instead of an usize, to keep track of the count.

@adamgreig

Although not explicitly bounded B can only be an array of bytes ([u8; N]).

Why?

I actually meant to type "array of bytes (e.g. [u8; N])" meaning that B should have type [T; N] but you can't bound the type like that in today's Rust. At the end it's up to the DMA based API to pick what B will be: [u8; N], [u16; N], etc.

I think this is much more true of multithreading where you have a stack for each thread/coroutine and so the buffer would live in that thread's stack.

You mean never-ending threads? I have mentioned before that stuff allocated in never-ending functions (fn() -> !) should have a different lifetime to indicate that they can't be deallocated. That would be useful here but that feature doesn't exist.

On a separate note, what about heap allocated Buffers for systems using dynamic allocation?

At first glance it seems that it would have the same problem at stack allocated arrays as in "there's no way to enforce that the boxed slice won't be deallocated midway the DMA transfer". I suppose the API could be tweaked to hand over the ownership of the buffer to the DMA (?) during the transfer and then have some DMA.release method to get the buffer (Box<[T]>) back. That raises the question where the Box<[T]> is stored during the transfer ...

I wonder if we could have the DMA stop the stream on drop, and then you could perhaps allow &/&mut directly?

You can't rely on drop happening :-). You can safely mem::forget the droppable thing that would have stopped the DMA and have Bad Stuff happen without using the unsafe keyword.

A proper solution to this "you can't rely on drop" stuff is adding linear types (I think that's what they are called) to Rust. A linear type means "this value MUST be dropped / consumed in this scope (unless you return it from the scope)". Rust has affine types: "this value should be dropped / consumed in this scope but it's fine if it's not (mem::forget)".

I've never needed to, but consider ...

OK but if we add this kind of feature we would have to lose some static typing. We would probably have to store in the buffer (i.e. at runtime) which channel is it being used with. Which means an enum of DMA channels rather encoding the channel as a phantom type. Seems doable but needs prototyping.

Can we have a DMAChannel trait in embedded-hal and then devices implement this against their specific DMA peripherals, for each channel?

Possibly. It will probably involve some ad-hoc traits that are implementation details and that are not supposed to be used by the end user.

whitequark commented 7 years ago

Although not explicitly bounded B can only be an array of bytes ([u8; N]).

You can use an AsMut<[u8]> bound here I think.

But this probably a bad idea since panicking destructor is equal to abort (?)

It is possible to have unwinding on larger microcontrollers (I know someone who did it on the less modestly sized STM32s) but it seems like an idea bad enough that I'm not sure if we ever want to support it...

A linear type means "this value MUST be dropped / consumed in this scope (unless you return it from the scope)". Rust has affine types: "this value should be dropped / consumed in this scope but it's fine if it's not (mem::forget)".

This doesn't work. An Rc loop is constructible safely and is functionally equivalent to mem::forget. This has been discussed to death back when mem::forget was stabilized; the signature of mem::forget is exactly identical to any other function that consumes a value and the function itself is equivalent to "put the object somewhere inaccessible to the rest of the code".

japaric commented 7 years ago

You can use an AsMut<[u8]> bound here I think.

Yeah that would work but it's limited to [T; 32]. This really wants to be struct Buffer<const N: usize, T, CHANNEL> { data: [T; N], .. } though. Hopefully that's not too far off in the future.

But this probably a bad idea since panicking destructor is equal to abort (?)

I was actually thinking of double / nested panics when unwinding would call a destructor that panics (IDK if, today, that situation causes an abort in std-land). But ... since ARM micros default to panic = abort that situation can't occur because the first panic will cause an abort.

This doesn't work.

Right. So, @adamgreig, we can't rely on destructors to preserve memory safety due to mem::forget.

Kixunil commented 7 years ago

How bad would it be to use closures to ensure "destructor" is run? (Like in case of scoped threads.)

whitequark commented 7 years ago

It's pretty bad, but I guess you can always fall back to an unsafe or poisoning API.

japaric commented 7 years ago

@Kixunil that would be too limiting. With a closures based API you can't split the start / finish operations in two tasks / interrupts, or create a futures based API to select two or more pending DMA transfers. It could be an alternative API if there's a need for it.

vitiral commented 7 years ago

Ok, I've gotten more of a chance to look this over. First of all, I believe that standardized shared buffers are probably one of the most critical requirements for building a microcontroller ecosystem -- so thankyou for putting in the effort of crafting an API around it

Simplicity of Data Sharing

I have written the library defrag-rs with the goal of allowing sharing and defragmentation of memory. I strongly advise the API presented there where memory pools give out Mutexes to their data which track when the data is locked, when it has a reference to it (may be used in the future) and when it is no longer in use. We could then have ALL microcontroller memory managers use this API.

The basic idea is that you might have an ecosystem of memory "pool" libraries that keeps track of a whole bunch of different data buffers. You would call Pool.get() to get a typed Mutex (or RwLock) which you could then pass on to any function (or you can get a reference and pass it on).

A few key points:

I don't think the type should be a Buffer. We can use mem::size_of::<T>() to make sure that type T fits inside of whatever underlying "buffer" exists. Buffers aren't special -- we may want to pass around arrays of other types than u8 or large structs (note that I'm guilty of this too with defrag::Slice). For instance, I think functions could just accept Mutex<HeaplessVec<T, A>> where A is a generic array and HeaplessVec is similar to (or is) heapless::Vec
I don't think the proposed static_ref::Ref gives us anything. Buffer (or Mutex) must already be guaranteed to not get freed when the function returns ~~(although, you would have to return the Buffer in order for the data to stay reserved... which you have to do anyway to return the result!)~~ (Edit: no you wouldn't... the function could store the Buffer in a static struct to execute asyncronously). We should think of all of these objects more like "heap allocated" data than anything else.

Comments on `RwLock`

I think RwLock is dangerous because you might have too many functions holding read-only references and it would be hard to ever get access to writing. In most cases, you should be able to lock the data and then pass out references if you want multiple readers. In other cases, lock should be sufficient.

Also, defrag manages to implement the lock using a single bit -- other memory management libraries can probably do the same. Doing this with RwLock would be completely impossible -- at minimum you would want to use at least one byte (for count of readers) and really you should use a usize. This kind of bit stuffing can make these libraries truly universal.

whitequark commented 7 years ago

The basic idea is that you might have an ecosystem of memory "pool" libraries that keeps track of a whole bunch of different data buffers.

I'm not particularly happy about this design. It seems to lock me into dynamic memory management of sorts.

vitiral commented 7 years ago

@whitequark the application specific "pools" could be statically allocated buffers where you know all the (typed) indexes at compile time (and they would be checked at compile time) -- or they could be semi-dynamic where there is a set number of known-size buffers, or they could be from a library like defrag where the memory allocation is truly dynamic.

We would have to make sure that ALL of these are supported -- but I think they can be 😄

japaric commented 7 years ago

I agree with @whitequark in that I wouldn't want the API here to be locked into some dynamic memory management strategy. I do would like this API to also work with dynamically allocated memory, but right now I can't see how to implement such thing safely: it all boils down to solving the "you must not destroy this buffer while the DMA is using it" problem which I have solved here with static_ref::Ref which effectively means "the buffer can't never be destroyed". The only other solution that I can think of are panicking destructors which are rather bad.

Code-wise Buffer can be extended to support fixed size arrays, boxed arrays and slices by turning it into a DST: struct Buffer<.., B> { .., data: B } where B can be [T; N] or [T] allowing Box<Buffer<.., [T]>>, but the destructor problem I mentioned above remains.

Comments on RwLock

Like I said above we can have two variants: the current "RefCell" style one and a "RefMutCell" one that tracks the current borrow using a bool and can only hand out a single &mut- reference at a time. I don't see a reason to not provide the flexibility of allowing both.

vitiral commented 7 years ago

@japaric if Mutex was a trait then it would be tied to any kind of allocation scheme (any allocation library could implement it). And mutex.lock().unwrap() would be the equivalent of Refcell.lock()

vitiral commented 7 years ago

I really think the simplest way to think about this is we want types that work like data on the heap, but are actually statically allocated.

vitiral commented 7 years ago

Also, I agree that data has to be backed statically. Maybe the type should be renamed StaticMutex to make that clear

adamgreig commented 7 years ago

On a slightly different note, a common DMA use case that I don't think is covered here are circular buffers, where a single large buffer is read/written continuously in a loop. Typically you have a "half complete" interrupt as well as a "complete" interrupt, allowing you to perform processing in one half of the buffer while the DMA deals with the other half. I use this to stream ADC data into memory and then deal with half of it while the ADC continues reading into the other half; since the ADC runs at a particular sampling frequency I can't afford to stop/start the DMA or otherwise interrupt it. As far as I can tell the whole purpose of the Buffer is to ensure you can't do this (write into one half while DMA is reading the other half, or vv).

A very similar use case is double buffered DMA, where the DMA is given two buffers and automatically swaps between them as each one is filled. This might be the easier scenario to handle: the DMA-using function would be given two separate buffers and uses the half-complete/complete interrupts to lock/unlock them in turn. Once you had that working you probably wouldn't need to support circular single buffers, as the user could just take their original buffer and split it into two consecutive buffers. There might be some devices that only support circular buffers and not double buffers: those could have the device-specific implementation require the two buffers passed in are consecutive in memory, then hand them to the DMA as a single circular buffer and manage locking/unlocking as usual.

Ericson2314 commented 7 years ago

I need to catch up on thread, but I always thought &move and &out would be good for this. Having the peripheral own the initialized or uninitialized arrays seems correct.

japaric commented 7 years ago

@vitiral I don't see how e.g. a memory pool being statically allocated helps here; as long as you have a dynamic memory allocation scheme you can deallocate the buffer the DMA is using and that would be Bad. The problem to solve here is how you prevent that from happening without relying on destructors running for memory safety.

@adamgreig Already on my TODO list. I was wondering if it's possible to support circular / doube buffering withouting creating a new Buffer type for that kind of operation but it's probably easier if I create a CircBuffer type with two back to back buffers and write some application first and then see if the types can be merged somehow. I was also wondering if there's an use case for circular buffers when there's no half-transfer interrupt / flag available. I don't know if there are chips like that; the STM chips support circular and provide a half-transfer flag but IDK if that's universal.

@Ericson2314 I don't immediately see how &move / &out helps here; I thought that was meant to avoid initializing e.g. arrays and statically forbid the initialization routine from reading the uninitialized memory.

Having the peripheral own the initialized or uninitialized arrays seems correct.

That's one option but @adamgreig mentioned above the use case of wanting to use a single statically allocated buffer with several peripherals. With ownership that seems to me like it would get messy really quickly specially if your peripherals are in static variables (you end with array types in their signature); transferring buffer ownership across peripherals would possibly involve Option types under the hood and (panicky) Option.take and then unwrap operations.

vitiral commented 7 years ago

I've opened #15 with my counter proposal.

@japaric I think the key you might be missing is that you have to pass the Mutex around by ownership (M not &M). If you always are pasing ownership then it will never get dropped and then deallocated (and you CAN rely on drop mechanics). Methods then return the full Mutex when they are done.

When you want to share a Mutex with another thread, you have to use Arc like in the example code for std::sync::Mutex.

I'm pretty sure rust's lifetimes will prevent you from doing anything stupid here (unless you use unsafe). Do you have a use case where this is not true? (I would be shocked, as this is the exact same API as rust's stdlib).

vitiral commented 7 years ago

Also, it is really important to poing out that RwLock is just a subset of Mutex. If your function is going to write to the data then it might as well take a Mutex.

If libraries defined a type that implements RwLock those same types could also implement Mutex. They would just have to treat Mutex::lock == RwLock::lock_mut.

Therefore Mutex is far more general, and you might as well accept Mutex if you are going to be writing to the data anyway.

vitiral commented 7 years ago

Rethinking Things

I've been thinking about this some more, and I think we are taking the fundamental wrong approach here. Where we are wrong is that data needs to be locked at all. I felt like this as the beginning because I had @japaric's RTFM blog post in the back of my head which guanatees, among other things, "deadlock free execution guaranteed at compile time."

Rust's std library gets away with avoiding a concurrency primitive (i.e. greenthreads) because:

it has an expressable type system
the OS scheduler is extremely flexible

I get the impression that this API is trying to be "general" like the stdlib -- for use with frameworks other than RTFM (like TockOS, or with no concurrency framework). I think this is a serious mistake. RTFM allows us to have lock and deadlock freedom with no performance overhead. For the embedded standard library of the future we need to be opinionated about our concurrency primitives in order to realize such massive gains.

Proposal

For no_std we still have the type system, but we do not have flexibility of the scheduler. Quite the oposite -- we have the rigidity of interrupt levels. We can use this rigidity to our advantage and make even more robust programs than a conventional approach would alow (whether asyncio/futures based or not!). RTFM is that approach.

In REQ-layers I have laid out what I think are the primary layers necessary for development. However, I am now thinking that the concurrency layer needs to be in layer4: the HAL must depend on it.

In summary, the following layers should be based on RTFM, the other layers will be general and could be used in any embedded project:

application (user programs will use all the layers)
communication, board_api
generic_protocols (i.e I2C, CAN, TCP), external_devices
HAL (this RFC), concurrency (RTFM)

What this means for this issue

Neither the Buffer or a Mutex makes any sense for peripherals. All peripherals will be interrupt driven through RTFM and will therefore be guaranteed to be "unlocked" at compile time -- no runtime checks are necessary. We should share data through the RTFM framework, and the data can be of any type that is valid for the peripheral (we should open a separate issue to discuss data types such as circular buffers, etc).

japaric commented 7 years ago

@vitiral

I think the key you might be missing is that you have to pass the Mutex around by ownership (M not &M). If you always are pasing ownership then it will never get dropped and then deallocated (and you CAN rely on drop mechanics). Methods then return the full Mutex when they are done.

I don't see how that would work except for having the method block for the duration of the whole DMA transfer, i.e. make the method blocking, which is the opposite of what you want to use the DMA for (you want a parallel memory transfer) in the first place.

I get the impression that this API is trying to be "general" like the stdlib -- for use with frameworks other than RTFM

Yes, that's an explicit design goal, cf. bullet 3: callback model = RTFM tasks.

I think this is a serious mistake.

I don't think it's a mistake. If we want to have code sharing to prevent every concurrency framework from rewriting the same register manipulation code and to have to deal with less bugs in the end then we have to be agnostic about the concurrency model in the lowest HAL layer which is this crate.

If you want a HAL tailored for RTFM tasks then you can build a hal-rtfm crate on top of this one.

RTFM allows us to have lock and deadlock freedom with no performance overhead.

Indeed it does that as far as concurrency and data race freedom are involved. However, it's not a panacea: DMA transfers, being asynchronous, would require scheduling analysis and whole program static analysis integrated into the compilation process to have zero performance overhead (i.e. no runtime checks) but retaining safety (lack of unsafe) at the source code level.

I've tested this DMA based API with the RTFM framework and the runtime checks and the static_ref::Ref abstractions are required for memory safety. Perhaps it's possible to come up with a better API with less checks -- I don't know. I personally don't see better alternatives (modulo an unsafe API) at this point.

the HAL must depend on it (concurrency layer = tasks).

Note that RTFM tasks are not the only way to do multitasking in the RTFM framework. You can use co-routines / futures in the idle loop to do cooperative multitasking. So even if the RTFM framework ends up being the framework of choice for concurrency we would still need a HAL that works with co-routines and futures. Having a hal-futures, hal-rtfm and hal-coroutines built on top of this concurrency model agnostic crate would lead to less code duplication.

@whitequark

TIL (a few days ago actually) that you can use the Unsize trait to abstract over [T; N] in a way that's not limited to N <= 32. Example:

fn foo(array: &A) where A: Unsize<[u8]> {
    let slice: &[u8] = array;
}

foo(&[0; 33]);
foo(&[0; 1024]);

The problem is that the Unsize trait is an unstable feature of the core library. Maybe not a problem if you are OK with making your library nightly only.

@adamgreig

I was thinking about a non-stoppable circular DMA API and I can't see a way to make it safe unless ... ~~you tightly integrate with interrupt handlers which would be (a) very device specific and (b) probably not too flexible~~ nvm, I don't think that would work either. This seems like one of those problems where safety is tied to scheduling analysis ("make sure this code finishes before the next event occurs")

Anyhow the problem I see goes like this:

struct CircBuffer<A> {
   first_half: A,
   second_half: A,
   // plus some other state to track which half is used by the DMA or borrowed
   // by the main processor
}

// you start a circular DMA transfer that uses the CircBuffer
let circ_buffer = ..;
start_circular_dma(circ_buffer);

// now you borrow the second half while the DMA is using the first half
let half: &mut [u8] = circ_buffer.borrow_mut();

loop {
    // do stuff with half, but never break from `loop`
}

The problem there is that, because the DMA transfer never stops, at some point the DMA will be mutating the half you are also mutating or reading. This goes against Rust borrowing principles because two parallel processes now have mutable access to the same piece of data.

The problem goes away (maybe?) if the DMA transfer is not "automatically" circular. In the sense that you still have two buffers and half and complete transfer events but you must manually restart a new DMA transfer on the same buffer. This would add some overhead to the circular operation mode but maybe it's not too bad that even hardcore application can get away with this manual restart?

whitequark commented 7 years ago

@japaric

TIL (a few days ago actually) that you can use the Unsize trait to abstract over [T; N] in a way that's not limited to N <= 32. Example:

That's ironic; I knew about Unsize but have not managed to make it work when experimenting.

The problem is that the Unsize trait is an unstable feature of the core library. Maybe not a problem if you are OK with making your library nightly only.

Assuming it's going to be stabilized that seems fine, since all embedded work is nightly-only anyway in foreseeable future.

adamgreig commented 7 years ago

This would add some overhead to the circular operation mode but maybe it's not too bad that even hardcore application can get away with this manual restart?

Hmm. Maybe. For my use cases I have a timer trigger the ADC, which takes a sample, and then the ADC fires a DMA request to move a word from the ADC register to memory. To avoid missing any samples you need to restart the DMA after the transfer complete interrupt, but before the next sample completes. At full sampling rate and full clock speed you have perhaps 70 cycles between samples, fewer once the DMA has actually fired an interrupt, which I think might just be doable but seems pretty tight to rely on.

At lower sample rates (you might only be sampling at a few 10s of ksps) you'd have loads of time, no problem. It becomes mildly annoying to have to restart the DMA each transfer, but it can be made nice by a nice API, and the unsafe hatch is still there to use if you'd rather. I'd still worry about potentially missing a sample because the API can't ensure that your DMA-restarting code actually runs in time, but at least with a suitable interrupt priority and so forth it should be possible.

At high sample rates you might just have to accept that it will be an unsafe thing and you have to be a little careful with your circular buffer? I agree with your analysis -- we can't statically enforce that the code using the reference to one half will complete before the DMA swaps halves. Ultimately if your code hits this problem it's because it already cannot keep up with the data rate, so you were sort of lost from the beginning anyway.

I'd still like to find some good way of doing circular/double buffers because it is a fairly common use case for datalogging and DSP and most things where you are running the ADCs at a serious rate, but it seems pretty tough. Perhaps the best we can do is an easy to use but unsafe API, with suitable documentation around it. It might also be possible to detect the failure condition after the fact in some cases -- when the user asks for the other half of the buffer but the DMA has already swapped to it for instance -- and throw a panic when that happens, just to aid development/debugging.

My feeling coming from C is that I'm not that unhappy about a few unsafe blocks here and there, especially if they can be relatively benign to handle. I'd be handling everything carefully in C, instead. But I worry this blinds me to what might otherwise be possible with Rust!

PS. :+1: for keeping embedded-hal async API agnostic, especially with a view to potential scheduling or cooperative multitasking in the idle loop of rtfm.

Ericson2314 commented 7 years ago

@whitequark

This doesn't work. An Rc loop is constructible safely

We can add a new auto trait for non-mandatory destructors and make RC and mem::forget use it.

@japaric

You mean never-ending threads? I have mentioned before that stuff allocated in never-ending functions (fn() -> !) should have a different lifetime to indicate that they can't be deallocated. That would be useful here but that feature doesn't exist.

Tails-calls down the road might somewhat complicate this, but hopefully there is a work around.

[In general low-level event-driven code is fundamentally written in a continuation-passing style, and Rust doesn't really deal well with that on many levels. I was hoping that with the focus on futures-rs, the core time would have reason to worry about these things up the stack, but since that became more pull-based rather than push-based, they've dodged these issues and the receded back into obscurity, sigh.]

The DMA I've thought most about is high bandwidth NICs ("thought" as haven't written anything actually here, so please take with grain of salf!). The "buffer descriptor queue" model many use makes ownership rather than borrowing more natural--even if extra data needs to be tracked, it can be stored in an a "side ring buffer" I suppose (morally equivalent to ring buffer with bigger cells as they share head / tail state, but compatible with the NIC ABI).

I don't immediately see how &move / &out helps here; I thought that was meant to avoid initializing e.g. arrays and statically forbid the initialization routine from reading the uninitialized memory.

I suppose since u8 is plain old data, there's less benefit. But if we had a DerefMove trait to go with &move, then an API for this buffer descriptor DMA could use DerefMove<[u8]> to avoid locking in the use of Box while keeping ownership-based semantics.

vitiral commented 7 years ago

@japaric I think I am beyond my capability of understanding things here. I must just not be wrapping my head around the DMA.

Some quick comments I think could help:

it makes sense to keep concurrency primivies like you said (see comments below)
I'm having trouble understanding why you couldn't use &'static Bufer instead of static_ref::Ref<Buffer> -- aren't they the same thing? (you seem to say that in that secition in the first post)

don't see how that would work except for having the method block for the duration of the whole DMA transfer

Are you saying the method goes out of scope before it completes without being able to store its buffer somewhere? Who is owning the Ref<'a, Buffer<B, Dma1Channel4>> in your Serial example...? ... oh my god, I think I just got it.

Nobody keeps ownership on the Buffer. You are defining these functions as if they were in a separate CPU because they bascially ARE in a separate CPU -- it's just that two CPU's are using the same program simultaniously. The only way to tell which channel is/will be used is Dma1Channel4 types.

I think I'm starting to get it... it is so much lower-level than memory management -- this is really low level stuff. Most of what I've been saying has been really not helpful -- sorry about that.

Ericson2314 commented 7 years ago

Even when dealing solely with DMA initiated by the CPU (as opposed the NIC case where the outside world sending packets triggers DMA), I think an ownership-based model can still work, with some global data structure would own the the buffer on the peripheral's behalf:

Initiate DMA request, parking buffer in the global "parking lot" as if it was giving it to the peripheral.
Interrupt handler pulls buffer out of global "lot", as if it as an argument provided by the "calling" peripheral.

This relies on either one pending DMA request at a time, or some way to disambiguate between pending requests to key into the parking lot. I assume that won't be an onerous requirement as other stuff probably needs it anyways.

japaric commented 7 years ago

@vitiral

I'm having trouble understanding why you couldn't use &'static Bufer instead of static_ref::Ref -- aren't they the same thing? (you seem to say that in that secition in the first post)

They are semantically the same thing once you have one in your hands. The difference is how you get one or the other. To get a &'static Buffer you need a static B: Buffer = .. but that can't be constructed because Buffer is not Sync (not safely shareable by threads / interrupts / tasks) and static variables must implement Sync. So to build a statically allocated global Buffer you need something like static B: Mutex<Buffer>. The thing is that the &'a Buffer you can get from B can't never be a &'static Buffer because the &- reference means that you have access to the Buffer for the span of code (region) 'a. 'a == 'static would mean that the reference can escape the critical section for which the inner Buffer can be accessed and that would be unsound.

Example:

static B: Mutex<Buffer> = ..;

fn main() {
    let buffer = cortex_m::interrupt::free(|cs| { // interrupts are disabled here
        // let's assume that borrow returns `&'static Buffer`
        // (the existing API returns `&'cs Buffer` where `'cs` is the lifetime of
        // this critical section)
        let buffer: &'static Buffer = B.borrow(cs);

        buffer // now buffer can escape this critical section (because of `'static`)
    }); // interrupts are re-enabled here

    // BAD the buffer can be accessed outside the critical section; this is racy
    buffer.borrow();
}

Are you saying the method goes out of scope before it completes without being able to store its buffer somewhere?

Visually this is more or less what happens:

(sorry for the crappy diagram)

//                                      CPU                 DMA
let buffer = BUFFER.lock();

// ..

serial.write_all(buffer).unwrap();   // Ref1 ------------->
                                     //                     Ref1
let n = buffer.borrow().len();       // Ref2                Ref1
                                     //                     Ref1
block! {                             //                     Ref1
    buffer                           //      <------------- Ref1
        .release()                   // drop(Ref1)
}.unwrap();

When the DMA transfer starts the CPU conceptually sends a buffer::Ref (this is a RefCell style Ref) to the DMA -- under the hood Buffer.lock is called, but there's no real Ref transfer (the DMA can't "hold" arbitrary data). While the DMA transfer is in progress it's as if the DMA was holding a Ref to the buffer so the CPU can only immutably borrow the buffer (RefCell style borrow) -- trying to mutably borrow the buffer will result in a panic!. When the data transfer ends the DMA conceptually gives the Ref back to the CPU and the CPU destroys it -- under the hood Buffer.unlock is called. Now the CPU can immutably or mutably borrow the buffer again.

@Ericson2314

I think an ownership-based model can still work, with some global data structure would own the the buffer on the peripheral's behalf:

I can't see how this can be implemented efficiently. At first glance it seems that this would require memcpy-ing the buffer from the stack to the global/static variable (to give up ownership) and then memcpy-ing it back (to reclaim ownership).

The important bit here is that you may want to do something like this:

Context 1: start DMA transfer to fill some buffer B
Context 2: once the DMA transfer has finished make use of the buffer B

Context 1 and 2 may have different stacks (a context could be a thread or a task / interrupt) so I don't see how to avoid the memcpy to transfer ownership of the buffer B from 1 to 2.

The Buffer abstraction avoids this problem by keeping the data and its allocation in static (.bss / .data) memory and having no concept of ownership ("who's in charging of freeing the allocation?" - no one because this will never be destroyed), just a concept of read / write access (RefCell style -- see above diagram).

vitiral commented 7 years ago

@japaric why doesn't Buffer just implement Sync? It seems to me like it really should! It is kind of like a Cell but it has lock and unlock methods to ensure data safety (so it's more like a Mutex already). It seems like by definition it can be safely shared.

Ericson2314 commented 7 years ago

The lot would hold an owning pointer (or &'static mut, not too much less safely). The buffer itself just need be Send, and a plain array is fine. There is no memcopy.

Now getting such a pointer will also require some other abstraction, or plain unsafety, but is nicely decoupled from the DMA interface itself.

Ericson2314 commented 7 years ago

@japaric somewhat relatedly, I wish we could lable a function "call once" such that it has exclusive access to any statics declared within.

japaric commented 7 years ago

@vitiral Buffer methods are not interrupt safe and that's why the whole thing is not Sync. Making it Sync implies adding the synchronization mechanism into its implementation but there are several options for synchronization to pick from: RTFM's Resource, cortex-m's Mutex, RTFM's/cortex-m's Local, etc. I don't think we should hardcode a synchronization mechanism into Buffer. As proposed in this issue description you can use it with Resource, Mutex or Local.

@Ericson2314

The lot would hold an owning pointer

OK. I think this is getting too into the hypothetical territory. We want to build a HAL that we can use today so it can't depend or wait for features with no known timeline -- much less on features with unknown likelihood of ever getting added to the language.

but is nicely decoupled from the DMA interface itself.

Sorry, do you have concrete example / code snippet? I don't see how would this work.

First issue I see is that the DMA is not a general purpose processor and it can't run arbitrary code so you can't literally send a owning pointer to the DMA and then have it back when the DMA transfer is over. The DMA has no memory to stash the buffer / pointer. What has to happen is that the CPU must poll the DMA to see if the DMA transfer finished, then the CPU can mark a buffer as no longer owned by the DMA.

Secondly I don't see how a memcpy is not required in the example I gave where the DMA transfer is split in two contexts: one that allocates the buffer and starts the transfer on that buffer, and a second context that marks the transfer as done and uses the buffer. The first context may be deallocated when the second context executes so a memcpy is required for memory safety AFAICT.

I wish we could lable a function "call once" such that it has exclusive access to any statics declared within.

That sounds effect system-y (*) and viral: a function must be "call once" to call another "call once" function. Or perhaps it would be unsafe to call a "call once" function from a function that's not "call once" or not known to be "call once".

Anyhow, "call once"-ness sounds like it's more strict than necessary. We have Local abstraction that makes it safe to mutably access a static by constraining it to only be accesible from a single execution context (e.g. a single interrupt / task).

(*) Another feature that doesn't exist today and with unknown timeline.

Ericson2314 commented 7 years ago

@japaric I think what I am proposing is less exotic and different from yours than it sounds? I do like to think about features we don't have, but that's not the core of it.

First of all, in your pub fn write_all<'a>, when can 'a not be 'static? I want there to be such a situation, but I do not know of one. My proposal doesn't support that.

[For sake of showing the whole interaction, I make this method do triple duty: it sets the handler on completion, and the hardware reads the buffer and then writes information back before interrupting. In practice things would probably be broken apart, but then I'd need to write more functions to convey my point :).]

static Handler: RacyCell<Option<(
    unsafe fn(usize, usize),
    usize,
    usize,
)>> = Local::new(None);

pub fn read_and_write_all<RR, RW, B, F>(
    &self,
    read_buffer: R,
    write_buffer: R,
    new_handler: F
) -> Result<(), Error>
    where RR: Deref<B>    + 'static,
          RW: DerefMut<B> + 'static, // Deref*Mut* for now, 'cause it's just bytes
          B:  Unsize<[u8]> + ?Sized,
          F:  fn(RW, R)
{
    // Ensure ABI of handler

    // With associated constants maybe we can do this statically!
    debug_assert_eq!(size_of::<RW>, size_of::<usize>);
    debug_assert_eq!(size_of::<RR>, size_of::<usize>);

    // There's a transfer in progress
    if self.dma1.ccrw4.read().en().bit_is_set() {
        return Err(Error::InUse)
    }

    debug_assert!(Handler.get() == None);

    // number of bytes to read
    self.dma1.cndtr4.write(|w| w.ndt().bits(read_buffer.len() as u16));
    // read address
    self.dma1.cmar4.write(|w| w.bits(read_buffer.as_ptr() as u32));

    // number of bytes to write
    self.dma1.cndtw4.write(|w| w.ndt().bits(write_buffer.len() as u16));
    // write address
    self.dma1.cmaw4.write(|w| w.bits(write_buffer.as_ptr() as u32));

    // Store handler and buffer pointers for later.
    //
    // This is really just a lousy closure but we
    // pretend that we actually moved the buffer pointers to the peripheral and
    // got them back with the interrupt.
    *Handler.get() = Some((
        new_handler as unsafe fn(usize, usize),
        transmute(write_buffer),
        transmute(read_buffer),
    ));

    // start transfer
    dma1.ccrw4.modify(|_, w| w.en().set());

    Ok(())
}

Your Local abstraction is great, and in fact a perfect example of the utility of the polymorphism for the pointer type. In the cross-context case, one would call this with guards to a lock/cell, in the same-context case, one could just use Local.

japaric commented 7 years ago

@Ericson2314 Thanks for your example.

Your implementation looks OK (*) but it doesn't inter-operate with the existing abstractions that allow memory safe access to static variables.

First of all, in your pub fn write_all<'a>, when can 'a not be 'static?

Neither of the existing abstractions (Local / Mutex / Resource) return a &'static [mut] reference to the inner data as that would be unsound. Both Mutex and Resource use critical sections for memory safety and a 'static lifetime would let the reference escape the critical section. See example above.

Local doesn't return a &'static mut reference either as that would let you "safely" mutably alias memory which is not sound. See below:

// (This function can't be called by the processor because EXTI0 has no
//  constructor. The hardware (the NVIC) is what will call this function and
//  will never call `interrupt_handler` while an `interrupt_handler` is being
//  currently executed)
fn interrupt_handler(mut task: EXTI0) {
    static FOO: Local<Thing, EXTI0> = Local::new(..);

    // NOTE this doesn't freeze `task` because there's no "connection" between
    // the lifetime of the input `&mut task` and the output
    let foo: &'static mut Thing = FOO.borrow_mut(&mut task);

    // Two `&mut -` references to the same data
    let alias_foo: &'static mut Thing = FOO.borrow_mut(&mut task);
}

What abstraction do you propose to use with this API to let users write zero unsafe programs?

(*) I'm personally wary of putting static variables in functions as that makes them non re-entrant by definition. Doubly so if unsafe is required to access the variable. Triply so if the function is generic: each concrete instance will access the same static variable. But this particular implementation looks sound.

@adamgreig

I was revisiting the circular buffer abstraction and I think I have come up with something fully safe while keeping the circular behavior automatic. Below is a simplified version; you can see the full implementation in this PR.

(I'm going to write this using "const generics" (const N: usize as a type parameter) to avoid weird workarounds and hopefully make this clearer.)

struct CircBuffer<const N: usize, T>
where
    T: AtomicRead,
{
    buffer: [T; 2 * N],
    status: Cell<Status>,
}

unsafe trait AtomicRead {}
unsafe impl AtomicRead for u8 {}
unsafe impl AtomicRead for u16 {}
unsafe impl AtomicRead for u32 {}

enum Status {
    /// Not in use by the DMA
    Free,
    /// The DMA is mutating the first half of the buffer
    MutatingFirstHalf,
    /// The DMA is mutating the second half of the buffer
    MutatingSecondHalf,
}

/// Some analog pin
struct PA0;

impl PA0 {
    /// Start conversion store the values in `buffer`
    fn start<const N: usize>(&self, buffer: Ref<CircBuffer<N, u16>>) -> Result<(), Error> {
        buffer.lock(); // buffer.status = MutatingFirstHalf
        // Start circular DMA transfer with
        // addr = buffer.buffer.as_ptr() and len = 2 * N
    }
}

impl CircBuffer<const N: usize, T> {
    fn read(&self) -> nb::Result<&[RO<T>], Error> {
        let status = self.status.get();

        assert_ne!(status, Status::Free); // transfer not started!

        let isr = DMA1.isr.read();

        if isr.teif1().is_set() {
            Err(nb::Error::Other(Error::Transfer)) // transfer error
        } else {
            match status {
                 Status::MutatingFirstHalf => {
                     if isr.tcif1().is_set() {
                         // we didn't call `read` timely and the DMA is already
                         // mutating the first half again!
                         Err(nb::Error::Other(Error::Overrun))
                     } else if isr.htif1().is_set() {
                         // DMA done with the first half

                         // (clear flag, etc.)

                         let buffer: &[T] = unsafe { &(*self.buffer.get())[..N] };

                         // some magic here to transform &[T] into &[RO<T>]
                         // ..
                     } else {
                         // transfer not done
                         Err(nb::Error::WouldBlock)
                     }
                 },
                 Status::MutatingSecondHalf => {
                     if isr.htif1().is_set() {
                         Err(nb::Error::Other(Error::Overrun))
                     } else if isr.tcif1().is_set() {
                         let buffer: &[T] = unsafe { &(*self.buffer.get())[N..] };
                         // ..
                     } else {
                         Err(nb::Error::WouldBlock)
                     }
                 },
                 Status::Free => unreachable!(), // see `assert!` above
            }
        }
    }
}

// example
let buffer: Ref<CircBuffer> = ..;
PA0.start(buffer);

{
    // blocks until first half of the transfer is completed
    let first_half: &[RO<u16>] = block!(buffer.read());

    let x = first_half[0].read();
    let y = first_half[1].read();
    // ..
}

{
    // blocks until the second half of the transfer is completed
    let second_half: &[RO<u16>] = block!(buffer.read());

    let x = second_half[0].read();
    let y = second_half[1].read();
    // ..
}

Hopefully that's relatively straightforward except for the Atomic trait and the RO newtype. Those two are there to solve the "at some point the DMA will be mutating the half you are also mutating or reading" problem I mentioned before.

The solution I chose is to only ever have read only access to the circular buffer data. The "but there's one process mutating the data and another one reading it in parallel!" objection is solved by relying on the fact that u16 data can be read atomically so reads will never observe inconsistent data (which would be the case if e.g. u64 or a repr(u64) enum was being read).

The AtomicRead bound enforces that the buffer data can only be u8, u16 or u32. These three types can be read atomically (single instruction) on Cortex-M processors. The volatile_register::RO is there to force volatile reads of the data. This way the compiler can't assume that consecutively reading the same memory location will result in the same value. For example:

let half = block!(circ_buffer.read());

let x = half[0].read();
let y = half[0].read();

if x == y {
    // foo
} else {
    // the compiler *can't* optimize this branch away
    // this is what we want because the DMA may have mutated `half[0]` between
    // the two consecutive reads
}

There's a problem though: the selection of RO appears to disable memcpy promotion so something like this:

// goal memcpy `circ_buffer_half` into `other_buffer`

for (i, slot) in other_buffer.iter_mut().enumerate() {
    *slot = circ_buffer_half[i].read(); // RHS is a volatile read
}

doesn't lower to a memcpy(other_buffer.as_mut_ptr(), circ_buffer.as_ptr(), circ_buffer.len()) but instead produces a for loop that does byte-wise copying which probably is not that optimal. It's also annoying not being to able to write other_buffer.copy_from_slice(circ_buffer).

This problem could probably be fixed by returning an AtomicSlice newtype, instead of &[RO<T>] that has a copy_to_slice method that uses the intrinsics::volatile_copy_nonoverlapping_memory under the hood.

Ericson2314 commented 7 years ago

@japaric While dereferencing a lock guard gives us a borrowed pointer with a non-static lifetime, the lock guard itself should have no problem being 'static if the lock is in a static. RW: DerefMut<B> + 'static is satisfied.

With Local you have a good point, but I think there is a solution: we have a Local which contains a Option<&mut 'static Buffer> to the buffer. The caller pulls that out to use my interface. and the handler puts it back. [The handler and the caller both have the same ctx ZST to do this.] The &mut 'static Buffer simply needs to be the only way to access whatever it points to.

japaric commented 7 years ago

While dereferencing a lock guard gives us a borrowed pointer with a non-static lifetime, the lock guard itself should have no problem being 'static if the lock is in a static.

Right. cortex_m::Mutex doesn't use a lock guard but a closure. It could use a lock guard though (I don't really like drop-based guards because of mem::forget). OTOH rtfm::Resource can't use guards at all -- resources also use closures to grant access to the protected data -- because critical sections in RTFM must be nested in a FIFO manner for soundness and you can't enforce that with lock guards.

we have a Local which contains a Option<&mut 'static Buffer> to the buffer.

Hmm, how would the user safely put the &'static mut in the Local in the first place?

adamgreig commented 7 years ago

Instead of saying the buffer items must be sizes that can be read atomically, I think it makes more sense to require that they are sizes the DMA can handle? The STM32 DMA can read/write memory in sizes of u8, u16, and u32, and will do those atomically, but can't transfer any other sizes. Additionally it can be different between peripheral and memory, i.e. you can read a u32 from a peripheral register and write it as four consecutive u8 writes to memory. So it might make more sense to impl some DMA newtype for the sizes the device DMA operates on.

The idea that this then prevents inconsistent values I'm less sure about - e.g. if I let half = block!(circ_buffer.read()); and then take my time dealing with members of half, I don't think anything prevents the DMA from starting to overwrite them? Sure, the integers you see were all samples once (not two halves of separate samples), but they're not from the same sequence, so it's still a race condition that I thought shouldn't be achievable in safe code. You read half[20] and get the 20th sample from the nth sequence, and then read half[21] and could get the 21st sample of the (n+1)th sequence. That we would panic later when circ_buffer.read() is called the second time doesn't help.

As an aside, this pattern only covers the peripheral-to-memory case (eg ADC) but would need extending for memory-to-peripheral (DAC etc). Perhaps the enum variants should just be Free/FirstHalf/SecondHalf without "mutating" for the cases where the DMA is reading? And we'd need a write method to mirror read. I wonder if that means we should use Atomic::RW rather than RO. Even in the ADC case it might often be useful to be able to mutate the half-buffer in place, e.g. to apply a filter in-place before subsampling to a new buffer.

As a more minor aside, I think N should be the total buffer length, not the length of a half, but that's a bit bikesheddy and not that big a deal.

It's a shame we lose the lowering to memcpy. I think it'd be worth having the AtomicSlice or similar wrapper to allow easier/faster copying as it's probably a common use case for this buffer (eg copying it into a network buffer for transmission or something).

Ericson2314 commented 7 years ago

I think you mean LIFO? I'm not sure there is a way to force LIFO across a continuation, so there might not be a way to do that safely regardless. Moreover, it is not clear to me that one is even allowed to aquire resources across a job boundary with RTFM/SRP—though before I read your linked papers I knew next to nothing about hard real to so take that with a grain of salt.

I'm not sure there is a safe way to get the &mut today, but I don't see why static Foo: Bar = init(&mut expr); shouldn't be safe if expr is send.

japaric commented 7 years ago

@adamgreig

OK. Whole new idea based on your last comment. Forget about the atomic reads and the RO stuff.

I have changed read to this:

impl CircBuffer<const N: usize, T> {
    fn read<F, R>(&self, f: F) -> nb::Result<R, Error>
    where
        F: FnOnce(&[T; N]) -> R,
    {
        let status = self.status.get();

        assert_ne!(status, Status::Free); // transfer not started!

        let isr = DMA1.isr.read();

        if isr.teif1().is_set() {
            Err(nb::Error::Other(Error::Transfer)) // transfer error
        } else {
            match status {
                Status::MutatingFirstHalf => {
                    if isr.tcif1().is_set() {
                        // we didn't call `read` timely and the DMA is already
                        // mutating the first half again!
                        Err(nb::Error::Other(Error::Overrun))
                    } else if isr.htif1().is_set() {
                        // DMA done with the first half

                        // (clear flag, etc.)

                        let half: &[T; N] =
                            unsafe { &(*self.buffer.get())[..N] };

                        let ret = f(half);

                        if isr.tcif1().is_set() {
                            // executing `f` took too long and the DMA is
                            // already mutating the first part again. This is
                            // an overrun error
                            Err(nb::Error::Other(Error::Overrun))
                        } else {
                            Ok(ret)
                        }
                    } else {
                        // transfer not done
                        Err(nb::Error::WouldBlock)
                    }
                }
                Status::MutatingSecondHalf => {
                    // counterpart of MutatingFirstHalf
                }
                Status::Free => unreachable!(), // see `assert!` above
            }
        }
    }
}

Now usage looks like this:

let buffer: Ref<CircBuffer> = ..;
PA0.start(buffer);

{
    // blocks until first half of the transfer is completed
    // then executes the closure on the first half of the buffer
    let average = block!(buffer.read(|half| {
        half.iter().cloned().sum::<u16>()
    })).unwrap();

    // ..
}

{
    // This copies the half DMA buffer into the stack
    // This does optimize into a memcpy (!)
    let second_half = block!(buffer.read(|half| *half)).unwrap();

    // ..
}

So what happens now is that once the transfer is done the closure passed to read will be executed. That closure operates on the first half of the buffer -- the half the DMA just finished modifying. The trick is that once we are done executing the closure we check the DMA status again. If the DMA says that it finished modifying the second half and thus has started modifying the first half again then we bail out with an Overrun error -- the DMA may have modified the data we were operating on. If the DMA says that it's still modifying the second half then no data race could have occurred so we just return the result of executing the closure in an Ok.

I think this is general enough that the closure could directly operate on &mut [T; N] and the same method could be used for the DAC case you mention. read should be renamed to something more sensible in that case.

@Ericson2314

Yeah, sorry I meant LIFO.

Resource is how you share data between two or more tasks (interrupts) in RTFM land.

adamgreig commented 7 years ago

That looks nice @japaric, good idea. I agree that it could work with an &mut and therefore be the same for reading as writing, though I'm not sure yet what a good name would be...

I still think the variant names should change from MutatingFirstHalf to UsingFirstHalf or something like that though, since in the read sense it won't be mutating.

Ericson2314 commented 7 years ago

@japaric So from the papers I got the impression that at the beginning and end of a run of a job, no resources were held: the corresponding release to every acquire occurs within the same job.

Thinking about it some more, it is probably safe to extend that model to allow acquiring a resource over a job boundary as one would for DMA, [Safe in terms of functional correctness, I'm not sure it doesn't adversely affect meeting deadlines.] The only requirement I see to continue avoid deadlocks is ensure if job A initiates a DMA with continuation job B, then job A can only begin if any resources potentially needed by either A or B are unused.

Also, despite the literature, I can't see why the LIFO requirement actually matters. Due to the shared stack, jobs will actually acquire resources in a LIFO manner regardless. That alone seems sufficient to prevent dead locks. The manipulation of priorities does become more complicated, but this is no problem and probably doesn't even have a run-time cost.

Anyways I think Resources are a red herring, no? Your API doesn't work with resources, right? So this isn't a relative disadvantage of mine no matter what, correct? I'd hope merely working with Local and various locks is alone an advantage.

japaric commented 7 years ago

@adamgreig

though I'm not sure yet what a good name would be...

access or get. I'm not too concerned with naming at this point.

I still think the variant names should change from MutatingFirstHalf to UsingFirstHalf or something like that though, since in the read sense it won't be mutating.

Yeah, that makes sense.

@Ericson2314

So from the papers I got the impression that at the beginning and end of a run of a job, no resources were held: the corresponding release to every acquire occurs within the same job.

No resource that the task will use can be in use by other tasks when it starts; otherwise it shouldn't have started in the first place. That's the basic gist.

Thinking about it some more, it is probably safe to extend that model to allow acquiring a resource over a job boundary as one would for DMA

Yes, you can model it as if a process (a sequence of tasks) withheld the resource for its whole duration.

The only requirement I see to continue avoid deadlocks is ensure if job A initiates a DMA with continuation job B, then job A can only begin if any resources potentially needed by either A or B are unused.

But the compiler doesn't know if this will be the case or not so you still need runtime checks to panic or raise an error if this situation (A happens again before B ends - that would indicate an scheduling problem) occurs. At least if you want to keep the API safe (no unsafe).

Due to the shared stack, jobs will actually acquire resources in a LIFO manner regardless.

You need to nest critical sections in a LIFO manner to access resources with different ceilings within a single task to keep the model sound. Using drop guards to unlock is not sound. See below:

static R3; // resource with ceiling = 3
static R2; // resource with ceiling = 2

// with priority 1
fn task1() {
    // starts with preemption threshold = 1

    let r1 = R1.claim_mut(); // raises preemption threshold to 3
    let r2 = R2.claim_mut(); // no change in preemption threshold
    ..;
    drop(r1);                // drops preemption threshold to 1
    ..;
                             // <~ preempted by task2
    ..;
    drop(r2);
}

// with priority 2
fn task2() {
    // BAD data race. task2 has a reference to R2 while task1 also has a
    // reference to it
    let r2 = R2.claim_mut();
}

preemption threshold = priority another task must have to be able to preempt the current task / scope

Things would have been fine if LIFO order was maintained:

fn task1() {
    // starts with preemption threshold = 1

    let r1 = R1.claim_mut(); // raises preemption threshold to 3
    let r2 = R2.claim_mut(); // no change in preemption threshold
    ..;
    drop(r2);
    ..;
                             // <~ task2 tries to preempt but it can't
    ..;
    drop(r1);                // drops preemption threshold to 1
    // task2 preempts task1 here
    ..;
}

No data race in this case. But it's better to use closure instead of drop guards to make the first situation impossible:

fn task1() {
    // starts with preemption threshold = 1

    R1.claim_mut(|r1| { // raises preemption threshold to 3
        ..;
        R2.claim_mut(|r2| { // no change in preemption threshold
            // <~ task2 tries to preempt but it can't
        }); // no change in preemption threshold
    }); // drops preemption threshold to 1

    // task2 preempts task1 here

    ..;
}

Your API doesn't work with resources, right?

It was designed to work with tasks and resources.

So this isn't a relative disadvantage of mine no matter what, correct? I'd hope merely working with Local and various locks is alone an advantage.

With Local you can't move the buffer around across tasks because the data is pinned to a single task. The 'static requirement can't be met by task local data so with your API tasks can't start a DMA transfer.

In the upcoming version of RTFM the only place where &'static mut Local safely occurs is in the idle loop, which is like a never ending main. It's safe there because idle is not a recurring function; it gets called once at the beginning of a program and runs forever at the lowest priority. I'm not quite sure what the implications of having a &'static mut reference are since I have never dealt with them before (they may end up being unsound in the context of RTFM). With your API a transfer could be started in idle but I'm not quite sure what would happen next.

Ericson2314 commented 7 years ago

It was designed to work with tasks and resources.

I mean that your Buffer is not a SRP resource; it requires dynamic checks to safely access. It does work with RTFM, but so could mine, as one could not get a &'static mut from a resource.

But the compiler doesn't know if this will be the case or not so you still need runtime checks to panic or raise an error if this situation

The callback can take an unconstructable ZST that the caller also provides----morally handing off the capability.

drop(r1); // drops preemption threshold to 1

I think this is the problem, not the lack of FIFO itself. We have something very similar to a an immutable borrow going where as long as at least one resource with preemption ceiling N is held, the threshold must be at least high. [Now with a single threshhold this isn't terribly interesting other than for making a safe interface with guards---one could just make their borrows FIFO and have the same behavior as only the threshold matters. But if the implementation uses interrupt masks instead, we need not have totally ordered prevention levels: set inclusion induces a partial order, and masks traces like {Job0} -> {Job0, Job1} -> {Job1} are perfectly fine. This improves utilization too.]

While in principle Rust has the power to type check this statically with the borrow checker, that is not exposed to the user so we'd need to actually maintain counters the guards can decrement. Oh well; maybe someday. The counters would be guaranteed never to go below 0, however.

With Local you can't move the buffer around across tasks because the data is pinned to a single task.

I realize now local was never a good fit because we need to mask the initiator task while the peripheral and then callback owns the buffer. But with the lock-guard-and-counter approach above, I think we do have ourselves a safe and ergonomic interface.

nikhilkalige commented 6 years ago

It would be really awesome if somebody could come up with some api's for DMA that we could add to the hal library.. Atleast I am struggling to get a good design :)

japaric commented 6 years ago

@nikhilkalige I plan to revisit this after RFC japaric/cortex-m-rtfm#59 (safe &'static mut references) is implemented since it changes what's possible to do in safe code. In the meantine, you can look at the example DMA API in that RFC and at my current DMA experiments in this branch; maybe try to roll your own implementation based on that one? Then you can tell me what you think :-).

At this point I'm unsure what (traits? structs?) should eventually land in embedded-hal to promote a common DMA API across device crates.

japaric commented 6 years ago

I'm going to close this RFC since there's a simpler approach to DMA in the works. The new approach will get its own RFC posted in this issue tracker.

jonas-schievink commented 5 years ago

For reference, said simpler approach is discussed in https://github.com/rust-embedded/embedded-hal/issues/37