Closed japaric closed 6 years ago
cc @whitequark @adamgreig @vitiral @Kixunil
Read or write access sounds like &- and &mut- references but the proposed API won't use those.
To elaborate why it doesn't: I think plain references are not enough for memory
safety. Let me reimplement Serial.write_all
using references and #[async]
(you can translate this to futures by writing more boilerplate)
impl Serial {
// this is a mixture of the original Serial.write_all and Buffer.release
#[async]
fn write_all(&self, buffer: &[u8]) -> Result<(), Error> {
let usart1 = self.0;
let dma1 = self.1;
// There's a transfer in progress
if dma1.ccr4.read().en().bit_is_set() {
return Err(Error::InUse)
}
let buffer: &[u8] = buffer.lock().as_ref();
// number of bytes to write
dma1.cndtr4.write(|w| w.ndt().bits(buffer.len() as u16));
// from address
dma1.cmar4.write(|w| w.bits(buffer.as_ptr() as u32));
// to address
dma1.cpar4.write(|w| w.bits(&usart1.dr as *const _ as u32));
// start transfer
dma1.ccr4.modify(|_, w| w.en().set());
loop {
if dma1.isr.read().teif4().is_set() {
return Err(Error::Transfer)
} else if dma1.isr.read().tcif4().is_set() {
if status == Status::Locked {
unsafe { self.unlock() }
} else if status == Status::MutLocked {
unsafe { self.unlock_mut() }
}
// clear flag
dma1.ifcr.write(|w| w.ctcif5().set());
return Ok(())
} else {
// transfer not over
yield
}
}
}
}
Here's an example of using the above API to cause memory unsafety without using
the unsafe
keyword.
fn main() {
foo();
bar();
}
fn foo() {
let mut buffer = [0; 256];
let gen = SERIAL.read_exact(&mut buffer);
// the right thing to do is: drive the transfer to completion
// loop { gen.resume() }
// but nothing forces me to do that so I can return here
// deallocating `buffer` while the DMA transfer is ongoing
}
fn bar() {
// the DMA transfer continues and corrupts this stack frame
let x = 0u32;
let y = 0u32;
}
I might be missing something, but what is the point of the lock*
methods? They seem like a really unsafe API (if you forget to unlock you have deadlock!).
Conversely, the Buffer
-> Ref
api looks pretty much exactly like a Mutex
-> MutexGuard
to me except that you can have multiple Ref
s. Can you explain the differences? I'm wondering if having multiple Ref
s is really valuable for us.
Overall this seems really nice though I'd strongly prefer just &
/&mut
if we can find a way to make it work. It would be easier to use and very flexible. Opportunity for accidental memory unsafety seems very high.
Although not explicitly bounded B can only be an array of bytes ([u8; N]).
Why? A lot of the time (SPI, ADC, DAC, etc) the DMA transfers u16 or u32, and ndtr
refers to the number of items to transfer rather than the number of bytes.
Should we extend this to also work with Buffers allocated on the stack?
I think this is less common than static buffers but not entirely uncommon; you might well want to allocate something on the stack because you're only going to need it inside this scope, but then still want to use it with DMA before you're done with this function. I think this is much more true of multithreading where you have a stack for each thread/coroutine and so the buffer would live in that thread's stack. On a separate note, what about heap allocated Buffers for systems using dynamic allocation?
I think plain references are not enough for memory safety.
I'm aware "you can't rely on Drop
happening" but I wonder if we could have the DMA stop the stream on drop, and then you could perhaps allow &
/&mut
directly?
This API doesn't support using a single Buffer with more than one DMA channel (neither serially or concurrently).
I've never needed to, but consider most DMA peripherals treat TX and RX as separate streams: you might well want to DMA from a UART, modify the data somehow, then DMA it back out. Or DMA from an ADC, modify the buffer in-place, then replay it out the DAC. That would all require using the Buffer in different channels serially. Streaming the same data out of two independent DACs would require concurrent access (though on the STM32 specifically this isn't an issue as you can use the DACs together by writing a single register). It would be a shame for this use case to be totally impossible...
In the blue-pill implementation Buffer is defined in the blue-pill crate itself but to turn DMA based APIs into traits we would have to move that Buffer abstraction into the embedded-hal crate.
Can we have a DMAChannel
trait in embedded-hal
and then devices implement this against their specific DMA peripherals, for each channel?
@vitiral
I might be missing something, but what is the point of the lock* methods?
They are implementation details (non user facing APIs) used to implement the safe user facing API (Serial.write_all / Buffer.release)
if you forget to unlock you have deadlock!
You can't deadlock at least not in the Mutex sense. If you try to access a Buffer
that has not been released then you get a panic!. If you forget to do Buffer.release
when the transfer finished you would also run in this situtation; it's not inherent to the lock / unlock API.
Conversely, the Buffer -> Ref api looks pretty much exactly like a Mutex -> MutexGuard to me except that you can have multiple Refs.
Actually it's modelled exactly the same as a RefCell's Ref. But if you want to use the Mutex analogy then it would be equivalent to a RwLock.
Can you explain the differences? I'm wondering if having multiple Refs is really valuable for us.
You can read the buffer while it's also being read by the DMA in parallel. Not that I can give you an example about why that's useful off the top of my head. But it's a memory safe possibility and I don't see a reason to not allow it; someone may find it useful.
If the overhead of the reference count worries you we can add a Buffer
variant that only hands out a single RefMut
(&mut-
) and uses a bool, instead of an usize, to keep track of the count.
@adamgreig
Although not explicitly bounded B can only be an array of bytes ([u8; N]).
Why?
I actually meant to type "array of bytes (e.g. [u8; N]
)" meaning that B
should have type [T; N]
but you can't bound the type like that in today's Rust. At the end it's up to the DMA based API to pick what B
will be: [u8; N]
, [u16; N]
, etc.
I think this is much more true of multithreading where you have a stack for each thread/coroutine and so the buffer would live in that thread's stack.
You mean never-ending threads? I have mentioned before that stuff allocated in never-ending functions (fn() -> !
) should have a different lifetime to indicate that they can't be deallocated. That would be useful here but that feature doesn't exist.
On a separate note, what about heap allocated Buffers for systems using dynamic allocation?
At first glance it seems that it would have the same problem at stack allocated arrays as in "there's no way to enforce that the boxed slice won't be deallocated midway the DMA transfer". I suppose the API could be tweaked to hand over the ownership of the buffer to the DMA (?) during the transfer and then have some DMA.release
method to get the buffer (Box<[T]>
) back. That raises the question where the Box<[T]>
is stored during the transfer ...
I wonder if we could have the DMA stop the stream on drop, and then you could perhaps allow &/&mut directly?
You can't rely on drop happening :-). You can safely mem::forget
the droppable thing that would have stopped the DMA and have Bad Stuff happen without using the unsafe
keyword.
A proper solution to this "you can't rely on drop" stuff is adding linear types (I think that's what they are called) to Rust. A linear type means "this value MUST be dropped / consumed in this scope (unless you return it from the scope)". Rust has affine types: "this value should be dropped / consumed in this scope but it's fine if it's not (mem::forget)".
I've never needed to, but consider ...
OK but if we add this kind of feature we would have to lose some static typing. We would probably have to store in the buffer (i.e. at runtime) which channel is it being used with. Which means an enum of DMA channels rather encoding the channel as a phantom type. Seems doable but needs prototyping.
Can we have a DMAChannel trait in embedded-hal and then devices implement this against their specific DMA peripherals, for each channel?
Possibly. It will probably involve some ad-hoc traits that are implementation details and that are not supposed to be used by the end user.
Although not explicitly bounded B can only be an array of bytes ([u8; N]).
You can use an AsMut<[u8]>
bound here I think.
But this probably a bad idea since panicking destructor is equal to abort (?)
It is possible to have unwinding on larger microcontrollers (I know someone who did it on the less modestly sized STM32s) but it seems like an idea bad enough that I'm not sure if we ever want to support it...
A linear type means "this value MUST be dropped / consumed in this scope (unless you return it from the scope)". Rust has affine types: "this value should be dropped / consumed in this scope but it's fine if it's not (mem::forget)".
This doesn't work. An Rc
loop is constructible safely and is functionally equivalent to mem::forget
. This has been discussed to death back when mem::forget
was stabilized; the signature of mem::forget
is exactly identical to any other function that consumes a value and the function itself is equivalent to "put the object somewhere inaccessible to the rest of the code".
You can use an AsMut<[u8]> bound here I think.
Yeah that would work but it's limited to [T; 32]. This really wants to be struct Buffer<const N: usize, T, CHANNEL> { data: [T; N], .. }
though. Hopefully that's not too far off in the future.
But this probably a bad idea since panicking destructor is equal to abort (?)
I was actually thinking of double / nested panics when unwinding would call a destructor that panics (IDK if, today, that situation causes an abort in std-land). But ... since ARM micros default to panic = abort that situation can't occur because the first panic will cause an abort.
This doesn't work.
Right. So, @adamgreig, we can't rely on destructors to preserve memory safety due to mem::forget.
How bad would it be to use closures to ensure "destructor" is run? (Like in case of scoped threads.)
It's pretty bad, but I guess you can always fall back to an unsafe or poisoning API.
@Kixunil that would be too limiting. With a closures based API you can't split the start / finish operations in two tasks / interrupts, or create a futures based API to select
two or more pending DMA transfers. It could be an alternative API if there's a need for it.
Ok, I've gotten more of a chance to look this over. First of all, I believe that standardized shared buffers are probably one of the most critical requirements for building a microcontroller ecosystem -- so thankyou for putting in the effort of crafting an API around it
I have written the library defrag-rs with the goal of allowing sharing and defragmentation of memory. I strongly advise the API presented there where memory pools give out Mutex
es to their data which track when the data is locked, when it has a reference to it (may be used in the future) and when it is no longer in use. We could then have ALL microcontroller memory managers use this API.
The basic idea is that you might have an ecosystem of memory "pool" libraries that keeps track of a whole bunch of different data buffers. You would call Pool.get()
to get a typed Mutex
(or RwLock
) which you could then pass on to any function (or you can get a reference and pass it on).
A few key points:
Buffer
. We can use mem::size_of::<T>()
to make sure that type T
fits inside of whatever underlying "buffer" exists. Buffers aren't special -- we may want to pass around arrays of other types than u8
or large structs (note that I'm guilty of this too with defrag::Slice
). For instance, I think functions could just accept Mutex<HeaplessVec<T, A>>
where A
is a generic array and HeaplessVec
is similar to (or is) heapless::Vec
static_ref::Ref
gives us anything. Buffer
(or Mutex
) must already be guaranteed to not get freed when the function returns Buffer
in order for the data to stay reserved... which you have to do anyway to return the result!)RwLock
I think RwLock
is dangerous because you might have too many functions holding read-only references and it would be hard to ever get access to writing. In most cases, you should be able to lock
the data and then pass out references if you want multiple readers. In other cases, lock
should be sufficient.
Also, defrag
manages to implement the lock using a single bit -- other memory management libraries can probably do the same. Doing this with RwLock
would be completely impossible -- at minimum you would want to use at least one byte (for count of readers) and really you should use a usize
. This kind of bit stuffing can make these libraries truly universal.
The basic idea is that you might have an ecosystem of memory "pool" libraries that keeps track of a whole bunch of different data buffers.
I'm not particularly happy about this design. It seems to lock me into dynamic memory management of sorts.
@whitequark the application specific "pools" could be statically allocated buffers where you know all the (typed) indexes at compile time (and they would be checked at compile time) -- or they could be semi-dynamic where there is a set number of known-size buffers, or they could be from a library like defrag
where the memory allocation is truly dynamic.
We would have to make sure that ALL of these are supported -- but I think they can be 😄
I agree with @whitequark in that I wouldn't want the API here to be locked into
some dynamic memory management strategy. I do would like this API to also work
with dynamically allocated memory, but right now I can't see how to implement
such thing safely: it all boils down to solving the "you must not destroy this
buffer while the DMA is using it" problem which I have solved here with
static_ref::Ref
which effectively means "the buffer can't never be destroyed".
The only other solution that I can think of are panicking destructors which are
rather bad.
Code-wise Buffer
can be extended to support fixed size arrays, boxed arrays
and slices by turning it into a DST: struct Buffer<.., B> { .., data: B }
where B
can be [T; N]
or [T]
allowing Box<Buffer<.., [T]>>
, but the
destructor problem I mentioned above remains.
Comments on RwLock
Like I said above we can have two variants: the current "RefCell
" style one
and a "RefMutCell
" one that tracks the current borrow using a bool and can
only hand out a single &mut-
reference at a time. I don't see a reason to not
provide the flexibility of allowing both.
@japaric if Mutex
was a trait then it would be tied to any kind of allocation scheme (any allocation library could implement it). And mutex.lock().unwrap()
would be the equivalent of Refcell.lock()
I really think the simplest way to think about this is we want types that work like data on the heap, but are actually statically allocated.
Also, I agree that data has to be backed statically. Maybe the type should be renamed StaticMutex
to make that clear
On a slightly different note, a common DMA use case that I don't think is covered here are circular buffers, where a single large buffer is read/written continuously in a loop. Typically you have a "half complete" interrupt as well as a "complete" interrupt, allowing you to perform processing in one half of the buffer while the DMA deals with the other half. I use this to stream ADC data into memory and then deal with half of it while the ADC continues reading into the other half; since the ADC runs at a particular sampling frequency I can't afford to stop/start the DMA or otherwise interrupt it. As far as I can tell the whole purpose of the Buffer
is to ensure you can't do this (write into one half while DMA is reading the other half, or vv).
A very similar use case is double buffered DMA, where the DMA is given two buffers and automatically swaps between them as each one is filled. This might be the easier scenario to handle: the DMA-using function would be given two separate buffers and uses the half-complete/complete interrupts to lock/unlock them in turn. Once you had that working you probably wouldn't need to support circular single buffers, as the user could just take their original buffer and split it into two consecutive buffers. There might be some devices that only support circular buffers and not double buffers: those could have the device-specific implementation require the two buffers passed in are consecutive in memory, then hand them to the DMA as a single circular buffer and manage locking/unlocking as usual.
I need to catch up on thread, but I always thought &move
and &out
would be good for this. Having the peripheral own the initialized or uninitialized arrays seems correct.
@vitiral I don't see how e.g. a memory pool being statically allocated helps here; as long as you have a dynamic memory allocation scheme you can deallocate the buffer the DMA is using and that would be Bad. The problem to solve here is how you prevent that from happening without relying on destructors running for memory safety.
@adamgreig Already on my TODO list. I was wondering if it's possible to support circular / doube buffering withouting creating a new Buffer
type for that kind of operation but it's probably easier if I create a CircBuffer
type with two back to back buffers and write some application first and then see if the types can be merged somehow. I was also wondering if there's an use case for circular buffers when there's no half-transfer interrupt / flag available. I don't know if there are chips like that; the STM chips support circular and provide a half-transfer flag but IDK if that's universal.
@Ericson2314 I don't immediately see how &move
/ &out
helps here; I thought that was meant to avoid initializing e.g. arrays and statically forbid the initialization routine from reading the uninitialized memory.
Having the peripheral own the initialized or uninitialized arrays seems correct.
That's one option but @adamgreig mentioned above the use case of wanting to use a single statically allocated buffer with several peripherals. With ownership that seems to me like it would get messy really quickly specially if your peripherals are in static
variables (you end with array types in their signature); transferring buffer ownership across peripherals would possibly involve Option
types under the hood and (panicky) Option.take
and then unwrap
operations.
I've opened #15 with my counter proposal.
@japaric I think the key you might be missing is that you have to pass the Mutex
around by ownership (M
not &M
). If you always are pasing ownership then it will never get dropped and then deallocated (and you CAN rely on drop mechanics). Methods then return the full Mutex
when they are done.
When you want to share a Mutex
with another thread, you have to use Arc
like in the example code for std::sync::Mutex
.
I'm pretty sure rust's lifetimes will prevent you from doing anything stupid here (unless you use unsafe
). Do you have a use case where this is not true? (I would be shocked, as this is the exact same API as rust's stdlib).
Also, it is really important to poing out that RwLock
is just a subset of Mutex
. If your function is going to write to the data then it might as well take a Mutex
.
If libraries defined a type that implements RwLock
those same types could also implement Mutex
. They would just have to treat Mutex::lock == RwLock::lock_mut
.
Therefore Mutex
is far more general, and you might as well accept Mutex
if you are going to be writing to the data anyway.
I've been thinking about this some more, and I think we are taking the fundamental wrong approach here. Where we are wrong is that data needs to be locked at all. I felt like this as the beginning because I had @japaric's RTFM blog post in the back of my head which guanatees, among other things, "deadlock free execution guaranteed at compile time."
Rust's std library gets away with avoiding a concurrency primitive (i.e. greenthreads) because:
I get the impression that this API is trying to be "general" like the stdlib -- for use with frameworks other than RTFM (like TockOS, or with no concurrency framework). I think this is a serious mistake. RTFM allows us to have lock and deadlock freedom with no performance overhead. For the embedded standard library of the future we need to be opinionated about our concurrency primitives in order to realize such massive gains.
For no_std
we still have the type system, but we do not have flexibility of the scheduler. Quite the oposite -- we have the rigidity of interrupt levels. We can use this rigidity to our advantage and make even more robust programs than a conventional approach would alow (whether asyncio/futures based or not!). RTFM is that approach.
In REQ-layers I have laid out what I think are the primary layers necessary for development. However, I am now thinking that the concurrency
layer needs to be in layer4: the HAL must depend on it.
In summary, the following layers should be based on RTFM, the other layers will be general and could be used in any embedded project:
application
(user programs will use all the layers)communication
, board_api
generic_protocols
(i.e I2C, CAN, TCP), external_devices
HAL
(this RFC), concurrency
(RTFM)Neither the Buffer
or a Mutex
makes any sense for peripherals. All peripherals will be interrupt driven through RTFM and will therefore be guaranteed to be "unlocked" at compile time -- no runtime checks are necessary. We should share data through the RTFM framework, and the data can be of any type that is valid for the peripheral (we should open a separate issue to discuss data types such as circular buffers, etc).
@vitiral
I think the key you might be missing is that you have to pass the Mutex around by ownership (M not &M). If you always are pasing ownership then it will never get dropped and then deallocated (and you CAN rely on drop mechanics). Methods then return the full Mutex when they are done.
I don't see how that would work except for having the method block for the duration of the whole DMA transfer, i.e. make the method blocking, which is the opposite of what you want to use the DMA for (you want a parallel memory transfer) in the first place.
I get the impression that this API is trying to be "general" like the stdlib -- for use with frameworks other than RTFM
Yes, that's an explicit design goal, cf. bullet 3: callback model = RTFM tasks.
I think this is a serious mistake.
I don't think it's a mistake. If we want to have code sharing to prevent every concurrency framework from rewriting the same register manipulation code and to have to deal with less bugs in the end then we have to be agnostic about the concurrency model in the lowest HAL layer which is this crate.
If you want a HAL tailored for RTFM tasks then you can build a hal-rtfm
crate
on top of this one.
RTFM allows us to have lock and deadlock freedom with no performance overhead.
Indeed it does that as far as concurrency and data race freedom are involved.
However, it's not a panacea: DMA transfers, being asynchronous, would require
scheduling analysis and whole program static analysis integrated into the compilation
process to have zero performance overhead (i.e. no runtime checks) but retaining
safety (lack of unsafe
) at the source code level.
I've tested this DMA based API with the RTFM framework and the runtime checks
and the static_ref::Ref
abstractions are required for memory safety. Perhaps
it's possible to come up with a better API with less checks -- I don't know. I
personally don't see better alternatives (modulo an unsafe API) at this point.
the HAL must depend on it (concurrency layer = tasks).
Note that RTFM tasks are not the only way to do multitasking in the RTFM
framework. You can use co-routines / futures in the idle loop to do cooperative
multitasking. So even if the RTFM framework ends up being the framework of
choice for concurrency we would still need a HAL that works with co-routines and
futures. Having a hal-futures
, hal-rtfm
and hal-coroutines
built on top
of this concurrency model agnostic crate would lead to less code duplication.
@whitequark
TIL (a few days ago actually) that you can use the Unsize
trait to abstract
over [T; N]
in a way that's not limited to N <= 32
. Example:
fn foo(array: &A) where A: Unsize<[u8]> {
let slice: &[u8] = array;
}
foo(&[0; 33]);
foo(&[0; 1024]);
The problem is that the Unsize
trait is an unstable feature of the core
library. Maybe not a problem if you are OK with making your library nightly
only.
@adamgreig
I was thinking about a non-stoppable circular DMA API and I can't see a way to make it safe unless ... ~~you tightly integrate with interrupt handlers which would be (a) very device specific and (b) probably not too flexible~~ nvm, I don't think that would work either. This seems like one of those problems where safety is tied to scheduling analysis ("make sure this code finishes before the next event occurs")
Anyhow the problem I see goes like this:
struct CircBuffer<A> {
first_half: A,
second_half: A,
// plus some other state to track which half is used by the DMA or borrowed
// by the main processor
}
// you start a circular DMA transfer that uses the CircBuffer
let circ_buffer = ..;
start_circular_dma(circ_buffer);
// now you borrow the second half while the DMA is using the first half
let half: &mut [u8] = circ_buffer.borrow_mut();
loop {
// do stuff with half, but never break from `loop`
}
The problem there is that, because the DMA transfer never stops, at some point
the DMA will be mutating the half
you are also mutating or reading. This goes
against Rust borrowing principles because two parallel processes now have
mutable access to the same piece of data.
The problem goes away (maybe?) if the DMA transfer is not "automatically" circular. In the sense that you still have two buffers and half and complete transfer events but you must manually restart a new DMA transfer on the same buffer. This would add some overhead to the circular operation mode but maybe it's not too bad that even hardcore application can get away with this manual restart?
@japaric
TIL (a few days ago actually) that you can use the Unsize trait to abstract over [T; N] in a way that's not limited to N <= 32. Example:
That's ironic; I knew about Unsize
but have not managed to make it work when experimenting.
The problem is that the Unsize trait is an unstable feature of the core library. Maybe not a problem if you are OK with making your library nightly only.
Assuming it's going to be stabilized that seems fine, since all embedded work is nightly-only anyway in foreseeable future.
This would add some overhead to the circular operation mode but maybe it's not too bad that even hardcore application can get away with this manual restart?
Hmm. Maybe. For my use cases I have a timer trigger the ADC, which takes a sample, and then the ADC fires a DMA request to move a word from the ADC register to memory. To avoid missing any samples you need to restart the DMA after the transfer complete interrupt, but before the next sample completes. At full sampling rate and full clock speed you have perhaps 70 cycles between samples, fewer once the DMA has actually fired an interrupt, which I think might just be doable but seems pretty tight to rely on.
At lower sample rates (you might only be sampling at a few 10s of ksps) you'd have loads of time, no problem. It becomes mildly annoying to have to restart the DMA each transfer, but it can be made nice by a nice API, and the unsafe hatch is still there to use if you'd rather. I'd still worry about potentially missing a sample because the API can't ensure that your DMA-restarting code actually runs in time, but at least with a suitable interrupt priority and so forth it should be possible.
At high sample rates you might just have to accept that it will be an unsafe thing and you have to be a little careful with your circular buffer? I agree with your analysis -- we can't statically enforce that the code using the reference to one half will complete before the DMA swaps halves. Ultimately if your code hits this problem it's because it already cannot keep up with the data rate, so you were sort of lost from the beginning anyway.
I'd still like to find some good way of doing circular/double buffers because it is a fairly common use case for datalogging and DSP and most things where you are running the ADCs at a serious rate, but it seems pretty tough. Perhaps the best we can do is an easy to use but unsafe API, with suitable documentation around it. It might also be possible to detect the failure condition after the fact in some cases -- when the user asks for the other half of the buffer but the DMA has already swapped to it for instance -- and throw a panic when that happens, just to aid development/debugging.
My feeling coming from C is that I'm not that unhappy about a few unsafe blocks here and there, especially if they can be relatively benign to handle. I'd be handling everything carefully in C, instead. But I worry this blinds me to what might otherwise be possible with Rust!
PS. :+1: for keeping embedded-hal
async API agnostic, especially with a view to potential scheduling or cooperative multitasking in the idle loop of rtfm.
@whitequark
This doesn't work. An Rc loop is constructible safely
We can add a new auto trait for non-mandatory destructors and make RC and mem::forget
use it.
@japaric
You mean never-ending threads? I have mentioned before that stuff allocated in never-ending functions (
fn() -> !
) should have a different lifetime to indicate that they can't be deallocated. That would be useful here but that feature doesn't exist.
Tails-calls down the road might somewhat complicate this, but hopefully there is a work around.
[In general low-level event-driven code is fundamentally written in a continuation-passing style, and Rust doesn't really deal well with that on many levels. I was hoping that with the focus on futures-rs
, the core time would have reason to worry about these things up the stack, but since that became more pull-based rather than push-based, they've dodged these issues and the receded back into obscurity, sigh.]
The DMA I've thought most about is high bandwidth NICs ("thought" as haven't written anything actually here, so please take with grain of salf!). The "buffer descriptor queue" model many use makes ownership rather than borrowing more natural--even if extra data needs to be tracked, it can be stored in an a "side ring buffer" I suppose (morally equivalent to ring buffer with bigger cells as they share head / tail state, but compatible with the NIC ABI).
I don't immediately see how &move / &out helps here; I thought that was meant to avoid initializing e.g. arrays and statically forbid the initialization routine from reading the uninitialized memory.
I suppose since u8
is plain old data, there's less benefit. But if we had a DerefMove
trait to go with &move
, then an API for this buffer descriptor DMA could use DerefMove<[u8]>
to avoid locking in the use of Box
while keeping ownership-based semantics.
@japaric I think I am beyond my capability of understanding things here. I must just not be wrapping my head around the DMA.
Some quick comments I think could help:
&'static Bufer
instead of static_ref::Ref<Buffer>
-- aren't they the same thing? (you seem to say that in that secition in the first post)don't see how that would work except for having the method block for the duration of the whole DMA transfer
Are you saying the method goes out of scope before it completes without being able to store its buffer somewhere? Who is owning the Ref<'a, Buffer<B, Dma1Channel4>>
in your Serial
example...? ... oh my god, I think I just got it.
Nobody keeps ownership on the Buffer
. You are defining these functions as if they were in a separate CPU because they bascially ARE in a separate CPU -- it's just that two CPU's are using the same program simultaniously. The only way to tell which channel is/will be used is Dma1Channel4
types.
I think I'm starting to get it... it is so much lower-level than memory management -- this is really low level stuff. Most of what I've been saying has been really not helpful -- sorry about that.
Even when dealing solely with DMA initiated by the CPU (as opposed the NIC case where the outside world sending packets triggers DMA), I think an ownership-based model can still work, with some global data structure would own the the buffer on the peripheral's behalf:
This relies on either one pending DMA request at a time, or some way to disambiguate between pending requests to key into the parking lot. I assume that won't be an onerous requirement as other stuff probably needs it anyways.
@vitiral
I'm having trouble understanding why you couldn't use &'static Bufer instead of static_ref::Ref
-- aren't they the same thing? (you seem to say that in that secition in the first post)
They are semantically the same thing once you have one in your hands. The
difference is how you get one or the other. To get a &'static Buffer
you need
a static B: Buffer = ..
but that can't be constructed because Buffer
is not
Sync
(not safely shareable by threads / interrupts / tasks) and static
variables must implement Sync
. So to build a statically allocated global
Buffer
you need something like static B: Mutex<Buffer>
. The thing is that
the &'a Buffer
you can get from B
can't never be a &'static Buffer
because
the &-
reference means that you have access to the Buffer
for the span of
code (region) 'a
. 'a == 'static
would mean that the reference can escape the
critical section for which the inner Buffer
can be accessed and that would be
unsound.
Example:
static B: Mutex<Buffer> = ..;
fn main() {
let buffer = cortex_m::interrupt::free(|cs| { // interrupts are disabled here
// let's assume that borrow returns `&'static Buffer`
// (the existing API returns `&'cs Buffer` where `'cs` is the lifetime of
// this critical section)
let buffer: &'static Buffer = B.borrow(cs);
buffer // now buffer can escape this critical section (because of `'static`)
}); // interrupts are re-enabled here
// BAD the buffer can be accessed outside the critical section; this is racy
buffer.borrow();
}
Are you saying the method goes out of scope before it completes without being able to store its buffer somewhere?
Visually this is more or less what happens:
(sorry for the crappy diagram)
// CPU DMA
let buffer = BUFFER.lock();
// ..
serial.write_all(buffer).unwrap(); // Ref1 ------------->
// Ref1
let n = buffer.borrow().len(); // Ref2 Ref1
// Ref1
block! { // Ref1
buffer // <------------- Ref1
.release() // drop(Ref1)
}.unwrap();
When the DMA transfer starts the CPU conceptually sends a buffer::Ref
(this
is a RefCell
style Ref
) to the DMA -- under the hood Buffer.lock
is
called, but there's no real Ref
transfer (the DMA can't "hold" arbitrary
data). While the DMA transfer is in progress it's as if the DMA was holding a
Ref
to the buffer so the CPU can only immutably borrow the buffer (RefCell
style borrow
) -- trying to mutably borrow the buffer will result in a
panic!
. When the data transfer ends the DMA conceptually gives the Ref
back to the CPU and the CPU destroys it -- under the hood Buffer.unlock
is
called. Now the CPU can immutably or mutably borrow the buffer again.
@Ericson2314
I think an ownership-based model can still work, with some global data structure would own the the buffer on the peripheral's behalf:
I can't see how this can be implemented efficiently. At first glance it seems that this would require memcpy-ing the buffer from the stack to the global/static variable (to give up ownership) and then memcpy-ing it back (to reclaim ownership).
The important bit here is that you may want to do something like this:
Context 1 and 2 may have different stacks (a context could be a thread or a task / interrupt) so I don't see how to avoid the memcpy to transfer ownership of the buffer B from 1 to 2.
The Buffer
abstraction avoids this problem by keeping the data and its
allocation in static (.bss / .data) memory and having no concept of ownership
("who's in charging of freeing the allocation?" - no one because this will never
be destroyed), just a concept of read / write access (RefCell
style -- see above
diagram).
@japaric why doesn't Buffer
just implement Sync
? It seems to me like it really should! It is kind of like a Cell
but it has lock
and unlock
methods to ensure data safety (so it's more like a Mutex
already). It seems like by definition it can be safely shared.
The lot would hold an owning pointer (or &'static mut
, not too much less safely). The buffer itself just need be Send, and a plain array is fine. There is no memcopy.
Now getting such a pointer will also require some other abstraction, or plain unsafety, but is nicely decoupled from the DMA interface itself.
@japaric somewhat relatedly, I wish we could lable a function "call once" such that it has exclusive access to any statics declared within.
@vitiral Buffer
methods are not interrupt safe and that's why the whole thing is not Sync
. Making it Sync
implies adding the synchronization mechanism into its implementation but there are several options for synchronization to pick from: RTFM's Resource
, cortex-m's Mutex
, RTFM's/cortex-m's Local
, etc. I don't think we should hardcode a synchronization mechanism into Buffer
. As proposed in this issue description you can use it with Resource
, Mutex
or Local
.
@Ericson2314
The lot would hold an owning pointer
OK. I think this is getting too into the hypothetical territory. We want to build a HAL that we can use today so it can't depend or wait for features with no known timeline -- much less on features with unknown likelihood of ever getting added to the language.
but is nicely decoupled from the DMA interface itself.
Sorry, do you have concrete example / code snippet? I don't see how would this work.
First issue I see is that the DMA is not a general purpose processor and it can't run arbitrary code so you can't literally send a owning pointer to the DMA and then have it back when the DMA transfer is over. The DMA has no memory to stash the buffer / pointer. What has to happen is that the CPU must poll the DMA to see if the DMA transfer finished, then the CPU can mark a buffer as no longer owned by the DMA.
Secondly I don't see how a memcpy is not required in the example I gave where the DMA transfer is split in two contexts: one that allocates the buffer and starts the transfer on that buffer, and a second context that marks the transfer as done and uses the buffer. The first context may be deallocated when the second context executes so a memcpy is required for memory safety AFAICT.
I wish we could lable a function "call once" such that it has exclusive access to any statics declared within.
That sounds effect system-y (*) and viral: a function must be "call once" to call another "call once" function. Or perhaps it would be unsafe
to call a "call once" function from a function that's not "call once" or not known to be "call once".
Anyhow, "call once"-ness sounds like it's more strict than necessary. We have Local
abstraction that makes it safe to mutably access a static by constraining it to only be accesible from a single execution context (e.g. a single interrupt / task).
(*) Another feature that doesn't exist today and with unknown timeline.
@japaric I think what I am proposing is less exotic and different from yours than it sounds? I do like to think about features we don't have, but that's not the core of it.
First of all, in your pub fn write_all<'a>
, when can 'a
not be 'static
? I want there to be such a situation, but I do not know of one. My proposal doesn't support that.
[For sake of showing the whole interaction, I make this method do triple duty: it sets the handler on completion, and the hardware reads the buffer and then writes information back before interrupting. In practice things would probably be broken apart, but then I'd need to write more functions to convey my point :).]
static Handler: RacyCell<Option<(
unsafe fn(usize, usize),
usize,
usize,
)>> = Local::new(None);
pub fn read_and_write_all<RR, RW, B, F>(
&self,
read_buffer: R,
write_buffer: R,
new_handler: F
) -> Result<(), Error>
where RR: Deref<B> + 'static,
RW: DerefMut<B> + 'static, // Deref*Mut* for now, 'cause it's just bytes
B: Unsize<[u8]> + ?Sized,
F: fn(RW, R)
{
// Ensure ABI of handler
// With associated constants maybe we can do this statically!
debug_assert_eq!(size_of::<RW>, size_of::<usize>);
debug_assert_eq!(size_of::<RR>, size_of::<usize>);
// There's a transfer in progress
if self.dma1.ccrw4.read().en().bit_is_set() {
return Err(Error::InUse)
}
debug_assert!(Handler.get() == None);
// number of bytes to read
self.dma1.cndtr4.write(|w| w.ndt().bits(read_buffer.len() as u16));
// read address
self.dma1.cmar4.write(|w| w.bits(read_buffer.as_ptr() as u32));
// number of bytes to write
self.dma1.cndtw4.write(|w| w.ndt().bits(write_buffer.len() as u16));
// write address
self.dma1.cmaw4.write(|w| w.bits(write_buffer.as_ptr() as u32));
// Store handler and buffer pointers for later.
//
// This is really just a lousy closure but we
// pretend that we actually moved the buffer pointers to the peripheral and
// got them back with the interrupt.
*Handler.get() = Some((
new_handler as unsafe fn(usize, usize),
transmute(write_buffer),
transmute(read_buffer),
));
// start transfer
dma1.ccrw4.modify(|_, w| w.en().set());
Ok(())
}
Your Local
abstraction is great, and in fact a perfect example of the utility of the polymorphism for the pointer type. In the cross-context case, one would call this with guards to a lock/cell, in the same-context case, one could just use Local
.
@Ericson2314 Thanks for your example.
Your implementation looks OK (*) but it doesn't inter-operate with the existing abstractions that allow memory safe access to static variables.
First of all, in your pub fn write_all<'a>, when can 'a not be 'static?
Neither of the existing abstractions (Local / Mutex / Resource) return a
&'static [mut]
reference to the inner data as that would be unsound. Both
Mutex and Resource use critical sections for memory safety and a 'static
lifetime would let the reference escape the critical section. See example above.
Local doesn't return a &'static mut
reference either as that would let you
"safely" mutably alias memory which is not sound. See below:
// (This function can't be called by the processor because EXTI0 has no
// constructor. The hardware (the NVIC) is what will call this function and
// will never call `interrupt_handler` while an `interrupt_handler` is being
// currently executed)
fn interrupt_handler(mut task: EXTI0) {
static FOO: Local<Thing, EXTI0> = Local::new(..);
// NOTE this doesn't freeze `task` because there's no "connection" between
// the lifetime of the input `&mut task` and the output
let foo: &'static mut Thing = FOO.borrow_mut(&mut task);
// Two `&mut -` references to the same data
let alias_foo: &'static mut Thing = FOO.borrow_mut(&mut task);
}
What abstraction do you propose to use with this API to let users write zero
unsafe
programs?
(*) I'm personally wary of putting static variables in functions as that makes them non re-entrant by definition. Doubly so if unsafe is required to access the variable. Triply so if the function is generic: each concrete instance will access the same static variable. But this particular implementation looks sound.
@adamgreig
I was revisiting the circular buffer abstraction and I think I have come up with something fully safe while keeping the circular behavior automatic. Below is a simplified version; you can see the full implementation in this PR.
(I'm going to write this using "const generics" (const N: usize
as a type
parameter) to avoid weird workarounds and hopefully make this clearer.)
struct CircBuffer<const N: usize, T>
where
T: AtomicRead,
{
buffer: [T; 2 * N],
status: Cell<Status>,
}
unsafe trait AtomicRead {}
unsafe impl AtomicRead for u8 {}
unsafe impl AtomicRead for u16 {}
unsafe impl AtomicRead for u32 {}
enum Status {
/// Not in use by the DMA
Free,
/// The DMA is mutating the first half of the buffer
MutatingFirstHalf,
/// The DMA is mutating the second half of the buffer
MutatingSecondHalf,
}
/// Some analog pin
struct PA0;
impl PA0 {
/// Start conversion store the values in `buffer`
fn start<const N: usize>(&self, buffer: Ref<CircBuffer<N, u16>>) -> Result<(), Error> {
buffer.lock(); // buffer.status = MutatingFirstHalf
// Start circular DMA transfer with
// addr = buffer.buffer.as_ptr() and len = 2 * N
}
}
impl CircBuffer<const N: usize, T> {
fn read(&self) -> nb::Result<&[RO<T>], Error> {
let status = self.status.get();
assert_ne!(status, Status::Free); // transfer not started!
let isr = DMA1.isr.read();
if isr.teif1().is_set() {
Err(nb::Error::Other(Error::Transfer)) // transfer error
} else {
match status {
Status::MutatingFirstHalf => {
if isr.tcif1().is_set() {
// we didn't call `read` timely and the DMA is already
// mutating the first half again!
Err(nb::Error::Other(Error::Overrun))
} else if isr.htif1().is_set() {
// DMA done with the first half
// (clear flag, etc.)
let buffer: &[T] = unsafe { &(*self.buffer.get())[..N] };
// some magic here to transform &[T] into &[RO<T>]
// ..
} else {
// transfer not done
Err(nb::Error::WouldBlock)
}
},
Status::MutatingSecondHalf => {
if isr.htif1().is_set() {
Err(nb::Error::Other(Error::Overrun))
} else if isr.tcif1().is_set() {
let buffer: &[T] = unsafe { &(*self.buffer.get())[N..] };
// ..
} else {
Err(nb::Error::WouldBlock)
}
},
Status::Free => unreachable!(), // see `assert!` above
}
}
}
}
// example
let buffer: Ref<CircBuffer> = ..;
PA0.start(buffer);
{
// blocks until first half of the transfer is completed
let first_half: &[RO<u16>] = block!(buffer.read());
let x = first_half[0].read();
let y = first_half[1].read();
// ..
}
{
// blocks until the second half of the transfer is completed
let second_half: &[RO<u16>] = block!(buffer.read());
let x = second_half[0].read();
let y = second_half[1].read();
// ..
}
Hopefully that's relatively straightforward except for the Atomic
trait and
the RO
newtype. Those two are there to solve the "at some point the DMA will
be mutating the half you are also mutating or reading" problem I mentioned
before.
The solution I chose is to only ever have read only access to the circular
buffer data. The "but there's one process mutating the data and another one
reading it in parallel!" objection is solved by relying on the fact that u16
data can be read atomically so reads will never observe inconsistent data (which
would be the case if e.g. u64
or a repr(u64) enum
was being read).
The AtomicRead
bound enforces that the buffer data can only be u8
, u16
or
u32
. These three types can be read atomically (single instruction) on Cortex-M
processors. The volatile_register::RO
is there to force volatile reads of
the data. This way the compiler can't assume that consecutively reading the same
memory location will result in the same value. For example:
let half = block!(circ_buffer.read());
let x = half[0].read();
let y = half[0].read();
if x == y {
// foo
} else {
// the compiler *can't* optimize this branch away
// this is what we want because the DMA may have mutated `half[0]` between
// the two consecutive reads
}
There's a problem though: the selection of RO
appears to disable memcpy
promotion
so something like this:
// goal memcpy `circ_buffer_half` into `other_buffer`
for (i, slot) in other_buffer.iter_mut().enumerate() {
*slot = circ_buffer_half[i].read(); // RHS is a volatile read
}
doesn't lower to a memcpy(other_buffer.as_mut_ptr(), circ_buffer.as_ptr(), circ_buffer.len())
but instead produces a for loop that does byte-wise copying
which probably is not that optimal. It's also annoying not being to able to
write other_buffer.copy_from_slice(circ_buffer)
.
This problem could probably be fixed by returning an AtomicSlice
newtype,
instead of &[RO<T>]
that has a copy_to_slice
method that uses the
intrinsics::volatile_copy_nonoverlapping_memory
under the hood.
@japaric While dereferencing a lock guard gives us a borrowed pointer with a non-static lifetime, the lock guard itself should have no problem being 'static
if the lock is in a static. RW: DerefMut<B> + 'static
is satisfied.
With Local
you have a good point, but I think there is a solution: we have a Local
which contains a Option<&mut 'static Buffer>
to the buffer. The caller pulls that out to use my interface. and the handler puts it back. [The handler and the caller both have the same ctx ZST to do this.] The &mut 'static Buffer
simply needs to be the only way to access whatever it points to.
While dereferencing a lock guard gives us a borrowed pointer with a non-static lifetime, the lock guard itself should have no problem being 'static if the lock is in a static.
Right. cortex_m::Mutex
doesn't use a lock guard but a closure. It could use a lock guard though (I don't really like drop-based guards because of mem::forget). OTOH rtfm::Resource
can't use guards at all -- resources also use closures to grant access to the protected data -- because critical sections in RTFM must be nested in a FIFO manner for soundness and you can't enforce that with lock guards.
we have a Local which contains a Option<&mut 'static Buffer> to the buffer.
Hmm, how would the user safely put the &'static mut
in the Local in the first place?
Instead of saying the buffer items must be sizes that can be read atomically, I think it makes more sense to require that they are sizes the DMA can handle? The STM32 DMA can read/write memory in sizes of u8, u16, and u32, and will do those atomically, but can't transfer any other sizes. Additionally it can be different between peripheral and memory, i.e. you can read a u32 from a peripheral register and write it as four consecutive u8 writes to memory. So it might make more sense to impl some DMA newtype for the sizes the device DMA operates on.
The idea that this then prevents inconsistent values I'm less sure about - e.g. if I let half = block!(circ_buffer.read());
and then take my time dealing with members of half, I don't think anything prevents the DMA from starting to overwrite them? Sure, the integers you see were all samples once (not two halves of separate samples), but they're not from the same sequence, so it's still a race condition that I thought shouldn't be achievable in safe code. You read half[20] and get the 20th sample from the nth sequence, and then read half[21] and could get the 21st sample of the (n+1)th sequence. That we would panic later when circ_buffer.read()
is called the second time doesn't help.
As an aside, this pattern only covers the peripheral-to-memory case (eg ADC) but would need extending for memory-to-peripheral (DAC etc). Perhaps the enum variants should just be Free/FirstHalf/SecondHalf without "mutating" for the cases where the DMA is reading? And we'd need a write
method to mirror read
. I wonder if that means we should use Atomic::RW rather than RO. Even in the ADC case it might often be useful to be able to mutate the half-buffer in place, e.g. to apply a filter in-place before subsampling to a new buffer.
As a more minor aside, I think N should be the total buffer length, not the length of a half, but that's a bit bikesheddy and not that big a deal.
It's a shame we lose the lowering to memcpy. I think it'd be worth having the AtomicSlice or similar wrapper to allow easier/faster copying as it's probably a common use case for this buffer (eg copying it into a network buffer for transmission or something).
I think you mean LIFO? I'm not sure there is a way to force LIFO across a continuation, so there might not be a way to do that safely regardless. Moreover, it is not clear to me that one is even allowed to aquire resources across a job boundary with RTFM/SRP—though before I read your linked papers I knew next to nothing about hard real to so take that with a grain of salt.
I'm not sure there is a safe way to get the &mut
today, but I don't see why static Foo: Bar = init(&mut expr);
shouldn't be safe if expr
is send.
@adamgreig
OK. Whole new idea based on your last comment. Forget about the atomic reads and the RO stuff.
I have changed read
to this:
impl CircBuffer<const N: usize, T> {
fn read<F, R>(&self, f: F) -> nb::Result<R, Error>
where
F: FnOnce(&[T; N]) -> R,
{
let status = self.status.get();
assert_ne!(status, Status::Free); // transfer not started!
let isr = DMA1.isr.read();
if isr.teif1().is_set() {
Err(nb::Error::Other(Error::Transfer)) // transfer error
} else {
match status {
Status::MutatingFirstHalf => {
if isr.tcif1().is_set() {
// we didn't call `read` timely and the DMA is already
// mutating the first half again!
Err(nb::Error::Other(Error::Overrun))
} else if isr.htif1().is_set() {
// DMA done with the first half
// (clear flag, etc.)
let half: &[T; N] =
unsafe { &(*self.buffer.get())[..N] };
let ret = f(half);
if isr.tcif1().is_set() {
// executing `f` took too long and the DMA is
// already mutating the first part again. This is
// an overrun error
Err(nb::Error::Other(Error::Overrun))
} else {
Ok(ret)
}
} else {
// transfer not done
Err(nb::Error::WouldBlock)
}
}
Status::MutatingSecondHalf => {
// counterpart of MutatingFirstHalf
}
Status::Free => unreachable!(), // see `assert!` above
}
}
}
}
Now usage looks like this:
let buffer: Ref<CircBuffer> = ..;
PA0.start(buffer);
{
// blocks until first half of the transfer is completed
// then executes the closure on the first half of the buffer
let average = block!(buffer.read(|half| {
half.iter().cloned().sum::<u16>()
})).unwrap();
// ..
}
{
// This copies the half DMA buffer into the stack
// This does optimize into a memcpy (!)
let second_half = block!(buffer.read(|half| *half)).unwrap();
// ..
}
So what happens now is that once the transfer is done the closure passed to
read
will be executed. That closure operates on the first half of the buffer
-- the half the DMA just finished modifying. The trick is that once we are done
executing the closure we check the DMA status again. If the DMA says that it
finished modifying the second half and thus has started modifying the first half
again then we bail out with an Overrun
error -- the DMA may have modified the
data we were operating on. If the DMA says that it's still modifying the second
half then no data race could have occurred so we just return the result of
executing the closure in an Ok
.
I think this is general enough that the closure could directly operate on &mut [T; N]
and the same method could be used for the DAC case you mention. read
should be renamed to something more sensible in that case.
@Ericson2314
Yeah, sorry I meant LIFO.
Resource
is how you share data between two or more tasks (interrupts) in RTFM
land.
That looks nice @japaric, good idea. I agree that it could work with an &mut and therefore be the same for reading as writing, though I'm not sure yet what a good name would be...
I still think the variant names should change from MutatingFirstHalf
to UsingFirstHalf
or something like that though, since in the read sense it won't be mutating.
@japaric So from the papers I got the impression that at the beginning and end of a run of a job, no resources were held: the corresponding release to every acquire occurs within the same job.
Thinking about it some more, it is probably safe to extend that model to allow acquiring a resource over a job boundary as one would for DMA, [Safe in terms of functional correctness, I'm not sure it doesn't adversely affect meeting deadlines.] The only requirement I see to continue avoid deadlocks is ensure if job A initiates a DMA with continuation job B, then job A can only begin if any resources potentially needed by either A or B are unused.
Also, despite the literature, I can't see why the LIFO requirement actually matters. Due to the shared stack, jobs will actually acquire resources in a LIFO manner regardless. That alone seems sufficient to prevent dead locks. The manipulation of priorities does become more complicated, but this is no problem and probably doesn't even have a run-time cost.
Anyways I think Resources
are a red herring, no? Your API doesn't work with resources, right? So this isn't a relative disadvantage of mine no matter what, correct? I'd hope merely working with Local
and various locks is alone an advantage.
@adamgreig
though I'm not sure yet what a good name would be...
access
or get
. I'm not too concerned with naming at this point.
I still think the variant names should change from MutatingFirstHalf to UsingFirstHalf or something like that though, since in the read sense it won't be mutating.
Yeah, that makes sense.
@Ericson2314
So from the papers I got the impression that at the beginning and end of a run of a job, no resources were held: the corresponding release to every acquire occurs within the same job.
No resource that the task will use can be in use by other tasks when it starts; otherwise it shouldn't have started in the first place. That's the basic gist.
Thinking about it some more, it is probably safe to extend that model to allow acquiring a resource over a job boundary as one would for DMA
Yes, you can model it as if a process (a sequence of tasks) withheld the resource for its whole duration.
The only requirement I see to continue avoid deadlocks is ensure if job A initiates a DMA with continuation job B, then job A can only begin if any resources potentially needed by either A or B are unused.
But the compiler doesn't know if this will be the case or not so you still need runtime checks to panic or raise an error if this situation (A happens again before B ends - that would indicate an scheduling problem) occurs. At least if you want to keep the API safe (no unsafe
).
Due to the shared stack, jobs will actually acquire resources in a LIFO manner regardless.
You need to nest critical sections in a LIFO manner to access resources with different ceilings within a single task to keep the model sound. Using drop guards to unlock is not sound. See below:
static R3; // resource with ceiling = 3
static R2; // resource with ceiling = 2
// with priority 1
fn task1() {
// starts with preemption threshold = 1
let r1 = R1.claim_mut(); // raises preemption threshold to 3
let r2 = R2.claim_mut(); // no change in preemption threshold
..;
drop(r1); // drops preemption threshold to 1
..;
// <~ preempted by task2
..;
drop(r2);
}
// with priority 2
fn task2() {
// BAD data race. task2 has a reference to R2 while task1 also has a
// reference to it
let r2 = R2.claim_mut();
}
preemption threshold = priority another task must have to be able to preempt the current task / scope
Things would have been fine if LIFO order was maintained:
fn task1() {
// starts with preemption threshold = 1
let r1 = R1.claim_mut(); // raises preemption threshold to 3
let r2 = R2.claim_mut(); // no change in preemption threshold
..;
drop(r2);
..;
// <~ task2 tries to preempt but it can't
..;
drop(r1); // drops preemption threshold to 1
// task2 preempts task1 here
..;
}
No data race in this case. But it's better to use closure instead of drop guards to make the first situation impossible:
fn task1() {
// starts with preemption threshold = 1
R1.claim_mut(|r1| { // raises preemption threshold to 3
..;
R2.claim_mut(|r2| { // no change in preemption threshold
// <~ task2 tries to preempt but it can't
}); // no change in preemption threshold
}); // drops preemption threshold to 1
// task2 preempts task1 here
..;
}
Your API doesn't work with resources, right?
It was designed to work with tasks and resources.
So this isn't a relative disadvantage of mine no matter what, correct? I'd hope merely working with Local and various locks is alone an advantage.
With Local you can't move the buffer around across tasks because the data is pinned to a single task. The 'static requirement can't be met by task local data so with your API tasks can't start a DMA transfer.
In the upcoming version of RTFM the only place where &'static mut Local safely occurs is in the idle
loop, which is like a never ending main. It's safe there because idle
is not a recurring function; it gets called once at the beginning of a program and runs forever at the lowest priority. I'm not quite sure what the implications of having a &'static mut reference are since I have never dealt with them before (they may end up being unsound in the context of RTFM). With your API a transfer could be started in idle
but I'm not quite sure what would happen next.
It was designed to work with tasks and resources.
I mean that your Buffer is not a SRP resource; it requires dynamic checks to safely access. It does work with RTFM, but so could mine, as one could not get a &'static mut
from a resource.
But the compiler doesn't know if this will be the case or not so you still need runtime checks to panic or raise an error if this situation
The callback can take an unconstructable ZST that the caller also provides----morally handing off the capability.
drop(r1); // drops preemption threshold to 1
I think this is the problem, not the lack of FIFO itself. We have something very similar to a an immutable borrow going where as long as at least one resource with preemption ceiling N is held, the threshold must be at least high. [Now with a single threshhold this isn't terribly interesting other than for making a safe interface with guards---one could just make their borrows FIFO and have the same behavior as only the threshold matters. But if the implementation uses interrupt masks instead, we need not have totally ordered prevention levels: set inclusion induces a partial order, and masks traces like {Job0} -> {Job0, Job1} -> {Job1}
are perfectly fine. This improves utilization too.]
While in principle Rust has the power to type check this statically with the borrow checker, that is not exposed to the user so we'd need to actually maintain counters the guards can decrement. Oh well; maybe someday. The counters would be guaranteed never to go below 0, however.
With Local you can't move the buffer around across tasks because the data is pinned to a single task.
I realize now local was never a good fit because we need to mask the initiator task while the peripheral and then callback owns the buffer. But with the lock-guard-and-counter approach above, I think we do have ourselves a safe and ergonomic interface.
It would be really awesome if somebody could come up with some api's for DMA that we could add to the hal library.. Atleast I am struggling to get a good design :)
@nikhilkalige I plan to revisit this after RFC japaric/cortex-m-rtfm#59 (safe &'static mut
references) is implemented since it changes what's possible to do in safe code. In the meantine, you can look at the example DMA API in that RFC and at my current DMA experiments in this branch; maybe try to roll your own implementation based on that one? Then you can tell me what you think :-).
At this point I'm unsure what (traits? structs?) should eventually land in embedded-hal to promote a common DMA API across device crates.
I'm going to close this RFC since there's a simpler approach to DMA in the works. The new approach will get its own RFC posted in this issue tracker.
For reference, said simpler approach is discussed in https://github.com/rust-embedded/embedded-hal/issues/37
In this issue I'm going to present the DMA API I have implemented for the blue-pill and that I have used in my recent robotic application with the goal of using it to start a discussion about DMA based API.
Pre-requisites
You should understand how
RefCell
works and how it achieves dynamic borrowing.Detailed design
Core idea
The DMA peripheral behaves like an "external mutator". I like to think of it as an independent processor / core / thread that can mutate / access data in parallel to the main processor. From that POV a DMA transfer looks like a fork-join operation like the ones you can do in std-land with threads.
With that mindset: in a DMA transfer you want to hand out "ownership" of the data / buffer from the processor to the DMA for the span of the transfer and then claim it back when the transfer finishes. Except that "ownership" in the full sense ("the owner is in charge of calling the destructor when the value is no longer needed") is not applicable here because the DMA can't destroy the buffer is using for the transfer. So instead of ownership what the proposed API will transfer is temporary read or write access to the data.
Read or write access sounds like
&-
and&mut-
references but the proposed API won't use those. Instead it will use buffers withRefCell
-style dynamic borrowing.Buffer
The main component of the proposed API is the
Buffer
abstraction:The
data
andflag
fields effectively form aRefCell<B>
. Although not explicitly boundedB
can only be an array of bytes ([u8; N]
). TheCHANNEL
type parameter indicates which DMA channel this buffer can work with; possible values are phantom types likeDma1Channel1
,Dma1Channel2
, etc. Finally thestatus
field indicates if the DMA is currently in possession of the buffer.Buffer
exposes publicborrow
andborrow_mut
methods that behave exactly like the onesRefCell
has.Buffer
also exposes privatelock
s andunlock
s methods that are meant to be used only to implement DMA based APIs.The
lock
andunlock
pair behaves like a splitborrow
operation. Aborrow
call will check that no mutable references exist and then hand out a shared reference to data wrapped in aRef
type while increasing theBuffer
shared reference counter by one. When thatRef
value goes out of scope (it's destroyed) theBuffer
shared reference counter goes down by one. Alock
call does the first part: it checks for mutable references, increases the counter and hands out a shared reference to the data. However the return type is a plain shared reference&T
so when that value goes out of scope the shared reference counter won't be decremented. To decrement that counterunlock
must be called. Likewise thelock_mut
andunlock_mut
pair behaves like a splitborrow_mut
operation.You can see why it's called
lock
andunlock
: oncelock
ed theBuffer
can't no longer hand out mutable references (borrow_mut
) to its inner data until it getsunlock
ed. Similarly once aBuffer
has beenlock_mut
ed it can't hand out any reference to its inner data until it getsunlock_mut
ed.DMA based API
Now let's see how to build a DMA based API using the
Buffer
abstraction. As an example we'll build aSerial.write_all
method that asynchronously serializes a buffer using a DMA transfer.Aside from the
Ref
in the function signature, which is not acell::Ref
(it's astatic_ref::Ref
), this should look fairly straightforward. For now readRef<'a, T>
as&'a T
; they are semantically equivalent. I'll cover what theRef
newtype is for in the next section.Let's see how to use this method:
At this point the transfer is ongoing and we can't mutably access the buffer while the transfer is in progress. How do we get the buffer back from the DMA?
The
Buffer.release
is a potentially blocking operation that checks if the DMA transfer is over andunlock
s theBuffer
if it is. Note that the aboveimpl
ementation is specifically forCHANNEL == Dma1Channel4
. Other similar implementations can cover the other DMA channels.Continuing the example:
Alternatively, using callbacks / tasks:
static_ref::Ref
The
Ref<'a, T>
abstraction is a newtype over&'a T
that encodes the invariant that theT
to which theRef
is pointing to is actually stored in astatic
variable and thus can't never be deallocated.The reason
Ref
is used instead of just&-
in thewrite_all
method is to prevent using stack allocatedBuffer
s with that method. Stack allocatedBuffer
s are rather dangerous as there's no mechanism to prevent them from being deallocated. See below what can happen if aSerial.read_exact
method used&-
instead ofRef
:Unresolved questions
Is there an equally flexible (note that with this approach the DMA transfer start and finish operations can live in different tasks / interrupts / contexts) alternative that involves no runtime checks?
Should we extend this to also work with
Buffer
s allocated on the stack? My first idea to allow that was to add a "drop bomb" toBuffer
. As in the destructor will panic if there's an outstandinglock
. See below. (But this probably a bad idea since panicking destructor is equal to abort (?))Do note that an unsafe
Ref::new
exists so you can create aBuffer
on the stack and wrap a reference to it in aRef
value. This will be like asserting that the buffer will never get deallocated. Of course it's up to you to ensure that's actually the case.In the blue-pill implementation
Buffer
is defined in theblue-pill
crate itself but to turn DMA based APIs into traits we would have to move thatBuffer
abstraction into theembedded-hal
crate. That complicates things because (a)lock
et al. would have to become public and (b) e.g.Dma1Channel1
wants to be defined in thestm32f103xx-hal-impl
crate but that means one can't writeimpl Buffer<B, Dma1Channel1>
due to coherence. I have no idea how to sort out these implementation details.This API doesn't support using a single
Buffer
with more than one DMA channel (neither serially or concurrently). Is that something we want to support?Since we have the chance: should we reduce the size of the
flag
field? The core implementation usesflag: usize
but4294967295
simultaneouscell::Ref
sounds like a very unlikely scenario in a microcontroller.