rust-embedded-community / embedded-nal

An Embedded Network Abstraction Layer
Apache License 2.0
177 stars 25 forks source link

[Question] Closure-based API for zero-copy #12

Open MathiasKoch opened 4 years ago

MathiasKoch commented 4 years ago

The current read function of of the traits takes in a buffer, requiring an additional buffer to be allocated

Would it make sense to add something along the lines of

fn read_with<F>(&self, socket: &mut Self::TcpSocket, f: F) -> nb::Result<usize, Self::Error>
where
    F: FnOnce(&[u8]) -> usize,

Making it easier to use with socket buffers of the kind used in smoltcp, without the additional clone? https://github.com/smoltcp-rs/smoltcp/blob/bdfa44270e9c59b3095b555cdf14601f7dc27794/src/socket/tcp.rs#L752

MathiasKoch commented 4 years ago

Maybe even

/// Call `f` with a slice of octets in the receive buffer, and dequeue the
/// amount of elements returned by `f`.
///
/// If the buffer read wraps around, the second argument of `f` will be
/// `Some()` with the remainder of the buffer, such that the combined slice
/// of the two arguments, makes up the full buffer.
fn read_with<F>(&self, socket: &mut Self::TcpSocket, f: F) -> nb::Result<usize, Self::Error>
where
    F: FnOnce(&[u8], Option<&[u8]>) -> usize,
ryan-summers commented 4 years ago

It would also be desirable to have a write equivalent of this.

These types of functions allow us to create zero-copy abstractions for higher level protocols, such that we can write data directly into the outbound FIFO or (potentially) read entire packets directly from the incoming FIFO.

MathiasKoch commented 4 years ago

Could we get something drafted up for zero-copy usage? It's great that this is released now, but i still lack the zero-copy API to be able to replace my local fork with the crates.io version. On top of that i think any serious networking abstraction should provide a way of using it in a zero-copy way.

ryan-summers commented 4 years ago

The main complication I see is that not all network stacks will support zero-copy abstractions. Take for example a TCP stack using the W5500 ("hardware" network stack that communicates with the main MCU over SPI). The only way to "write" to the socket is to perform a SPI transaction to the W5500. The SPI transaction requires the original buffer, so the current API works well for this.

However, other network stacks, like smoltcp, can provide a direct buffer to write into (with modifications, smoltcp can't do this currently to my knowledge. See https://github.com/smoltcp-rs/smoltcp/issues/359).

If we were to force all users to supply the read_with() API (closure-based), then the W5500 network stack implementation would have to maintain some internal [u8] buffer that it could provide to the read_with() API and then write out to the W5500. In reality, this just offloads the explicit buffering to the stack impl (which isn't a bad thing). However, even stacks like smoltcp do not yet support direct-write to the underlying TCP socket data without buffering yet, so the value added isn't present for these two stacks. (It feels a bit like the chicken and the egg problem here)

MathiasKoch commented 4 years ago

I don't necessarily think we need to force anyone to implement it.

Two ideas come to my mind

  1. Add a default implementation for one using the other (might not actually be possible, due to allocation issues?)
  2. Make the zero-copy an optional extension trait? (This does break the one impl fits all abstraction though?)
ryan-summers commented 4 years ago

I don't necessarily think we need to force anyone to implement it.

I think if we propose it into the trait, that's essentailly what we have to do.

  1. Add a default implementation for one using the other (might not actually be possible, due to allocation issues?)

I think we can make a default implementation and require the user to specify some generic_array::ArraySize? I'm not entirely sure if this is possible - I can poke around with this - it's an interesting idea.

  1. Make the zero-copy an optional extension trait? (This does break the one impl fits all abstraction though?)

I think we should make all effort to avoid fractioning/splitting the traits, otherwise the abstraction breaks down and it becomes less useful.

MathiasKoch commented 4 years ago

I think we can make a default implementation and require the user to specify some generic_array::ArraySize? I'm not entirely sure if this is possible - I can poke around with this - it's an interesting idea.

That would be awesome! Hit me on matrix if i can help with anything?

ryan-summers commented 4 years ago

I've added a sample for a TCP stack in https://github.com/ryan-summers/embedded-nal/blob/feature/closure-read-writes/src/lib.rs#L29-L106

The main downside of a default implementation is that we need to break a read() into distinct actions:

  1. Acquire data from the socket (but do not dequeue it)
  2. Allow the user closure to process N bytes
  3. Dequeue N bytes from the socket, where N <= bytes_available

This means that stacks must support reading without dequeuing, which isn't always possible. For example, this is not supported by std::net::TcpSocket because it implements the Read trait.

The other possibility is to enforce that users must consume the entire read size, but this is generally not desirable, since it forces the higher level abstraction to buffer partial packets, which the closure API is intended to resolve.

MathiasKoch commented 4 years ago

I can live with this from my end, though I agree on the downsides of the approach. Can't think of anything better, that still would allow for the functionality though.

One thing that comes to mind with this is how to handle using a circular socket buffer with this?

Example:

I am using this API for zero-copy decoding of MQTT packets, meaning i will only ever commit if there is a full MQTT packet in the buffer. let size = network.read_with(socket, |buf| clone_packet(buf, packet_buf).unwrap_or(0))where the result is the number of bytes copied to packet_buf (or 0 if there is still not a full MQTT packet available).

I can then in-place decode the packet_buf to a structured MQTT packet containing references into the memory buffer.

The downside being, that if the packet is actually in my circular buffer, but half of it is wrapped around, i would never be able to read it.

ryan-summers commented 4 years ago

The downside being, that if the packet is actually in my circular buffer, but half of it is wrapped around, i would never be able to read it.

That's why I mentioned https://github.com/smoltcp-rs/smoltcp/issues/359 - this closure-based approach will never work if a network stack uses a ring buffer as the underlying storage mechanic because there's always the possibility that the packet isn't contiguous. If this is the case, it will always need to be copied into memory such that it would then be contiguous.

MathiasKoch commented 4 years ago

That makes sense! Perhaps i should look into replacing my socket buffer implementation with a bip based version.. Perhaps based on bbqueue

MathiasKoch commented 4 years ago

I would be quite happy with https://github.com/ryan-summers/embedded-nal/blob/feature/closure-read-writes/src/lib.rs#L29-L106

Could you maybe open a draft PR, so we can get some eyes and comments on the approach?

MathiasKoch commented 4 years ago

Update

That makes sense! Perhaps i should look into replacing my socket buffer implementation with a bip based version.. Perhaps based on bbqueue

It seems like bbqueue would not actually solve my issue here? It would give me a contiguous piece of memory for writing, but does not set such guarantees when reading?

MathiasKoch commented 3 years ago

Relevant info: BBqueue has merged this PR: https://github.com/jamesmunns/bbqueue/pull/77

Adding essentially the same feature that i was originally asking here.. Would be nice if we could come up with an API that would allow taking advantage of such a buffer to allow zero-copy implementations.

Sympatron commented 3 years ago

Maybe a special trait that implementers don't have to provide like we currently have for Client/Server traits would be best here.

ryan-summers commented 3 years ago

The problem is that if we make a separate trait, we have now divided the ecosystem. If a library required a zero-copy stack, then it would not be usable with stacks that don't have zero-copy support (e.g. smoltcp). Essentially, rust semantics don't allow for a logical OR of implementations, so we're restricted in only exposing a single trait for stacks.

Sympatron commented 3 years ago

Maybe forcing implementers to add zero-copy methods isn't so bad after all. Thinking about it, I think it may be possible for stacks like smoltcp to have a fake zero-copy interface that does copy in the background. This would obviously be less efficient, but would still work.

ryan-summers commented 3 years ago

That's a potential avenue as well - it would be very reasonable to add zero-copy APIs, but then force the implementation of the traits to internally buffer if the underlying stack doesn't support zero-copy write/read. That would allow the individual network stack implementation to tailor the buffer size to their specific use case as well.

MathiasKoch commented 3 years ago

I think that is the way i would like to see this go.. Maybe we can even provide some sensible default implementation that does copy?

hazelutf8 commented 3 years ago

The downside being, that if the packet is actually in my circular buffer, but half of it is wrapped around, i would never be able to read it.

That's why I mentioned smoltcp-rs/smoltcp#359 - this closure-based approach will never work if a network stack uses a ring buffer as the underlying storage mechanic because there's always the possibility that the packet isn't contiguous. If this is the case, it will always need to be copied into memory such that it would then be contiguous.

Recently I saw the core::ops::Index trait, could this allow an arbitrary backing buffer which implements the index operation? If I understand correctly, maybe this would allow for either a slice or circular buffer at the implementation's choice?

chrysn commented 3 years ago

AFAIR that depends on whether indexing by slices is allowed. If so, indexing by 0..end still needs to produce contiguous memory. Indexing for by individual bytes makes scattered backends easy, but is very limiting (and probably inefficient, eg feeding single bytes to serde) in use.

A middle ground may be rope-like data structures, but I have no experience with them in Rust.