[Proposal] Socket-directed API

ryan-summers commented 3 years ago

I was discussing the design of the NAL with @jordens the other day and an interesting point was brought up.

Why not change the API of the embedded-nal to more closely follow python/std::net?

Sample proposal:

pub trait UdpSocket {
    type Error;

    fn bind(&mut self, port: u16) -> Result<(), Error>;
    fn send(&mut self, data: &[u8]) -> Result<usize, Error>;
    fn receive(&mut self, data: &mut [u8]) -> Result<usize, Error>;
    fn close(self);
}

pub trait TcpSocket {
    type Error;

    fn connect(&mut self, remote: SocketAddr) -> Result<(), Error>;
    fn send(&mut self, data: &[u8]) -> Result<usize, Error>;
    fn receive(&mut self, data: &mut [u8]) -> Result<usize, Error>;
    fn close(self);
}

pub trait NetworkStack {
    type UdpSocket: UdpSocket;
    type TcpSocket: TcpSocket;

    fn open_udp(&mut self) -> Result<Self::UdpSocket, Error>;
    fn open_tcp(&mut self) -> Result<Self::TcpSocket, Error>;
}

This allows stack implementations to freely hand out sockets without having to worry about coherency between multiple threads when the network stack assigns a unique buffer to each socket. In other cases, the network stack can implement some means of synchronizing access (e.g. for off-chip network stacks controlled via serial communications).

This makes it much easier to use multiple sockets in different threads/contexts with a single network stack.

What are some of your thoughts on this @jonahd-g / @MathiasKoch ?

Sympatron commented 3 years ago

This would indeed make the sockets easier to use. But it would make writing a network stack a lot more difficult, because of all the synchronization that would be necessary. Also each socket would have to contain a reference to the stack internally, which would complicate lifetimes. I think the current approach with SharedStack is better for mutli-threaded applications.

Python's API is nice, but Pythen the GIL, a garbage collector (i.e. no lifetimes) and uses the OS for socket I/O which does all the heavy lifting. It is therefore a lot easier to write an easy to use API in Python than it is in Embedded Rust.

ryan-summers commented 3 years ago

because of all the synchronization that would be necessary

Synchronization is only required for stacks that don't have socket buffers (e.g. the W5500). No synchronization is required for something like smoltcp.

I would also note that this synchronization is already required, and we're not meeting it (see below).

each socket would have to contain a reference to the stack internally, which would complicate lifetimes.

This is not true for all cases, but is relevant for some use cases.

I think the current approach with SharedStack is better for mutli-threaded applications.

The SharedStack cannot be used in a multi-threaded application (at least from more than a single thread) because it does not implement Sync. It is only safe to use within a single execution context, which is why I was proposing these changes.

While I agree this complicates things slightly, it's because things have to get more complicated if the network stack supports multiple execution priorities (e.g. RTIC tasks with different priority levels). The alternative here would be to implement something like shared-bus for the embedded-nal, which is a pretty large undertaking.

ryan-summers commented 3 years ago

Also, after thinking about this more, let's take an example where you have a network stack off-chip (e.g. the W5500, a TCP/UDP stack that communicates via SPI).

Instead of having a single NetworkStack with the SPI bus, you can use shared-bus to construct multiple references to the SPI communication interface and provide one of each to the UdpSocket/TcpSocket structures. This handles resource sharing extremely nicely with very little code required for implementers of the embedded-nal imo.

Sympatron commented 3 years ago

One could easily write a variant of SharedStack that is Sync by replacing RefCell with some kind of mutex. If you are using RTIC it's even easier, because you can put the stack in a resource and RTIC takes care of everything.

I wouldn't see W5500 and similar as an edge case. Currently it's one of the two main stacks for embedded-nal I can think of (the other one being smoltcp).

Instead of having a single NetworkStack with the SPI bus, you can use shared-bus to construct multiple references to the SPI communication interface and provide one of each to the UdpSocket/TcpSocket structures. This handles resource sharing extremely nicely with very little code required for implementers of the embedded-nal imo.

I am pretty sure this would be almost impossible to get right, because all stacks could mess with the state of all other stacks by for example issuing a soft reset or by writing to any other global register.

ryan-summers commented 3 years ago

One could easily write a variant of SharedStack that is Sync by replacing RefCell with some kind of mutex. If you are using RTIC it's even easier, because you can put the stack in a resource and RTIC takes care of everything.

Yes, but that requires essentially duplicating shared-bus here. As shown by that repository, this is a non-trivial task. If we can find means of avoiding code duplication, it will result in more maintainability and portability. If we can leverage shared-bus as a dependency instead of copying the code/ideas here, that's a win in my book.

I wouldn't see W5500 and similar as an edge case. Currently it's one of the two main stacks for embedded-nal I can think of (the other one being smoltcp).

I do not consider it an edge case. I am actively using it for one of our projects. Both smoltcp and w5500 are the reason that drove me to work on the embedded-nal in the first place, so my goal is to treat both use cases as first-class citizens here.

I am pretty sure this would be almost impossible to get right, because all stacks could mess with the state of all other stacks by for example issuing a soft reset or by writing to any other global register.

This is very much not impossible to get right and is quite easy to do when the driver is written within a single crate. You simply have to ensure that individual sockets do not modify other socket registers. Given Rust's ownership semantics, this is quite feasible and easily doable in my opinion.

Sympatron commented 3 years ago

I suppose it could be done. In my opinion it's just way more complicated than adding a thread-safe SharableStack is way more simple then trying to write thread-safe drivers. This would add no additional cost if thread safety is not needed and solve all thread related issues in one go independent of the underlying driver.

I just had a look at shared-bus and it's source is totally straight forward and not that complicated. I agree that it would be nice to reuse shared-bus instead of replicating it here, but I don't know how this could be done. At the moment it is limited to I²C and SPI. At least the mutex types could be easily reused. Not sure how to use the BusManager for embedded-nal without adding embedded-nal as a dependency to shared-bus. Maybe shared-bus could be opened up enough to allow for arbitrary bus types with corresponding proxies. Don't know how difficult that would be.

Maybe trying to write an implementation first (especially for the W5500 case) would be helpful to see how difficult it actually is.

ryan-summers commented 3 years ago

I'm working on a proof-of-concept that I hope will clarify how everything will work together. I'll post some details on this once I finish, but I do think this will make a much cleaner usage and API. I don't think there's any reason that we need to force objects to refer to the whole network stack when they could just refer to a part of it (e.g. the socket buffer) when possible.

lachlansneff commented 3 years ago

I agree that the api should be socket focused, but I think that socket or open_* should not be a thing:

trait UdpClient {
    type Socket: Write + Read; // Dropping a socket closes it.
    type Error;

    fn connect(&mut self, remote: SocketAddr) -> Result<Self::Socket, Self::Error>;
}

Or, in the case of async:

trait AsyncUdpClient {
    type Socket: AsyncWrite + AsyncRead; // Dropping a socket closes it.
    type ConnectFuture<'a>: Future<Output = Result<Self::Socket, Self::Error>> + 'a;
    type Error;

    fn connect<'a>(&'a mut self, remote: SocketAddr) -> Self::ConnectFuture<'a>;
}

MathiasKoch commented 3 years ago

Bear in mind that we don't have Read + Write available in no_std context where this library is focused.

lachlansneff commented 3 years ago

There is a no_std io::Read, io::Write crate, and I believe that they'll eventually make their way to core.

ryan-summers commented 3 years ago

While I agree the ultimate goal here should be async, I think there's still some work to stabilize things in core before we get it on embedded (unless someone has done so already - I believe embassy is an async executor, but I don't know if we can have async APIs on stable right now? I honestly don't know).

In any case, lets keep this issue related to the main topic and keep async APIs in https://github.com/rust-embedded-community/embedded-nal/issues/47

chrysn commented 3 years ago

Would it make sense to keep the to-be-implemented API as it is, but provide a shim that allows combining a socket with an owned stack (which is easy if you only ever have one socket, or if your implementation is synchronized anyway so your stack is clonable, or you've created a practically-sharedly-usable stack through shared-bus or similar) into something that has the proposed API?

Downside would be that it might create a split in the ecosystem (anything built on the simple API would only be usable under one of the above conditions), or need very strong warning words that any library should use the (self, stack) API and that only leaf crates can use the simple (self) form for simplicity.

DrTobe commented 2 years ago

Yes, but that requires essentially duplicating shared-bus here.

Correct me if I am wrong but as far as I see, shared-bus is already duplicated in this repository, but without mutex support:

SharableStack = shared_bus::BusManager + shared_bus::NullMutex
SharedStack = shared_bus::I2cProxy or shared_bus::SpiProxy (or the relatively new AdcProxy)

Because an embedded-nal stack is not a global static (like in operating systems), the network stack needs ownership or a &mut of the underlying hardware and strictly speaking, each socket needs ownership or a &mut of the network stack (the latter is how this relates to my original problem #62). I am not yet exactly sure if we can get away with a single RefCell somewhere and if we can avoid having to re-implement this resource sharing in each single driver implementation, but I hope so.

DrTobe commented 2 years ago

I have experimented today with how a socket-directed API could look like. In the beginning, I took the result I ended up with in #62 and generalized it so that part of the implementation could be moved out of the driver into embedded-nal. In particular, the TcpClientStack and TcpSocketHandle types which take care of resource sharing and implementing Drop could be moved, the driver developer then only needs to implement a trait NalDriver which contains all functionality. This intermediate step still relies on a RefCell and can be seen here.

Afterwards, I tried to get rid of the hard-coded RefCell. At first, I thought I could reuse shared-bus's mutex definition but then I realized that a standalone mutex trait would be much cleaner and that's how I found the mutex-trait crate. It differs from the mutex definition in shared-bus by requiring a &mut self to lock it instead of a &self. This surprised me at first (because it also differs from what std::sync::Mutex does) but the decision is extensively explained in the RFC. So I used it which lead to this result.

This approach would allow a socket-directed API, implement Drop for sockets (no resource leaks) and allow network-stack-sharing but I also see a few downsides:

Users of this API would now depend on the generic type TcpClientStack<D,M> instead of on a trait as before. This is a little bit surprising at least.
The mutex implementation must be Clone. This requirement is fulfilled for &RefCells and can be achieved with std::sync::Arc<std::sync::Mutex> in std contexts, I guess. I think that this can be achieved in many cases but it boils down to having shared references to something (which can be Clone but are not mutable!) and handing out &mut references (which are required for the mutex_trait::Mutex::lock method). So this is the interior mutability pattern again and the users who set this up must be aware of what guarantees this requires.
Users could possibly call the trait methods manually without wrapping the NalDriver in the TcpClientStack type. Like this, the Drop implementation of TcpSocketHandle could be circumvented easily and all kinds of bad things could happen.

Altogether, we want to mutably share resources (at least share access to the hardware between multiple sockets) so I think the interior mutability pattern and some of the implementation details will always pop up.

On the other hand, I just realized that what I called NalDriver now is surprisingly close to what the TcpClientStack trait does currently. So from another perspective one could also say that this is just a safer wrapper around TcpClientStack's usage (Drop) which additionally allows for sharing the stack. So this can also be seen as an extension to the current SharableStack.

Finally, I do not think that reusing anything from shared-bus will be possible. It is just too different and shared-bus has one implementation for each bus type it supports. So doing this ourselves is equivalent to doing this in shared-bus.

MathiasKoch commented 2 years ago

I really like the result you ended up with!

So from my end i have no objections against working in that direction for the future. I do not have the time to drive the development, though i think it feels much more rust-like & comparable to how the std-net does it. I would be happy to update my driver & consumer crates if we reach a concensus and a release eventually.

chrysn commented 2 years ago

Note that this is being used (or experimented on, not sure of the maturity) in the TcpConnect trait of embedded-nal-async: The result of TcpConnect can be used through a mutable reference alone (along the embedded_io) traits, making it effectively a socket directed API.

ryan-summers commented 1 year ago

I took a stab at implementing this as part of #85, but ran into issues around lifetime specification of the TCP connection object, as that needs to mutably borrow the stack. See https://github.com/rust-embedded-community/embedded-nal/pull/85#issuecomment-1587215842. I talked to @MathiasKoch about this and he pointed out that one workaround for this is to enforce that the Stack is 'static, but that doesn't work when using i.e. stack proxies (shared network stack)

DrTobe commented 1 month ago

With my colleague @embediver, I stumbled upon this in a different but similar scenario so we discussed the embedded-nal trait design again. Based on our discussion about that topic, I have come to the conclusion that the current trait definition needs to be replaced by a socket-oriented definition. The fact that the corresponding embedded-nal-async traits are defined in such a way suggests that there exists a basic agreement about that but I am not sure what plans exist for the future of the non-async version of embedded-nal.

In the following, I try to outline my thoughts about that topic:

Prerequisites / Assumptions

I assume that a network stack implementation generally needs mutable access to some resources, probably hardware access and memory. This requirement generally propagates to the sockets.
There is generally a requirement to share the network stack in one way or the other. "Sharing" in this sense might also consist in passing the stack (or its exclusive &mut reference) around.
It should be possible to avoid resource leaks which may make a network stack unusable (if resources are not properly freed). This means that, if resources are locked or reserved, there should be a working Drop implementation (the trait design should allow such a Drop impl). Initially, trying to implement Drop on our modem driver has lead me to this discussion.

Mutable access requirements in Rust normally translate to having ownership or requiring (exclusive) &mut references. Naively, the sockets created/returned from the network stack would hold a &mut reference to the network stack. Since only a single &mut reference is allowed, this would automatically limit the network stack to a single socket which defeats the whole purpose of creating sockets in the first place.

Since we want multiple sockets and we can not hold &mut references in the sockets, we need some sort of trick to safely get mutable access to the underlying resources. Basically, this is exactly what the "interior mutability" pattern is for. For example, the stack could be wrapped by a RefCell which allows mutable access from a shared reference via its borrow_mut method which returns a RefMut which implements DerefMut.

Current Trait Design and Limitations

At the time of writing, the most recent released version of embedded-nal is 0.8.0. The TcpClientStack trait is defined in a way that for each operation (socket creation, socket usage and closing a socket) a &mut reference to the network stack is required. I have not been part of this trait's design, but from my perspective this looks like an attempt to allow network stack sharing without requiring the interior mutability pattern. And at first sight, this seems to work: Even if multiple sockets are used at different places, it is generally possible to pass the reference to the stack around, at least in single-threaded environments.

But at a closer look, this does not work as desired:

In multi-threaded environments, sharing the stack like this does not work without an additional mechanism. In such a situation, the mutable access to the stack could be protected by a mutex. Rust's std::sync::Mutex allows mutable access from a shared reference via its lock method which returns a MutexGuard which implements DerefMut. In this sense, the Mutex serves the same purpose in multi-threaded situations as the RefCell in single-threaded situations. The behaviour is just slightly different, e.g. while the Mutex will wait for the lock, the RefCell will panic if the borrow fails, because waiting (blocking) is meaningless in single-threaded environments.
Additionally, there does not seem to be a way to implement Drop on the socket types with the current embedded-nal trait design without the interior mutability pattern. This has already been discussed in https://github.com/rust-embedded-community/embedded-nal/issues/62 or further above in this issue.
Finally, I was really surprised to see that the embedded_nal::SharableStack+embedded_nal::SharedStack types implement RefCell-based sharing in the same crate which seemingly tries to avoid the interior mutability pattern. In combination, using this feels especially weird because using RefCells would allow for a socket-oriented API but the stack and the socket still need to be handled together.

In summary, the approach to sharing which is reflected in the current trait design does not seem to work well in the general case. So I am convinced that interior mutability using RefCells or using some kind of mutex is required anyways, so we can also fully accept that requirement and define a socket-oriented API which reflects that.

Socket-Oriented API

For a socket-oriented API, I would roughly follow the embedded_nal_async::TcpConnect trait, especially regarding the borrowing and lifetimes. Different from that trait, I would personally define two separate traits (per protocol), one trait for the stack and one trait for the socket. For the stack trait, I would keep the socket method to create the socket and leave the connection setup in the socket trait:

pub trait TcpStack {
    type Socket<'a>: TcpSocket
    where
        Self: 'a;

    type Error: embedded_io::Error;

    fn socket<'a>(&'a self) -> Result<Self::Socket<'a>, Self::Error>;
}

pub trait TcpSocket: embedded_io::Write + embedded_io::Read {
    fn connect(&mut self, remote: embedded_nal::SocketAddr) -> nb::Result<(), Self::Error>;
    fn close(&mut self) -> nb::Result<(), Self::Error>;
}

I would prefer having connect and close methods in the socket type for a few reasons:

All tasks/threads/users which only need a single connection, but probably multiple times, only need to depend on a TcpSocket. Due to explicit closing and (re-)opening, that type can be reused. If they only depend on Write and Read, they will neither be able to reuse connections nor will they be able to reconnect after possible connection losses. So I suspect that many trait consumers would then depend on the TcpStack anyways. Those who only want to depend on Write and/or Read can still do so.
Very simple network stack implementations, which just support a single socket, might just implement the TcpSocket trait directly. For those implementations, the whole interior mutability complexity would be removed.
Some may argue that returning opened connections from the stack prevents errors because it guarantees that we only operate on properly connected sockets. But since every connection can fail at any time, proper error handling is required for every single communication attempt anyways. NotConnected would be the error variant of choice then.

In total, any socket-oriented API seems cleaner. For the implementation side, it can be clearly communicated how to implement this. On the user side, it is easier and more convenient to use. So in my opinion, this is the way to go. If you plan to still support the non-async version, it should definitely move to such a design, too.

ryan-summers commented 1 month ago

@DrTobe Thanks for the comprehensive writeup, your conclusions about the design are largely correct. I think the one aspect that is missing is the fact that a RefCell cannot be used in mutli-threaded environments. What we're trying to do with interior mutability on the stack is essentially identical to the concept of bus sharing for I2C or SPI. Given that, we can draw inspiration from the design of embedded-hal-bus.

The most important aspect here is that a RefCell is not an acceptable synchronization primitive for a multi-context application. There always will need to be some type of synchronization primitive when multiple contexts are supported (i.e. Mutex, critical section, RTIC priorities via NVIC, etc).

Given this, I think it may make the most sense to copy the design of embedded-hal-bus for our purposes as well, since fundamentally they're doing the same thing: handling interior mutability with multiple owners.

The primary problem is that the actual synchronization mechanism depends on the end user application. Whatever design we take needs to be able to support a user-specified synchronization mechanism similar to how embedded-hal-bus handles this.

DrTobe commented 1 month ago

To me, it was totally clear that a RefCell can not be used in multi-context situations:

the Mutex serves the same purpose in multi-threaded situations as the RefCell in single-threaded situations

I was trying to express the idea that in any case, we need a method to "upgrade" the shared references to &mut references. In different situations, this method looks different. But with a trait definition that allows to create sockets from shared stack references, the decision on how to do this is initially left open. So there could be network stack implementations which use a RefCell and only work in single-threaded scenarios while others could implement some kind of synchronization mechanism to work in multi-context scenarios.

What we're trying to do with interior mutability on the stack is essentially identical to the concept of bus sharing for I2C or SPI.

Actually, I do not think that this is true anymore. Bus sharing for I2C or SPI can be done (single-threaded example) by just moving a bus implementation which requires &mut access into a RefCell which is subsequently shared with all consumers. The bus consumers then borrow_mut the bus, get &mut access and can manipulate the bus as if they were the exclusive users. For a network stack, different consumers need to access the stack differently, because consumer A should not access the connection of consumer B and vice versa. So for a network stack, the idea of stack sharing needs to be baked into the network stack implementation. Nowadays, I think that this actually makes a huge difference.

I have the impression that we are discussing two related but distinct problems:

How should the trait be designed to define a clear interface for trait implementers and/or consumers?
How can/should the stack resources be shared in different scenarios? And: How can we aid stack implementers to easily implement this for their network stack implementations?

Again, in the bus example, these questions are more easily answered:

The trait design requires &mut references, which is straightforward.
For different scenarios, we convert an owned bus into multiple shared buses. Depending on the situation, we just use the correct embedded-hal-bus helper type.

Regarding network stacks, my statements are primarily concerned with the first question. So in a first step, I would switch to a socket-oriented network stack trait which is defined with sharing in mind. Afterwards, I would work on the second question and see if we can work out a solution to support different sharing requirements with generic implementations.

rust-embedded-community / embedded-nal