rust-embedded-community / embedded-nal

An Embedded Network Abstraction Layer
Apache License 2.0
177 stars 25 forks source link

Adding new TCP socket APIs #85

Closed ryan-summers closed 1 year ago

ryan-summers commented 1 year ago

Fixes #52 by exposing a more robust API for TCP sockets.

I recently hit an issue with minimq where the is_connected() interface was determined to be insufficiently expressive to properly managing the TCP socket state.

ryan-summers commented 1 year ago

CC @MathiasKoch as I think you're the other user of TCP traits? Also @chrysn

chrysn commented 1 year ago

I'm having a hard time following the trouble in the MQTT implementation. What is the decision for which "is this socket in the CONNECTED state" does not suffice?

Do the "may send" / "may receive" / "is open" methods suffice to also cover similar cases, or might we wind up needing to express all the FSM states? Can we get them out of a POSIX socket? Can we even get the ones introduced here on a POSIX system? (I'm generally happy to say "then POSIX sockets are a bad API", but I don't want to do that without due consideration).

MathiasKoch commented 1 year ago

I have to say that personally I am currently in the process of migrating towards the async version, and i think that the general API of the async version is MUCH more rust-like, and should be implemented for the blocking version as well if you ask me.

ryan-summers commented 1 year ago

I'm back with some findings after trying to use the embedded-nal-async approach - thanks for the advice you two :)

I have to say that personally I am currently in the process of migrating towards the async version, and i think that the general API of the async version is MUCH more rust-like, and should be implemented for the blocking version as well if you ask me.

The main issue that I'm running into is that a struct may not own a Tcp::Connection and the TcpStack simultaneously.

In my use case (MQTT), the MQTT client needs to close and open TCP sockets in response to adverse network events (i.e. link loss, etc.), which implies that it has to own the network stack (in order to establish new connections as required). Because of this, the Connection-based API used by the async version doesn't seem to be the right approach for more complex stack requirements.


I'm having a hard time following the trouble in the MQTT implementation. What is the decision for which "is this socket in the CONNECTED state" does not suffice?

The specific case is differntiating between "attempting to connect", "connected", and "no-longer-connected". All 3 of these cases have different required behaviors, but the is_connected() only allows us to detect two states.

  1. "attempting to connect" => We should keep waiting and not do anything until we are connected
  2. "connected" => We are connected to our peer and can actively rx and tx, send our packets now
  3. "no-longer-connected" => We need to close the TCP socket and open a new one

Note that the actions in (1) and (3) are polar opposites. In one case, we actively want to do nothing (i.e. wait for connection). In the other case, we need to close the socket and re-attempt connections.

Do the "may send" / "may receive" / "is open" methods suffice to also cover similar cases, or might we wind up needing to express all the FSM states?

Honestly, the more I think about it, the more it seems like maybe we should just expose the FSM directly.

Can we get them out of a POSIX socket? Can we even get the ones introduced here on a POSIX system? (I'm generally happy to say "then POSIX sockets are a bad API", but I don't want to do that without due consideration).

I think POSIX sockets expose this information primarily through returned error codes. If the TX pipe is closed when you try to send, it will error. Similarly with the RX pipe. Maybe this would be a better approach... Let me give it a shot.

chrysn commented 1 year ago

I think POSIX sockets expose this information primarily through returned error codes. If the TX pipe is closed when you try to send, it will error. Similarly with the RX pipe. Maybe this would be a better approach... Let me give it a shot.

That sounds like a EAFP instead of LBYL approach, which I appreciate, so I'm looking forward to seeing what comes out of it.

own a Tcp::Connection and the TcpStack simultaneously.

IIRC going towards more many-owners-of-shared-stack was a necessity of the async API because multiple components can have pending operations (say, reads) that overlap in time, which would conflict with the necessity to hold an exclusive reference to the stack in the async model. In a blocking model, where one would either read only from one socket or put all one's sockets into some multiplexer (which'd do a read given a single reference to the stack), that's not necessary. I'd hope that many of the nice properties of the async API can be carried over to the sync world, even if the "main argument" is a (&mut socket, &mut stack) instead of a (&mut socket-including-shared-stack).

ryan-summers commented 1 year ago

I'd hope that many of the nice properties of the async API can be carried over to the sync world, even if the "main argument" is a (&mut socket, &mut stack) instead of a (&mut socket-including-shared-stack).

The issue isn't actually mutability, but rather lifetime issues. If you try the following:

struct MyStruct<T: TcpStack> {
    connection: T::Connection<'?>,
    stack: T,
}

What is the lifetime of connection? It's supposed to be that of stack, but that's not possible to specify because it would be referencing the lifetime of Self for MyStruct

ryan-summers commented 1 year ago

@chrysn Updates are now available on the PR. I opted for a base error type that allows for library users to detect pipe errors that require TCP to be reconnected. It's working well on my local setups and also resolves the original MQTT I was having. Let me know what you think.

If we want to move towards a more socket-directed API, I'd like to keep that off for a new PR. There's a lot of work involved in that

ryan-summers commented 1 year ago

@rust-embedded-community/embedded-nal Anyone available to help with review on this one? :)

thejpster commented 1 year ago

Seems OK to me, but... Does anyone have a public example showing how they have implemented the new error codes? Can it be done on all the existing stacks, or are you going to break some of them?

ryan-summers commented 1 year ago

Seems OK to me, but... Does anyone have a public example showing how they have implemented the new error codes?

The above-linked minimq PR is a client side impl (https://github.com/quartiq/minimq/pull/126) and the smoltcp-nal (https://github.com/quartiq/smoltcp-nal/pull/42) above is a stack implementation, both working on hardware with stabilizer.

Can it be done on all the existing stacks, or are you going to break some of them?

I'd be surprised if a network stack wasn't capable of telling you the socket is closed when trying to read or write to TCP. In any case, a stack doesn't have to implement these error Codes and can mark everything as other if they have to.