Define the NetworkChannel interface

erlingrj commented 1 week ago

Lasse, have you made any progress in this regard? I am somewhat blocked on this because I want to implement start-tag distribution and fix akwardness of startup.

My current propsal is this:

struct NetworkChannel {
  size_t dest_channel_id; // So that we can "address" one of several NetworkChannel's at the other end.
  lf_ret_t (*open_connection(NetworkChannel *self);
  lf_ret_t (*is_connected(NetworkChannel *self);
  lf_ret_t (*send_blocking)(NetworkChannel *self, TaggedMessage *message);
  void *(register_receive_callback(...)
};

Key design decisions here that we can discuss are:

It does not make the difference between server/client. This might be a bad decision and we might reconsider and add a flag to the constructor.
It introduces the notion of a dest_channel_id. This is an address into the federate on the other side. It lets you say that this message is destined for a particular NetworkChannel at the other side. It is redundant with TCP/UDP since we can just use ports. We might argue that it is not a good idea to have this in the NetworkChannel interface and instead let the different networking stacks handle it.
Only send_blocking is blocking. I.e. open_connection does not guarantee that a connection is established. Only when is_connected returns LF_OK can we conclude that a connection is established. This has the advantage that a single-threaded runtime can "open" all his connections and then in the end poll all of them in a while loop until all are connected. Any suggestions for an alternative approach?

LasseRosenow commented 1 week ago

So your idea is to put the server-client information into the tcp_ip_channel concrete type? At least somehow the channel needs to know it it should call accept and bind or connect etc.
As far as I can see for now I think this is indeed duplicated. At least for BLE we have mac addresses and for LoRaWAN we have their specific IDs. Or is there the need for multiple channels to the same device? If that is the case I think we must have this channel ID anyways. In that case one note: It is encouraged not to use size_t for this kind of properties. Maybe use a more concrete type such as uint32_t or uint16_t etc.?
The idea sounds good, but I am not sure about the implementation details yet though. For TCP at the server side we need to first call bind and then accept . 3.1. Would the idea be to call the bind method inside of open_connection and then accept function in a non-blocking mode inside of the is_connected function? (It is somewhat strange to let is_connected "do work", but then that just depends on the perspective from which you are looking at it ;)) 3.2. Or maybe call bind and accept in open_connection and then continuously call open_connection in a non-blocking way in addition to is_connected?

For 3. I think I am somewhat in favor of 3.1 since it could be confusing to continuously call open_connection until a connection is actually open?

LasseRosenow commented 1 week ago

What about close_connection? Is this something that just does not happen in Lingua Franca?

erlingrj commented 1 week ago

So your idea is to put the server-client information into the tcp_ip_channel concrete type? At least somehow the channel needs to know it it should call accept and bind or connect etc.

Yes. When constructing e.g. a TcpIpChannel with TcpIpChannel_ctor() you need to tell it if its the client or server, and the IP adddress + port number of the server. If other networking stacks also have this asymmetry, i.e. there is a server and a client in the communication, we can consider making this part of the network abstraction yes.

As far as I can see for now I think this is indeed duplicated. At least for BLE we have mac addresses and for LoRaWAN we have their specific IDs. Or is there the need for multiple channels to the same device? If that is the case I think we must have this channel ID anyways.

We must support a single device having multiple NetworkChannels. A federate running on a device will have one NetworkChannel per other federate he talks to. However, we will only have a single NetworkChannel between two federates. So all "LF Connections" between two federates are combined into a "Bundle" together with a single NetworkChannel.

The idea sounds good, but I am not sure about the implementation details yet though. For TCP at the server side we need to first call bind and then accept .

Currently my examples dont work if the client calls connect before the server calls bind or accept (I dont know which of the two it is). We need to implement this in a way that is guaranteed deadlock-free and not dependent on timing at all.

3.1. Would the idea be to call the bind method inside of open_connection and then accept function in a non-blocking mode inside of the is_connected function? (It is somewhat strange to let is_connected "do work", but then that just depends on the perspective from which you are looking at it ;))

The idea would be to do the work in is_connected yes. The problem it solves is that we might have many NetworkChannels coming in and out and there might be cycles. We cannot have a blocking API where we end up blocking in a cycle = a deadlock. So we must be able to wait on all NetworkChannels at the same time.

3.2. Or maybe call bind and accept in open_connection and then continuously call open_connection in a non-blocking way in addition to is_connected?

I am no expert on sockets. But currently the main problem is actually that connection fails if client calls connect to early. So would maybe put bind in open_connection (if server, and nothing if client). And then is_connected (can rename to do_connection or whatever), does non-blocking connect/accept. It is important though that we do not depend on the order in which these these things are done. We can only expect that open_connection comes before is_connected at a particular federate, not accross them.

Thanks for feedback Lasse. This proposal is quick-and-dirty and I am very open to changing it, just want the discussion rolling.

LasseRosenow commented 1 week ago

Thanks for feedback Lasse. This proposal is quick-and-dirty and I am very open to changing it, just want the discussion rolling.

I think it is actually already a very good proposal as long as it is clear what function is doing what and why, but we can do that in the header file description.

I am no expert on sockets. But currently the main problem is actually that connection fails if client calls connect to early. So would maybe put bind in open_connection (if server, and nothing if client). And then is_connected (can rename to do_connection or whatever), does non-blocking connect/accept. It is important though that we do not depend on the order in which these these things are done. We can only expect that open_connection comes before is_connected at a particular federate, not accross them.

I just wanted to come up with the same idea after reading your feedback :) I think this should work at least for TCP. Calling connect in open_conneciton for the client indeed makes no sense because it can fail and then is not called again, so yes it makes sense to put it into do_connection or maybe try_connect and call it non-blocking as it can fail as long as the server has not called bind. The network stack takes care of the tcp handshake to complete for the client even if the server has not yet called accept. This should not be a problem.

Another point: I am thinking in theory we could also drop open_connection and just do whatever we do when we call try_connect or do_connection the first time? So the first while loop iteration would implicitly be a open_connection and the second while loop and so on would be do_connection or try_connect? Or do you think that could have disadvantages that I am missing?

erlingrj commented 1 week ago

I just wanted to come up with the same idea after reading your feedback :) I think this should work at least for TCP. Calling connect in open_conneciton for the client indeed makes no sense because it can fail and then is not called again, so yes it makes sense to put it into do_connection or maybe try_connect and call it non-blocking as it can fail as long as the server has not called bind. The network stack takes care of the tcp handshake to complete for the client even if the server has not yet called accept. This should not be a problem.

Naming is very important, and I think try_connect is a much better name yes. I have to look closer into the issue that I observed at Zephyr. I sometimes observe that the client is blocked at connect while the server is blocked at accept. Could it be because client called connect even before server called bind? I will investigate this today, I solved it by adding some sleep's for now.

Another point: I am thinking in theory we could also drop open_connection and just do whatever we do when we call try_connect or do_connection the first time? So the first while loop iteration would implicitly be a open_connection and the second while loop and so on would be do_connection or try_connect? Or do you think that could have disadvantages that I am missing?

This makes sense yes, probably no need to have two API calls for this.

A good excercise would be to see how this would work for BLE and LoraWAN also

LasseRosenow commented 1 week ago

Could it be because client called connect even before server called bind?

Yes connect by default is blocking. But it can be configured to be called in a non blocking way using the SOCK_NONBLOCK flag.

A good excercise would be to see how this would work for BLE and LoraWAN also

Yes this will get interesting. My fingers can't wait to start playing around with it :)

LasseRosenow commented 1 week ago

I have started to implement the UdpSocket and am wondering if I could maybe use this new interface already and migrate the tcp to it? Just want to make sure we don't duplicate work

erlingrj commented 1 week ago

Yes, that would be great. The earlier we converge on this new interface the better. If you can take the lead on it, that would be really great :D

lf-lang / reactor-uc

Define the NetworkChannel interface #83