roboterclubaachen / xpcc

DEPRECATED, use our successor library https://modm.io instead

[XPCC] Hidden ambiguity in components-container-association #207

Open strongly-typed opened 7 years ago

strongly-typed commented 7 years ago

Dear all,

while working on XPCC over raw Ethernet frames I tripped over the fact that the same component can be instantiated in multiple containers, e.g. here.

That is actually used in the RCA robot, where <container name="drive big simulation"> and <container name="drive big"> instantiate nearly the same set of components (<component name="driver" />, ...).

It is then only by convention that these two containers must never be connected to the same network at the same time. While calling actions in driver may actually work (although you will get two ACKs and two responses), things break completely if driver publishes an event: you will then get two inconsistent events from two different sources.

For efficient Ethernet frame filtering in hardware I need to encode the container in the destination MAC address so that a pattern-match filter (first five bytes of the destination MAC address) can be used. But the lookup of a container (yes, that is essentially static routing) is not unique if the same component can be instantiated in multiple containers in the same XML file.
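The encoding could look roughly like this. This is only a sketch: the prefix bytes, the byte positions of container and component, and the function name are my assumptions, not the actual xpcc layout.

```cpp
#include <array>
#include <cstdint>

// Hypothetical layout: a locally administered prefix in bytes 0-3, the
// container ID in byte 4 (still covered by the 5-byte hardware pattern
// filter), and the component ID in byte 5 (not filtered in hardware).
std::array<uint8_t, 6> makeDestinationMac(uint8_t container, uint8_t component)
{
    return {0x02, 0x78, 0x70, 0x63,  // 0x02 sets the locally administered bit
            container,               // byte 4: inside the pattern match
            component};              // byte 5: ignored by the filter
}
```

With this scheme a receiver only needs to program its five-byte pattern filter once per container and never wakes up for frames addressed to other containers.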

Any suggestions on how to resolve this issue?

TheTh0r commented 7 years ago

Actually, I think there need to be two cases. The first is an active component (which means it sends ACKs), and there should always be only one instance of the same component. In the past, when we used TIPC, having several people running the simulator in the same network led to funny behavior, because multiple instances of the same component were running. In the current implementations it is not enforced that only one component exists (but I implemented this behavior in the now-dropped TCP/IP communication backend by only accepting one connection to the server for each component).

For debugging purposes a second, passive variant needs to be available, where you can receive all the actions the real component would also receive, but not ACK the messages. This way several instances would be able to log all the communication traffic. (I also started implementing connecting to the server with the listen function.)

So to summarize:

There should only be one active version of each component. Currently this is not enforced by the software. But running more than one at the same time leads to undefined behavior.

The possibility to have passive versions of a component running in parallel to log all relevant communications would be a nice feature, but this is not required.
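The single-active-instance rule described above could be enforced by a central registry along these lines. Class and method names are assumptions for illustration, not the actual TCP/IP backend:

```cpp
#include <cstdint>
#include <map>

// Sketch of a central registry: at most one *active* instance per component
// ID (only it may ACK), while passive loggers are unlimited.
class ComponentRegistry
{
public:
    // Returns false if another active instance already claimed this ID,
    // in which case the server would reject the connection.
    bool registerActive(uint8_t componentId)
    {
        auto result = active.insert({componentId, true});
        return result.second;
    }

    // Passive listeners never ACK, so any number may register.
    void registerPassive(uint8_t componentId)
    {
        ++passiveCount[componentId];
    }

private:
    std::map<uint8_t, bool> active;
    std::map<uint8_t, int> passiveCount;
};
```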

salkinium commented 7 years ago

cc @dergraaf @georgi-g

strongly-typed commented 7 years ago

would be a nice feature, but this is not required

The debug server on the robot is quite convenient. Although most of the relevant communication happens on the main board (strategy <-> driver <-> game components), losing the full insight into the communication is a step back.

Using MAC addressing and an Ethernet switch makes this a bit more challenging; I probably have to look into multicast. The switch learns source and destination with the first Ethernet frame, and from that point on no more broadcasts are sent.

This is an action call with the corresponding ACK between an STM32 Nucleo F429I and my computer:

            packet id 0x01
     source container 0x10
     source component 0x01
destination container 0x20
destination component 0x02
(screenshot: packet capture, 2016-12-22)
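The header dump above can be mirrored by a small decoder. The field widths, their order, and the names below are my reading of the dump; the real xpcc::Header may differ:

```cpp
#include <array>
#include <cstdint>

// Header fields as shown in the capture above (assumed layout).
struct XpccHeader
{
    uint8_t packetId;              // 0x01 in the capture
    uint8_t sourceContainer;       // 0x10
    uint8_t sourceComponent;       // 0x01
    uint8_t destinationContainer;  // 0x20
    uint8_t destinationComponent;  // 0x02
};

// Decode the five header bytes from the start of a frame payload.
XpccHeader decodeHeader(const std::array<uint8_t, 5>& raw)
{
    return {raw[0], raw[1], raw[2], raw[3], raw[4]};
}
```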

When communicating between two Nucleos, only the first frame is captured by my computer.

Maybe using a shared medium (10BASE-T with a hub) could solve this problem. But further research is required to determine whether the standard and common implementations still support such a shared medium. After looking into the data sheets of Ethernet hub chipsets, I would say this might be possible.

strongly-typed commented 7 years ago

Actually, enabling multicast by setting the group address bit in the destination MAC address seems to solve the switching issue.
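For reference, the group (I/G) bit is the least significant bit of the first octet of the destination MAC address; a switch floods group-addressed frames to all ports instead of learning a single destination port. A minimal sketch (the function name is an assumption):

```cpp
#include <array>
#include <cstdint>

// Mark a destination MAC as multicast by setting the I/G (group) bit,
// which is bit 0 of the first octet on the wire.
std::array<uint8_t, 6> asMulticast(std::array<uint8_t, 6> mac)
{
    mac[0] |= 0x01;  // set the group bit
    return mac;
}
```

Because the switch never associates a multicast address with a port, every frame reaches all listeners, including passive debug nodes.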

dergraaf commented 7 years ago

It is then only by convention that these two containers must never be connected to the same network at the same time.

With this convention it would be possible to send the message to all possible containers, e.g. have your look-up table return multiple container IDs and send the message to all of them. Only one of them may respond, but it is up to the user setting up the network to guarantee this.

strongly-typed commented 7 years ago

send the message to all of them

ACKed, but this adds quite a bit of overhead, not only in looking up lists of containers, but also in bus traffic.

TheTh0r commented 7 years ago

Otherwise you need to use a central server process like I did. Then each container can tell the server which components it contains.

strongly-typed commented 7 years ago

central server process

Yes, it's called ROS master ;-)

With the multicast bit set any non-active component can listen to any communication on the network.

strongly-typed commented 7 years ago

OK, with multicast the following would be possible:

Forwarding raw Ethernet frames over WiFi does not make sense; for that application the ZeroMQ transport layer is used. The current implementation is that the debug server listens to all CAN frames on the CAN bus and translates them into ZeroMQ messages (raw CAN frames over ZeroMQ). Reassembly of the fragmented CAN frames (when an XPCC message has more than eight data bytes) is left to the user application.

With raw Ethernet frames there are no fragmented messages, as the MTU of Ethernet is larger than the XPCC message size limit. So XPCC messages from raw Ethernet frames can be picked up by the debug server and published as XPCC messages via ZeroMQ to a user application running on any device. This ZeroMQ interface then differs from the current ZeroMQ interface.


dergraaf commented 7 years ago

A component can be secondary/passive to as many containers as the user wishes. Typically, these are instances for simulation and debugging, e.g. on non-microcontroller hardware.

That sounds strange to me. During testing with the simulated model, the simulation becomes the only active component, so the whole network configuration changes. The current data model has no support for this, which is why you ran into problems.

Add primary/active or secondary/passive keyword to components.

I don't see how active/passive components solve that problem.

TheTh0r commented 7 years ago

I have to agree with Fabian. The only solution is to accept only one active component per component ID and otherwise reject the connection to the server (e.g. my TCP/IP branch).