roboterclubaachen / xpcc

DEPRECATED, use our successor library https://modm.io instead

[XPCC] Hidden ambiguity in components-container-association #207

Open strongly-typed opened 7 years ago

strongly-typed commented 7 years ago

Dear all,

while working on XPCC over raw Ethernet frames I tripped over the fact that the same component can be instantiated in multiple containers, e.g. here.

That is actually used in the RCA robot, where <container name="drive big simulation"> and <container name="drive big"> instantiate nearly the same set of components (<component name="driver" />, ...).

It is then only by convention that these two containers must never be connected to the same network at the same time. While calling actions in driver may actually work (although you will get two ACKs and two responses), things break completely if driver publishes an event: you will then get two inconsistent events from two different sources.

For efficient Ethernet frame filtering in hardware I need to encode the container in the destination MAC address so that a pattern-match filter (first five bytes of the destination MAC address) can be used. But the lookup of a container (yes, that is essentially static routing) is not unique if the same component can be instantiated in multiple containers in the same XML file.
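The encoding could look roughly like this. This is only a sketch: the prefix bytes, the byte positions of container and component, and the function name are my assumptions, not the actual xpcc layout.

```cpp
#include <array>
#include <cstdint>

// Hypothetical layout: a locally administered prefix in bytes 0-3, the
// container ID in byte 4 (still covered by the 5-byte hardware pattern
// filter), and the component ID in byte 5 (not filtered in hardware).
std::array<uint8_t, 6> makeDestinationMac(uint8_t container, uint8_t component)
{
    return {0x02, 0x78, 0x70, 0x63,  // 0x02 sets the locally administered bit
            container,               // byte 4: inside the pattern match
            component};              // byte 5: ignored by the filter
}
```

With this scheme a receiver only needs to program its five-byte pattern filter once per container and never wakes up for frames addressed to other containers.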

Any suggestions on how to resolve this issue?

TheTh0r commented 7 years ago

Actually, I think there need to be two cases. The first is an active component (which means it sends ACKs), and there should always be only one instance of the same component. In the past, when we used TIPC, having several people running the simulator in the same network led to funny behavior, because multiple instances of the same component were running. In the current implementations it is not enforced that only one component exists (but I implemented this behavior in the now-dropped TCP/IP communication backend by only accepting one connection to the server for each component).

For debugging purposes a second, passive variant needs to be available, where you can receive all the actions the real component would also receive, but not ACK the messages. This way several instances would be able to log all the communication traffic. (I also started implementing connecting to the server with the listen function.)

So to summarize:

There should only be one active version of each component. Currently this is not enforced by the software. But running more than one at the same time leads to undefined behavior.

The possibility to have passive versions of a component running in parallel to log all relevant communications would be a nice feature, but this is not required.
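The single-active-instance rule described above could be enforced by a central registry along these lines. Class and method names are assumptions for illustration, not the actual TCP/IP backend:

```cpp
#include <cstdint>
#include <map>

// Sketch of a central registry: at most one *active* instance per component
// ID (only it may ACK), while passive loggers are unlimited.
class ComponentRegistry
{
public:
    // Returns false if another active instance already claimed this ID,
    // in which case the server would reject the connection.
    bool registerActive(uint8_t componentId)
    {
        auto result = active.insert({componentId, true});
        return result.second;
    }

    // Passive listeners never ACK, so any number may register.
    void registerPassive(uint8_t componentId)
    {
        ++passiveCount[componentId];
    }

private:
    std::map<uint8_t, bool> active;
    std::map<uint8_t, int> passiveCount;
};
```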

salkinium commented 7 years ago

cc @dergraaf @georgi-g

strongly-typed commented 7 years ago

would be a nice feature, but this is not required

The debug server on the robot is quite convenient. Although most of the relevant communication happens on the main board (strategy <-> driver <-> game components), losing the full insight into the communication is a step back.

Using MAC addressing and an Ethernet switch makes this a bit more challenging; I probably have to look into multicast. The switch learns source and destination with the first Ethernet frame, and from that point on no more broadcasts are sent.

This is an action call with the corresponding ACK between an STM32 Nucleo F429I and my computer:

            packet id 0x01
     source container 0x10
     source component 0x01
destination container 0x20
destination component 0x02
(screenshot: packet capture, 2016-12-22)
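The header dump above can be mirrored by a small decoder. The field widths, their order, and the names below are my reading of the dump; the real xpcc::Header may differ:

```cpp
#include <array>
#include <cstdint>

// Header fields as shown in the capture above (assumed layout).
struct XpccHeader
{
    uint8_t packetId;              // 0x01 in the capture
    uint8_t sourceContainer;       // 0x10
    uint8_t sourceComponent;       // 0x01
    uint8_t destinationContainer;  // 0x20
    uint8_t destinationComponent;  // 0x02
};

// Decode the five header bytes from the start of a frame payload.
XpccHeader decodeHeader(const std::array<uint8_t, 5>& raw)
{
    return {raw[0], raw[1], raw[2], raw[3], raw[4]};
}
```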

When communicating between two Nucleos, only the first frame is captured by my computer.

Maybe using a shared medium (10BASE-T with a hub) could solve this problem. But further research is required to determine whether the standard and common implementations still support such a shared medium. After looking into the data sheets of Ethernet hub chipsets, I would say this might be possible.

strongly-typed commented 7 years ago

Actually, enabling multicast by setting the group address bit in the destination MAC address seems to solve the switching issue.
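For reference, the group (I/G) bit is the least significant bit of the first octet of the destination MAC address; a switch floods group-addressed frames to all ports instead of learning a single destination port. A minimal sketch (the function name is an assumption):

```cpp
#include <array>
#include <cstdint>

// Mark a destination MAC as multicast by setting the I/G (group) bit,
// which is bit 0 of the first octet on the wire.
std::array<uint8_t, 6> asMulticast(std::array<uint8_t, 6> mac)
{
    mac[0] |= 0x01;  // set the group bit
    return mac;
}
```

Because the switch never associates a multicast address with a port, every frame reaches all listeners, including passive debug nodes.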

dergraaf commented 7 years ago

It is then only by convention that these two containers must never be connected to the same network at the same time.

With this convention it would be possible to send the message to all possible containers, e.g. have your look-up table return multiple container IDs and send the message to all of them. Only one of them may respond, but it is up to the user setting up the network to guarantee this.

strongly-typed commented 7 years ago

send the message to all of them

ACKed, but this adds quite a bit of overhead, not only in looking up lists of containers, but also in bus traffic.

TheTh0r commented 7 years ago

Otherwise you need to use a central server process like I did. Then each container can tell the server which components it contains.

strongly-typed commented 7 years ago

central server process

Yes, it's called ROS master ;-)

With the multicast bit set any non-active component can listen to any communication on the network.

strongly-typed commented 7 years ago

OK, with multicast the following would be possible:

Forwarding raw Ethernet frames over WiFi does not make sense; for that application the ZeroMQ transport layer is used. The current implementation is that the debug server listens to all CAN frames on the CAN bus and translates them into ZeroMQ messages (raw CAN frames over ZeroMQ). Reassembly of the fragmented CAN frames (when an XPCC message has more than eight data bytes) is left to the user application.

With raw Ethernet frames there are no fragmented messages, as the MTU of Ethernet is larger than the XPCC message size limit. So XPCC messages from raw Ethernet frames can be picked up by the debug server and published as XPCC messages via ZeroMQ to a user application running on any device. This ZeroMQ interface then differs from the current ZeroMQ interface.


dergraaf commented 7 years ago

A component can be secondary/passive to as many containers as the user wishes. Typically, these are instances for simulation and debugging, e.g. on non-microcontroller hardware.

That sounds strange to me. During testing with the simulated model, the simulation becomes the only active component, so the whole network configuration changes. The current data model has no support for this, which is why you ran into problems.

Add primary/active or secondary/passive keyword to components.

I don't see how active/passive components solve that problem.

TheTh0r commented 7 years ago

I have to agree with Fabian. The only solution is to accept only one active component per component ID and otherwise reject the connection to the server (e.g. my TCP/IP branch).