Open rfairley opened 5 years ago
I like the idea of a client-server-like model, and I think that could be a very nice and robust approach. One thing I really like about it is that it would probably help with debugging, since we could implement an accompanying command-line interface fairly easily. I also like the flexibility and modularity that comes with such an implementation. I'm imagining that we can have some sort of generic event processor with a bunch of different cases for each command/query that we define. In this case, adding a new command/query would require the following:
Thinking in UART terms, with this scheme, our packet would probably no longer be a fixed size. Instead what we might do is use a fixed-size start of packet sequence which would contain at least (1) a sanity check header (e.g. 0xFDFD), and (2) a byte or two that contain the code for the command/query. The MCU can always initiate a reception for this fixed-size start of packet sequence, and then after receiving and parsing it, it can initiate a reception for the appropriate packet since it now knows the type (and the corresponding size, presumably). In ethernet terms, I am guessing we would probably just receive the whole packet and then look at the command/query code to know how to interpret it, since I don't think we really have a choice about how many bytes we want to initiate a reception for. Correct me if I'm wrong though.
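A sketch of the fixed-size start-of-packet parsing described above (the command codes, payload sizes, and function name here are made up for illustration; only the 0xFDFD header comes from the discussion). The MCU would receive exactly 4 bytes, call this, and then initiate a second reception of `payload_len` bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command codes -- placeholders, not from the repo. */
enum { CMD_READ_SENSORS = 0x01, CMD_SET_GOAL_POS = 0x02 };

#define SOP_BYTE 0xFD /* sanity-check header: 0xFDFD */

/* Parse a 4-byte start-of-packet sequence: 0xFD 0xFD <cmd lo> <cmd hi>.
 * On success, writes the command code and the payload size to expect in
 * the second receive operation, and returns 0; returns -1 on a bad
 * header or an unknown command. */
int parse_sop(const uint8_t sop[4], uint16_t *cmd, size_t *payload_len)
{
    if (sop[0] != SOP_BYTE || sop[1] != SOP_BYTE) {
        return -1; /* failed sanity check */
    }
    *cmd = (uint16_t)(sop[2] | (sop[3] << 8));
    switch (*cmd) {
    case CMD_READ_SENSORS: *payload_len = 0;  return 0; /* no payload */
    case CMD_SET_GOAL_POS: *payload_len = 24; return 0; /* e.g. 12 x int16 positions */
    default:               return -1;                   /* unknown command */
    }
}
```

The point of the fixed-size first stage is that the MCU never has to guess how many bytes to wait for.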
I like the idea of caching the sensor readings. It is actually a very good idea to run the sensor-reading threads on a time-triggered basis, as this ensures that sensor data will ALWAYS be fresh to within a deterministic number of milliseconds. You are right that this does not really solve the long round-trip time issue though, but it is still a good improvement to make.
As for data sharing, we already have to do this, and I think the current mechanism of doing it is not terrible. Recall that currently, all sensor threads write their data into the same queue, which only has one reader (the PC TX thread). This PC TX thread updates the one and only "cache" itself. I'm fine with changing this if anyone has a better solution for intertask communication & cache updating, though.
I'm imagining that we can have some sort of generic event processor with a bunch of different cases for each command/query that we define.
Agree on having some event processing module. One case could be that a command forces the sensors to do a new read and waits for the read to complete (like what we have now), while another simply reads from the cache that has been updated in the background. As you mentioned, this would make adding debug commands a lot easier.
Thinking in UART terms, with this scheme, our packet would probably no longer be a fixed size. Instead what we might do is use a fixed-size start of packet sequence which would contain at least (1) a sanity check header (e.g. 0xFDFD), and (2) a byte or two that contain the code for the command/query. The MCU can always initiate a reception for this fixed-size start of packet sequence, and then after receiving and parsing it, it can initiate a reception for the appropriate packet since it now knows the type (and the corresponding size, presumably).
This could work for UART - splitting command reception into two receive operations. Thinking about it now it'd be difficult to try and have only one receive operation of variable length - some sort of handshaking is required.
In ethernet terms, I am guessing we would probably just receive the whole packet and then look at the command/query code to know how to interpret it, since I don't think we really have a choice about how many bytes we want to initiate a reception for.
Yes - lwIP handles this by presenting data structures called `struct pbuf`s, which are queued by lwIP as data is received. Each pbuf includes its own length, the total packet length (as there can be multiple pbufs per packet), and a pointer to the data (https://github.com/utra-robosoccer/soccer-embedded/blob/rfairley-lwip-rtos-config/Development/Ethernet/lwip-rtos-config/lwip-rtos-config/Middlewares/Third_Party/LwIP/src/include/lwip/pbuf.h#L142). We can choose how large the pbufs are from CubeMX; right now the size is set to 1524 bytes. There doesn't seem to be any guarantee on the number of pbufs used to contain a packet; lwIP will make that choice. We can be certain, however, that if we receive a UDP packet, all the command data is included in that packet. So we would just need a routine that reads all pbufs associated with one packet in order to receive a command. We'd also need to package things up into pbuf(s) before sending data to the PC - which I don't think is too complex.
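For illustration, here is a sketch of walking a pbuf chain into a flat command buffer. The `struct pbuf` below is a cut-down stand-in mirroring only the lwIP fields used here, so this compiles on a host; lwIP itself also offers `pbuf_copy_partial()` for exactly this job:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for lwIP's struct pbuf (see pbuf.h in the repo);
 * only the fields used below are reproduced. */
struct pbuf {
    struct pbuf *next;      /* next pbuf in the chain for this packet */
    void *payload;          /* pointer to this pbuf's data */
    unsigned short tot_len; /* length of this pbuf plus all following ones */
    unsigned short len;     /* length of this pbuf only */
};

/* Copy one packet's worth of chained pbufs into a flat buffer.
 * Returns the number of bytes copied, or 0 if the packet won't fit. */
size_t flatten_packet(const struct pbuf *p, void *dst, size_t dst_size)
{
    if (p == NULL || p->tot_len > dst_size) {
        return 0;
    }
    size_t copied = 0;
    for (; p != NULL; p = p->next) {
        memcpy((char *)dst + copied, p->payload, p->len);
        copied += p->len;
    }
    return copied;
}
```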
As for data sharing, we already have to do this, and I think the current mechanism of doing it is not terrible. Recall that currently, all sensor threads write their data into the same queue, which only has one reader (the PC TX thread). This PC TX thread updates the one and only "cache" itself.
I like the use of queues in the current implementation for sending commands, it's convenient having the "executor" task only ever read from it when it's ready. On the data sharing, I was concerned about complexity from having multiple tasks (e.g. UART tasks from motors) trying to access a mutex to write directly to the cache. However, having tasks sending data to the same queue seems to solve this problem already, and we could just have a dedicated "cache writer" task reading from that queue and updating the cache. The Tx thread could then read from that cache.
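A minimal, host-testable model of that "cache writer" pattern; the sensor IDs, message layout, and function names are all invented for illustration, and in the real system the messages would arrive via a FreeRTOS queue (`xQueueReceive()` in the cache-writer task's loop):

```c
#include <assert.h>
#include <stdint.h>

/* Sensor IDs and cache layout are illustrative, not from the repo. */
enum sensor_id { SENSOR_IMU = 0, SENSOR_FOOT = 1, SENSOR_MOTOR = 2, NUM_SENSORS };

struct sensor_msg {  /* what each sensor task would put on the queue */
    enum sensor_id id;
    int32_t value;
    uint32_t timestamp_ms;
};

struct cache_entry { int32_t value; uint32_t timestamp_ms; };

/* The one and only cache: written only by the cache-writer task and
 * read only by the PC TX task (the 1-reader/1-writer structure from
 * the design), so no mutex contention between sensor tasks. */
static struct cache_entry cache[NUM_SENSORS];

/* Body of the cache-writer loop, factored out so the update logic is
 * testable without an RTOS. */
void cache_writer_process(const struct sensor_msg *msg)
{
    cache[msg->id].value = msg->value;
    cache[msg->id].timestamp_ms = msg->timestamp_ms;
}

int32_t cache_read(enum sensor_id id) { return cache[id].value; }
```

The design choice here is that serialization happens at the queue, so only one task ever touches the cache memory on the write side.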
Here are key points from the discussion Robert and I had at yesterday's meeting. First of all, we addressed the areas we'd like to improve.
While addressing these three areas, we conceived an updated design for our system.
The new design consists of two shared data structures, each of which has only 1 reader and 1 writer:
There are also 14 threads (note: priority = 6 is the highest priority and priority = 0 is the lowest):
The new PC interface will support 4 commands/queries (note: names are not finalized by any means):
As a reminder, our current design all begins with the reception of a goal position packet. This triggers the movement of the motors and the acquisition of sensor data, which is then sent back to the PC. In the current design, the MCU is idle most of the time.
The way the new design fixes this issue involves the 8 priority-2 threads. To begin with, the IMU and foot pressure sensors are both run on a time-triggered basis, meaning that they run every 2 ms and fetch new sensor data. When they have acquired fresh data, they send it to the cache writer queue, causing the cache writer to be woken up and the cache to be updated. For the motors, there is a new thread called the "motor read command generator" which generates read commands on a time-triggered basis. These read commands are added to the read command queues for each UART handler. Similarly, when new motor position data is available, the UART handlers send the data to the cache writer queue, causing the cache to be updated. Since all the sensor data is fetched on a deterministic 2 ms period, there is zero idle time when the PC requests sensor data: we can simply read from the cache.
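One tick of the "motor read command generator" could look like the sketch below; the motor count, UART count, and routing are entirely made up for illustration, and in the real system the queues would be FreeRTOS queues with the tick driven on the 2 ms period via `vTaskDelayUntil()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative numbers -- the motor/UART mapping is not from the repo. */
#define NUM_MOTORS       12
#define NUM_UARTS         3
#define MOTORS_PER_UART  (NUM_MOTORS / NUM_UARTS)

struct read_cmd { uint8_t motor_id; };

/* Stand-in for the per-UART-handler read command queues. */
struct cmd_queue { struct read_cmd cmds[MOTORS_PER_UART]; size_t count; };

/* One tick of the generator: enqueue a position read for every motor,
 * routed to the UART handler that owns it. Each UART handler then
 * forwards results to the cache writer queue as data comes back. */
void generate_read_commands(struct cmd_queue queues[NUM_UARTS])
{
    for (uint8_t m = 0; m < NUM_MOTORS; ++m) {
        struct cmd_queue *q = &queues[m / MOTORS_PER_UART];
        q->cmds[q->count++] = (struct read_cmd){ .motor_id = m };
    }
}
```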
As mentioned previously, one of the key ways to improve flexibility is to try to decouple the different parts of the application; this goes hand-in-hand with the separation of concerns design principle. In the new design, a key place to apply this is PC reception. Basically, the PC RX thread logic will remain the same forever: it will initiate an asynchronous reception from the PC, and upon receiving data, it will invoke the RX event handler. The RX event handler will work as follows:
We have broken down how the RX event handler will implement the aforementioned PC interface commands:
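One way that handler could be structured is a plain dispatch over the command code, sketched below with hypothetical command codes and handler names (nothing here is from the repo; the `last_handled` variable exists only so the sketch can be exercised):

```c
#include <assert.h>
#include <stdint.h>

/* Command codes and handler names are illustrative placeholders. */
enum { CMD_READ_CACHE = 0x01, CMD_FORCE_READ = 0x02 };

static int last_handled; /* test hook: records which handler ran */

/* Each handler does the work for one command/query. */
static void handle_read_cache(const uint8_t *payload) { (void)payload; last_handled = CMD_READ_CACHE; }
static void handle_force_read(const uint8_t *payload) { (void)payload; last_handled = CMD_FORCE_READ; }

/* RX event handler core: map a command code onto its handler. Adding a
 * new command/query means one enum value, one handler, and one case
 * here -- the PC RX thread itself never changes. */
int rx_event_handler(uint16_t cmd, const uint8_t *payload)
{
    switch (cmd) {
    case CMD_READ_CACHE: handle_read_cache(payload); return 0;
    case CMD_FORCE_READ: handle_force_read(payload); return 0;
    default:             return -1; /* unknown command: reply with an error */
    }
}
```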
Just another thought on decoupling - it'd be nice to have the application modular enough so that only the PC RX and TX task code changes when switching the physical link comm protocol (e.g. switching between Ethernet and UART). Although, the packet parsing would need to change, which means the RX Event Handler changes. An `#if` macro could handle that cleanly though (e.g. `#if (PHY_PROTOCOL == UART) parseUART(); #elif (PHY_PROTOCOL == ETH) parseEth(); #endif`). Thinking about https://github.com/utra-robosoccer/soccer-embedded/projects/4#card-11027173.
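A slightly fleshed-out sketch of that `#if` approach; the macro and function names are illustrative, and the string return values exist only so the selection is observable:

```c
#include <assert.h>
#include <string.h>

/* Compile-time selection of the physical-link parser. */
#define PHY_PROTOCOL_UART 1
#define PHY_PROTOCOL_ETH  2

#ifndef PHY_PROTOCOL
#define PHY_PROTOCOL PHY_PROTOCOL_UART /* build-time default for this sketch */
#endif

const char *parseUART(void) { return "uart"; }
const char *parseEth(void)  { return "eth"; }

/* The RX Event Handler calls parse() without caring which link is in
 * use; only this one spot changes per protocol. */
static const char *parse(void)
{
#if (PHY_PROTOCOL == PHY_PROTOCOL_UART)
    return parseUART();
#elif (PHY_PROTOCOL == PHY_PROTOCOL_ETH)
    return parseEth();
#else
#error "Unknown PHY_PROTOCOL"
#endif
}
```

A build would set `PHY_PROTOCOL` from the compiler command line (e.g. `-DPHY_PROTOCOL=2`) rather than editing the source.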
I made some diagrams to try to help visualize the new flow. I did not focus much on data structures, so that could be improved. Edit: Added a bit more information about data structures. Not finalized at all though.
Source files: August-6-2018-Design.pptx August-6-2018-Design_Full.pptx
Yo, you awake?
I am also imagining that there are a few settings on the robot that we might be able to store in non-volatile memory and then change through a PC interface, that way we wouldn't need to re-program it to change minor settings. Just something to keep in the back of our minds.
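A hedged sketch of what such a non-volatile settings record might look like, so that an erased page or a corrupted write can be detected before the values are trusted; the field names, magic value, and checksum scheme are assumptions, not existing code, and the flash read/write itself (e.g. via the STM32 HAL) is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical settings record that could live in a reserved flash page. */
struct robot_settings {
    uint32_t magic;               /* marks the page as initialized */
    uint16_t version;             /* lets newer firmware migrate old layouts */
    uint16_t motor_speed_limit;   /* example tunables */
    uint16_t imu_sample_period_ms;
    uint16_t checksum;            /* over all fields above */
};

#define SETTINGS_MAGIC 0x53455454u /* "SETT" */

static uint16_t settings_checksum(const struct robot_settings *s)
{
    /* Simple additive checksum over the bytes preceding 'checksum'. */
    const uint8_t *p = (const uint8_t *)s;
    uint16_t sum = 0;
    for (size_t i = 0; i < offsetof(struct robot_settings, checksum); ++i)
        sum = (uint16_t)(sum + p[i]);
    return sum;
}

/* Returns 1 if a record read back from flash looks valid. On first boot
 * (erased flash reads all 0xFF) the magic check fails and the firmware
 * would fall back to compiled-in defaults. */
int settings_valid(const struct robot_settings *s)
{
    return s->magic == SETTINGS_MAGIC && s->checksum == settings_checksum(s);
}
```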
I like the diagram - portrays accurately what was in mind during discussion and shows which processes block which. We can check against the diagram while implementing to make sure we accounted for all the data structures and dependencies.
I am also imagining that there are a few settings on the robot that we might be able to store in non-volatile memory and then change through a PC interface
This sounds like an interesting opportunity to learn about the different memory regions while doing it (we'd need to go into the `*_FLASH.ld` linker script?), and it would decrease the load/flash time.
Below the UDP layer, checksums are used. @rfairley needs to verify whether a packet will be dropped automatically by the library if the checksum cannot be verified.
Following up to this, I did some reading of the code (best seen from `Development/Ethernet/lwip-rtos-config`):
CubeMX+lwIP allows you to configure the checksum operations on UDP/TCP/ICMP/IP packets from two places: the ethernet driver of the board (ETH), and the lightweight IP stack (LWIP).
The setting TX IP Header Checksum Computation can be configured as By hardware or By software from CubeMX. This translates to setting the `(heth->Init).ChecksumMode` member of the Ethernet handle to `ETH_CHECKSUM_BY_HARDWARE` or to `ETH_CHECKSUM_BY_SOFTWARE`.
If `ChecksumMode` is set to `ETH_CHECKSUM_BY_HARDWARE`, then the `dmatxdesc.Status` member of the DMA Tx descriptor is set to `ETH_DMATXDESC_CHECKSUMTCPUDPICMPFULL`, which sets the TDES0 register. This is how the hardware is told to perform checksum operations for the TCP, UDP, and ICMP protocols. The `macinit.ChecksumOffload` member is set so that the IPv4 checksum calculations are offloaded to DMA. A presentation by ST (slide 4) gives some insight into the inputs of the hardware checksum calculation (the IPv4 header, and the UDP/TCP/ICMP data payload). I'm not sure when `ETH_DMATXDESC_CHECKSUMIPV4HEADER` would need to be used - it is currently not used in the code.
As far as I can tell, the hardware is being given the necessary instructions to calculate and check checksums on outgoing and incoming UDP packets. The hardware should drop the packet in case of a bad checksum, based on how UDP operates. It would be difficult to verify this through a test, as these operations are performed within the hardware.
The lwIP configuration in CubeMX gives a choice between checksum in hardware or software. If `CHECKSUM_BY_HARDWARE` is Enabled, then the `CHECKSUM_GEN_*` and `CHECKSUM_CHECK_*` settings must be Disabled. These definitions are made in `lwipopts.h`.
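For reference, the relevant `lwipopts.h` fragment for the hardware-checksum configuration described above would look something like this (illustrative; the CubeMX-generated file may differ):

```c
/* Software checksum generation and checking must be disabled when the
 * ETH peripheral computes checksums in hardware: */
#define CHECKSUM_GEN_IP       0
#define CHECKSUM_GEN_UDP      0
#define CHECKSUM_CHECK_IP     0
#define CHECKSUM_CHECK_UDP    0
```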
I could not find any references in the code to `CHECKSUM_BY_HARDWARE`, so the hardware checksum can be configured only from the ETH config. I thought it may be internal to CubeMX, e.g. to stop the code from generating if the Enabled/Disabled condition is violated, but CubeMX still allowed generating the code with the "by hardware" and "by software" settings both enabled. So, `CHECKSUM_BY_HARDWARE` configured from LWIP appears to have no effect.
As an example, setting `CHECKSUM_CHECK_UDP` would enable software checking by lwIP of UDP packet checksums. Looking at the code in `Middlewares/Third_Party/LwIP/src/core/udp.c`, one can see that pbufs are freed before reaching the user application if there is a checksum error (see line 330). So, this implements the packet-dropping functionality expected in the case of corrupted UDP packets. We can verify this by changing the checksum algorithm to one different from what UDP requires (RFC 768) and observing that the packets are dropped.
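To make the drop condition concrete, here is the RFC 768 / RFC 1071 style one's-complement checksum and the receiver-side check that the software path effectively performs. The buffer contents in the usage below are arbitrary test data rather than a real UDP pseudo-header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, padding an odd
 * trailing byte with zero and folding carries back in (RFC 1071). */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sender: the transmitted checksum is the complement of the sum. */
uint16_t udp_style_checksum(const uint8_t *data, size_t len)
{
    return (uint16_t)~ones_complement_sum(data, len);
}

/* Receiver: summing the data *including* the transmitted checksum must
 * give 0xFFFF; otherwise the packet is considered corrupt and dropped. */
int checksum_ok(const uint8_t *data, size_t len)
{
    return ones_complement_sum(data, len) == 0xFFFF;
}
```

Changing either side's algorithm (as the verification test above suggests) breaks this identity, so every packet fails `checksum_ok()` and is dropped.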
The main points and tradeoffs concerning hardware vs. software checksums are as follows:
Nice investigation, I think checksum in hardware is the way to go. Good to know pbufs are dropped upon packet failure (did you mean udp.c by the way?).
I noticed that the slide you referenced from ST's presentation says there is a datagram CRC. I assume that while the checksum certifies the integrity of the payload, the CRC certifies the integrity of the entire datagram. Do you know if the computation for that CRC is happening in hardware or software right now?
Whoops, yes meant .c, thanks for pointing that out.
The CRC computation is happening in the hardware I believe. There is not much else I could find from the ST site, but reading about the standard (IEEE 802.3) points towards the CRC being done at the link layer and controlled by the MAC hardware (https://en.m.wikipedia.org/wiki/Ethernet_frame#Structure). From my understanding the CRC is done for each Ethernet frame and therefore the whole datagram.
This references the card https://github.com/utra-robosoccer/soccer-embedded/projects/4#card-11027147. Please correct or suggest anything in the comments.
Current

Right now, the interface between the PC and MCU (see `/Robot/Src/freertos.c`) uses UART to communicate. To briefly summarize the process: the MCU waits for data sent by the PC, sends commands to motors and sensors, and then returns sensor data to the PC. This flow enables the feedback necessary for controlling the robot. We would like to build on this and make some improvements in the following areas:

Current model

Our current model for PC/MCU interactions is as follows (please correct if wrong/missing detail):

(`defaultTask`)

(steps 2, 3, and 4 happen repeatedly and forever)
Advantages:
Disadvantages:
Idea
I recently had a read over `/Robot/Src/freertos.c` and had an idea for how we could remodel the communication interface. Would be great to discuss and hear everyone's thoughts.

The main ideas behind the proposed model below are:
Proposed model
As follows:
(steps 3 and 4 happen repeatedly, forever, and concurrently)
Advantages:
("controller" task, which is busy, so the MCU uses an "api" task to return "busy" to the PC)

Disadvantages:
To design the proposed model, we need to:
To analyze the performance improvement:
(`Development`)