Open ghost opened 3 years ago
Main link that is important is linuxfoundation.com/realtime.
See https://lwn.net/Articles/837019/ (the slides therein also show us how to plot where each process is executing vs suspending in time)
the chrt command can help but we can also set this with code
We can use the linux-lowlatency kernel to do a bit better (if we increase process priority/niceness) but overall what we want is to be able to use the deadline scheduler for this task (have not quite figured out how to do this yet, but see man sched and the realtime stuff).
We can play with all this scheduling and afinity etc (including the deadline scheduler) without the RT_PREEMPT kernel, its just that only the RT_PREEMPT kernel can gives us guarantees.
The delay in sending out serial data is almost certainly caused by latency in waiting for the read syscall to be processed or something like that. But scheduling our process with a higher priority or the deadline scheduler should help.
We have observed a phenomenon related to latency in the serial port interface on Linux.
Reproduce example 1:
if(consecutive_nones > 0)
on line 187 of omni_control.c. The threshold of zero here means that the robot will error out if it misses even a single command.omnid_control mobile_interface
(specific source we used is at hash 2de56cddde33e626940a63042fe1fd88e3627636 in the omnid_control repo). This node sends twist commands to the mobile robot controller.Reproduce example 2:
After a number of tests with different code (all available in the strong/uart-latency branch in the relevant repos) we have concluded that this is likely to be due to a delay in the return of linux system calls. The robot appears to respond normally, but the host node doesn't send response fast enough (this has been confirmed on a scope). On the frame in which the failure occurs, the protocol_read_block call in the node takes longer to respond.
Things we tried:
Potential paths to rectify this include:
consecutive_nones
allowed on the omni control board. This approach allows us to miss more than one control frame, which means there will be jitter that gets through to the robot's motion. We can limit the impact of this by setting the threshold to a reasonable value like 1/20s (10 counts at a loop speed of 200 Hz).Relevant links: https://wiki.linuxfoundation.org/realtime/documentation/technical_basics/preemption_models https://nicolovaligi.com/concurrency-and-parallelism-in-ros1-and-ros2-linux-kernel-tools.html https://lwn.net/Articles/743740/ http://wiki.ros.org/realtime_tools https://askubuntu.com/questions/656771/process-niceness-vs-priority