JanStaschulat opened 3 years ago
Regarding the stack size: calls into micro-ROS functions usually need a "big" stack:
These values are completely experimental and usually depend on the functionality used. The point is that every time I see an inexplicable crash in an embedded application, it is usually the stack.
Is this stack only for the main thread? Do the other threads share this amount?
I guess so. How do I know? Do you think that 65000 (aka 65 kB) is sufficient for all of them, or should I define an extra STACKSIZE for these threads?
If I configure a new thread like this: https://github.com/micro-ROS/nuttx_apps/blob/0746a008311494f82e7e1b2abae999f843b9400c/examples/uros_rbs/main_rbs.c#L138, is the stack size exclusive to that thread?
The worker thread does not have any local data. However, the thread gets this struct (with four pointers) as a parameter: https://github.com/micro-ROS/rclc/blob/7a5d0d254f4dbf744b04f46a14fd05de061bbeb3/rclc/include/rclc/executor.h#L56
The worker thread https://github.com/micro-ROS/rclc/blob/7a5d0d254f4dbf744b04f46a14fd05de061bbeb3/rclc/src/rclc/executor.c#L1442 only calls the callback of the subscription (which is defined by the user), that's all: https://github.com/micro-ROS/rclc/blob/7a5d0d254f4dbf744b04f46a14fd05de061bbeb3/rclc/src/rclc/executor.c#L1466
I wonder what the stack size for such a thread should be - what would you suggest?
What is the maximum total STACKSIZE for all threads? How much memory is available for that on the Olimex board?
I'm not sure how the stack size is set for threads in NuttX; AFAIK it should strictly follow the POSIX thread API.
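For reference, a minimal sketch of how a per-thread stack size is set with the POSIX API; `worker_thread` is a hypothetical entry function and the 16 kB value is only a placeholder, not a recommendation:

```c
#include <pthread.h>

extern void * worker_thread(void * arg);   /* hypothetical thread entry */

void spawn_worker(void)
{
  pthread_t tid;
  pthread_attr_t attr;

  pthread_attr_init(&attr);
  /* the stack configured here belongs exclusively to the created thread */
  pthread_attr_setstacksize(&attr, 16 * 1024);
  pthread_create(&tid, &attr, worker_thread, NULL);
}
```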
These samples that I provide are for threads that execute the whole micro-ROS application.
How the available memory is handled depends on the RTOS: for example, FreeRTOS lets you create static memory blocks for thread stacks, while Zephyr (by default) uses the heap to allocate thread stacks dynamically. So I guess that the available memory depends on the heap/bss/data sections defined in the linker script, because I guess the stack section (if it exists) will be used for the startup code and RTOS initialization.
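As an illustration of the FreeRTOS case, a task stack can be placed in a static buffer; a sketch assuming `configSUPPORT_STATIC_ALLOCATION` is enabled, with `microros_task` as a hypothetical entry function:

```c
#include "FreeRTOS.h"
#include "task.h"

extern void microros_task(void * pvParameters);   /* hypothetical micro-ROS entry */

#define UROS_STACK_WORDS (10 * 1024 / sizeof(StackType_t))   /* ~10 kB stack */

static StackType_t uros_stack[UROS_STACK_WORDS];   /* stack lives in .bss, not on the heap */
static StaticTask_t uros_task_buffer;              /* task control block, also static */

void start_uros_task(void)
{
  xTaskCreateStatic(microros_task,
                    "uros",
                    UROS_STACK_WORDS,     /* stack depth in words, not bytes */
                    NULL,
                    tskIDLE_PRIORITY + 1,
                    uros_stack,
                    &uros_task_buffer);
}
```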
Are you sure that only one thread is accessing the middleware? In some tests that we did a while ago, multithreaded access to the XRCE middleware breaks it really easily because of buffer corruption.
"Are you sure that only one thread is accessing the middleware? In some tests that we did a while ago, multithreaded access to the XRCE middleware breaks it really easily because of buffer corruption."
Yes. I designed the multi-threaded executor in such a way that only one thread makes all calls to the XRCE middleware. However, a guard condition is signalled from the worker thread - but that should be okay, or not?
Please explain how the worker thread uses the guard condition.
The reason for the guard condition is to respond as fast as possible to incoming messages:
- If `rcl_wait` checks for new data while the corresponding worker thread is busy, then the executor thread grabs 100% CPU, as `rcl_wait` always comes back immediately and the message is not consumed with `rcl_take`.
- If, instead, the subscription is not added to the wait set because its worker thread is busy, the `rcl_wait` call will block until the timeout. (One might argue that this is still okay - but I want to process the next message as soon as the worker thread is ready again.)
- Therefore `rcl_wait` should return as soon as the worker thread is ready again. For this reason, the worker thread has a guard_condition, which it signals when it is ready again, aka after processing the subscription callback. `rcl_wait` then returns, and in the next iteration of the executor loop the wait set is created with the corresponding subscription.

In the main thread the guard condition is initialized at startup and added to the wait set in the executor loop; in the worker thread it is triggered with `rcl_trigger_guard_condition`, as sketched below. `rcl_trigger_guard_condition` is guarded with a mutex, so multiple worker threads cannot call this function at the same time.
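A minimal sketch of this flow in C; the function names are illustrative, not the actual rclc executor code, and error handling is largely omitted:

```c
#include <pthread.h>
#include <rcl/rcl.h>

// main thread, at initialization: create the guard condition
rcl_guard_condition_t init_worker_gc(rcl_context_t * context)
{
  rcl_guard_condition_t gc = rcl_get_zero_initialized_guard_condition();
  rcl_guard_condition_init(&gc, context, rcl_guard_condition_get_default_options());
  return gc;
}

// executor loop: wait on the guard condition together with the ready subscriptions
rcl_ret_t wait_with_gc(rcl_wait_set_t * wait_set, rcl_guard_condition_t * gc, int64_t timeout_ns)
{
  rcl_ret_t rc = rcl_wait_set_add_guard_condition(wait_set, gc, NULL);
  if (rc != RCL_RET_OK) { return rc; }
  return rcl_wait(wait_set, timeout_ns);  // should return early when the guard condition fires
}

// worker thread, after the subscription callback has finished
void signal_worker_ready(rcl_guard_condition_t * gc, pthread_mutex_t * gc_mutex)
{
  pthread_mutex_lock(gc_mutex);           // serialize triggering among worker threads
  rcl_trigger_guard_condition(gc);
  pthread_mutex_unlock(gc_mutex);
}
```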
Ok, I'm trying to understand this... Some thoughts:
- `rcl_trigger_guard_condition` at the middleware level is not thread-safe: link. But I guess that if the worker is the only one in charge of triggering it, it is ok.
- In `rmw_wait()` the XRCE session is run, and the `rcl_wait()` call will only return before the timeout if an XRCE data message arrives (Subscription, Request or Reply). Once the session has been run, we check the guard conditions, check here. That means that if you are running an `rcl_wait()` for N ms, it will wait the full N ms even if a guard condition is triggered in between.

Let me know what you think about this approach, or if it interferes too much with your implementation. Maybe we can add some kind of "guard condition" concept to the XRCE middleware in order to abort the session wait...
Last thing regarding this:
"I tested the multi-threaded executor under Linux (Ubuntu 18.04, local Foxy installation), which works."
Is it possible for you to test the executor under Linux but using the XRCE-DDS middleware, like we do with the micro-ROS demos? This way we should be able to determine whether this is a middleware problem, and debug and fix it in that case.
I added a lock around `rcl_publish`, which is called from the worker thread. Limitation: micro-ROS is single-threaded, so now all the functions `rcl_wait`, `rcl_take`, and `rcl_publish` are guarded with the lock. Drawback: a trade-off needs to be found between the timeout for `rcl_wait` and the throughput of the worker threads (which might publish data). With this lock in place, I could run the ping-pong example successfully.
Jan, which one of these points makes the application work? I would like to know, because if it is a stack limitation it would be "ok".
But if we are having concurrency issues, I would like to investigate a bit into making the library multithreaded, because in this use case it should theoretically work with the current approach.
Definitely, this stack-size adjustment makes it run. The configured STACKSIZE is the total stack size of the entire application (with two pthreads). The stack sizes of the threads are not configured, so I assume that they use the stack size of their spawning application.
The lock in the worker thread around the execution of the user callback (which might call `rcl_publish`) and in the executor thread (around `rcl_wait` and `rcl_take`) is necessary because micro-ROS is single-threaded. For a multi-threaded micro-ROS implementation, this lock would not be necessary, and the potential waiting time when publishing messages in the user callback or when calling `rcl_wait` would disappear.
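A minimal sketch of this locking scheme, assuming a single shared mutex `micro_ros_mutex`; the wrapper functions are illustrative names, not the actual rclc API:

```c
#include <pthread.h>
#include <rcl/rcl.h>

static pthread_mutex_t micro_ros_mutex = PTHREAD_MUTEX_INITIALIZER;  /* shared lock (illustrative) */

// executor thread: rcl_wait and rcl_take run under the lock
void executor_spin_once(rcl_wait_set_t * wait_set)
{
  pthread_mutex_lock(&micro_ros_mutex);
  rcl_wait(wait_set, RCL_MS_TO_NS(10));   // short timeout: publishers are blocked while waiting
  // ... rcl_take() for the handles that became ready ...
  pthread_mutex_unlock(&micro_ros_mutex);
}

// worker thread: a user callback that publishes takes the same lock
void publish_result(const rcl_publisher_t * publisher, const void * msg)
{
  pthread_mutex_lock(&micro_ros_mutex);
  rcl_publish(publisher, msg, NULL);
  pthread_mutex_unlock(&micro_ros_mutex);
}
```

The `rcl_wait` timeout directly bounds the worst-case time a worker thread can be blocked inside `rcl_publish`, which is exactly the trade-off described above.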
To summarize, this version of the multi-threaded rclc executor works with the single-threaded micro-ROS library.
It was designed to demonstrate budget-based scheduling with NuttX, but it can be used to assign priorities on Linux/FreeRTOS/Zephyr as well, assuming the creation of pthreads and the assignment of priorities (sched_param) are supported.
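For example, on a POSIX system the worker-thread priorities could be assigned roughly like this (a sketch; the policy and priority value are placeholders, and `SCHED_FIFO` may require elevated privileges on Linux):

```c
#include <pthread.h>
#include <sched.h>

extern void * worker_thread(void * arg);   /* hypothetical thread entry */

void spawn_prioritized_worker(int priority)
{
  pthread_t tid;
  pthread_attr_t attr;
  struct sched_param param;

  pthread_attr_init(&attr);
  pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);  /* do not inherit caller's policy */
  pthread_attr_setschedpolicy(&attr, SCHED_FIFO);               /* fixed-priority scheduling */
  param.sched_priority = priority;
  pthread_attr_setschedparam(&attr, &param);
  pthread_create(&tid, &attr, worker_thread, NULL);
}
```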
I've followed this on the side since Jan asked me about it earlier. The fact that we needed 65k stack, and now even 69k stack, has puzzled me for a long time and I would love to learn more about what this is for. It doesn't seem to match the memory requirement benchmarks for micro-xrce-dds, but I also don't see another big consumer of memory in this app.
@iluetkeb we have profiled the memory consumption of micro-ROS in FreeRTOS, where you have complete control over dynamic memory allocations and the stack of each task; as you can see here, the results are different from what we see in NuttX.
@pablogs9 The memory consumption measurements show a publisher and a subscriber in isolation. But what would be the memory for an application with two publishers and two subscribers? Can I just add these values:
Really?
The Olimex board STM32-E407 has only 196 kB of RAM.
As stated in the document, this high memory consumption is related to the middleware buffer configuration and the RMW history. Right now you can tune it to your topic size (by default it is 512 B and both the RMW and middleware histories are 4), so for example for subscriptions, by default, you will need 512 × 4 × 4 = 8192 B of static memory for each one.
You can tune these values to decrease this.
Also, we have planned a refactor of the micro-ROS RMW where the subscription buffer will be shared among all subscriptions.
So far, users who have wanted to tune the memory consumption have found no problem; in fact, we have ports for the Arduino Zero, where you only have 32 kB of SRAM.
@pablogs9 I did not understand this sentence: "(by default it is 512 B and both rmw and middleware histories are 4), so for example for subscriptions, by default, you will need 512 × 4 × 4 = 8192 B of static memory for each one." Is it the default topic size plus the history, 512 B + 4 = 516 B? Or multiplied by 4, 512 × 4 = 2048 B? I am lost.
After enabling sporadic scheduling with the kernel variable `CONFIG_SCHED_SPORADIC`, the application uros_rbs does not run any more. After a while, the application just hangs.
I increased the STACKSIZE. This is my result:
```make
CONFIG_UROS_RBS_EXAMPLE_STACKSIZE ?= 68625
# 65000 => uros_rbs hangs after some while
# 68480 => uros_rbs hangs after some while
# 68600 => uros_rbs hangs after some while (after sending messages)
# 68625 => ping-pong example works!
# 68650 => nsh: uros_rbs: command not found
# 68700 => nsh: uros_rbs: command not found
# 68800 => nsh: uros_rbs: command not found
# 69000 => nsh: uros_rbs: command not found
```
So up to 68600 B the application hangs, and from 68650 B on, the application is not available from the nsh shell:
```
nsh> uros_rbs
nsh: uros_rbs: command not found
nsh> help
help usage: help [-v] [<cmd>]
[ cd df help mb nslookup sh umount
? cp dmesg hexdump mkdir ps sleep unset
addroute cmp echo ifconfig mkfifo pwd test usleep
arp dirname exec ifdown mh rm telnetd xd
basename date exit ifup mount rmdir time
break dd false kill mv route true
cat delroute free ls mw set uname
Builtin Apps:
date tcpecho uros_rbs ping cu
nsh>
```
I guess that because sporadic scheduling is enabled, a few more functions are in the NuttX OS library. Even though I am not calling any sporadic-scheduling functions, this has an impact on the STACKSIZE of the application. Strange.
With some luck, I found a configuration that just works. However, it is very shaky! How could I reduce the number of bytes for a subscriber/publisher, or which other configuration variables could I change, to reduce the amount of memory for the micro-ROS stack?
"@pablogs9 I did not understand this sentence: '(by default it is 512 B and both rmw and middleware histories are 4), so for example for subscriptions, by default, you will need 512 × 4 × 4 = 8192 B of static memory for each one.' Is it the default topic size plus the history, 512 B + 4 = 516 B? Or multiplied by 4, 512 × 4 = 2048 B? I am lost."
Let's assume one subscription in reliable mode: the middleware has a buffer with 4 slots of 512 B, 2048 B in total. The RMW layer uses that buffer for receiving topics of the subscription; since we want the RMW to store the received data between the `rmw_wait` and the `rmw_take` calls, we need some kind of buffering here. The maximum size of a received topic will be 2048 B (the size of the middleware buffer). So if we want to have up to 4 received topics in the RMW, we need another buffer of 4 × 2048 = 8192 B.
So in total, with the default configuration, that is 2048 B (middleware buffer) + 8192 B (RMW buffer) = 10240 B of static memory per reliable subscription.
If you want to tune this static memory, you can reduce the MTU (the per-topic buffer size) and the two history depths in the middleware/RMW configuration.
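For illustration, such tuning could look roughly like this in the application's colcon.meta (a sketch: the exact CMake option names, such as `RMW_UXRCE_MAX_TRANSPORT_MTU`, `RMW_UXRCE_STREAM_HISTORY`, and `RMW_UXRCE_MAX_HISTORY`, depend on the rmw_microxrcedds version in use):

```json
{
    "names": {
        "rmw_microxrcedds": {
            "cmake-args": [
                "-DRMW_UXRCE_MAX_TRANSPORT_MTU=256",
                "-DRMW_UXRCE_STREAM_HISTORY=2",
                "-DRMW_UXRCE_MAX_HISTORY=2"
            ]
        }
    }
}
```

With an MTU of 256 B and both histories at 2, the per-subscription cost from the calculation above would drop from 10240 B to 256 × 2 + 256 × 2 × 2 = 1536 B.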
That said, I don't know how NuttX handles the stack; the only thing we have measured is the maximum stack consumption of a task that runs micro-ROS in FreeRTOS. All the details, procedures, and results are explained carefully in the memory profiling article. In this case the stack is about 10 kB, which is approximately the value that we are using in apps that use FreeRTOS.
In NuttX I'm not aware of how the memory handling behaves, since to me it seems more like a normal OS than an RTOS: does the application's whole memory come out of the configured STACKSIZE? If so, this would clarify a lot of the memory handling of NuttX.

Something is not right here. Jan's app is about as stripped down as it gets; there is almost nothing but communication setup and a bit of execution in there. How can this take so much memory? For the record, we ran the Kobuki demo in less stack than this (but it also took more stack than we could explain at the time).
The increase from 65 kB to 68.6 kB stack size was also due to the worker threads and to activating sporadic scheduling.
I had to add another thread in the demonstrator application to create 100% CPU utilization. And now, with three threads, the application does not run any more: `uros_rbs` does not start any more in the nsh shell. I updated https://github.com/micro-ROS/nuttx_apps/blob/8f2e0f6bfccc0d249deb730a36f17da1cb49b006/examples/uros_rbs/uros_rbs-colcon.meta#L11, but at runtime the application just freezes again. Is the name of the file correct?
Another reason could be that the `wait_set` is created only for those subscriptions whose worker thread is ready. If both worker threads are busy, then the `wait_set` is empty. A requirement of `rcl_wait` is that the `wait_set` must contain at least one valid handle (in the ROS 2 implementation). Is this also a requirement for the micro-ROS XRCE-DDS implementation?
I'm not sure about the behavior of RCL, but in our RMW you can run the XRCE session (in `rmw_wait`) without any valid handle. This is because, when the XRCE session is run, other internal XRCE-related mechanisms such as ACKNACKs and heartbeats are handled.
Are you able to debug on-board using a JTAG probe and detect where the application freezes?
Documentation of rcl_wait regarding an empty wait_set: "Passing a wait set with no wait-able items in it will fail." https://github.com/ros2/rcl/blob/4740c82864518a331ae98799f25b2ba085b22473/rcl/include/rcl/wait.h#L434
I have not set up JTAG debugging on the board yet.
It will fail at RCL level: https://github.com/micro-ROS/rcl/blob/8eddc13db38bdecdd3089b8c96d13f0df3f5b35d/rcl/src/rcl/wait.c#L538
At least it does not crash and only comes back with an error message, which I could ignore.
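In other words, the executor loop could tolerate this case explicitly; a minimal sketch, assuming `wait_set` and `timeout_ns` from the surrounding loop:

```c
// inside the executor loop (sketch)
rcl_ret_t rc = rcl_wait(&wait_set, timeout_ns);
if (rc == RCL_RET_WAIT_SET_EMPTY) {
  // all worker threads were busy, so no subscription was added to the wait set:
  // ignore the error and back off briefly so the loop does not spin at 100% CPU
  usleep(1000);
}
```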
Hi,
@jamoralp @pablogs9 @ralph-lange
I am running into problems with an application with multiple threads on NuttX and the Olimex board. You know that I am trying to get the multi-threaded executor for NuttX running. What I have done so far:
When I try to run it on the Olimex board, the threads start, but there is no progress in the main thread (aka the executor). The application uses two subscriptions, two publishers, and one executor. So I don't need to change anything in the micro-ROS configuration, right? https://github.com/micro-ROS/nuttx_apps/blob/0746a008311494f82e7e1b2abae999f843b9400c/examples/uros_rbs/main_rbs.c#L216
The processing just stops after calling `rcl_wait`: https://github.com/micro-ROS/rclc/blob/7a5d0d254f4dbf744b04f46a14fd05de061bbeb3/rclc/src/rclc/executor.c#L1559
However, `rcl_wait` might not be the problem; maybe something goes wrong in the other threads and then everything stops. I also configured the priorities:
I also noticed that sometimes the green light on the Olimex board starts blinking; after that, no output is seen in the nsh shell (via the screen terminal). What does that mean? Did something really go wrong?
I wrote a simple program with a main thread and two worker threads, which seems to work fine (without any micro-ROS functions used).
Is there anything regarding STACKSIZE I have to consider? Currently it is set to 65000 in the Makefile.
When spawning threads (pthread_create), no stack size is configured. What is the default stack size? Is it maybe too small or too large?
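For reference, the platform default can be queried via the POSIX attribute API, since a freshly initialized attribute object carries the defaults (a minimal sketch; on NuttX this default is, to my knowledge, the Kconfig option CONFIG_PTHREAD_STACK_DEFAULT):

```c
#include <pthread.h>
#include <stdio.h>

void print_default_stack_size(void)
{
  pthread_attr_t attr;
  size_t stacksize = 0;

  pthread_attr_init(&attr);                      /* filled with the platform defaults */
  pthread_attr_getstacksize(&attr, &stacksize);
  printf("default pthread stack size: %zu bytes\n", stacksize);
  pthread_attr_destroy(&attr);
}
```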
rclc executor: https://github.com/micro-ROS/rclc/tree/feature/rbs-nuttx
application on olimex: https://github.com/micro-ROS/nuttx_apps/tree/feature/foxy_rbs_executor_demo/examples/uros_rbs