Closed AlexisTM closed 4 years ago
You have essentially run out of memory in your SHM pool. That's why the error message says it is unable to allocate space for a new message.
The way I usually work around this is by increasing the shared memory pool size: https://github.com/eclipse/iceoryx/blob/master/iceoryx_posh/source/mepoo/mepoo_config.cpp#L44-L54
The last entry there can be set to 10/20, which gives you more chunks for large messages.
This is unfortunately far from optimal, as these values are currently hardcoded. We have it on our list of features to make this more flexible and configurable at startup.
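For reference, the mempool table in mepoo_config.cpp looks roughly like the sketch below. The sizes, counts, and exact API here are illustrative assumptions, not the actual defaults — check the linked file for the real values. Bumping the count of the largest pool is the workaround described above:

```cpp
// Illustrative sketch of the hardcoded default mempool table
// (values are assumptions -- see the linked mepoo_config.cpp):
m_mempoolConfig.push_back({128, 10000});           // many small chunks
m_mempoolConfig.push_back({1024 * 16, 1000});      // medium chunks
m_mempoolConfig.push_back({1024 * 1024 * 32, 5});  // large 32 MiB chunks
// Workaround: raise the last count to 10 or 20 for more large chunks.
```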
With that configuration, I could send around 4K images without any delay. I am a bit puzzled about the increasing delay though. @michael-poehnl any ideas?
@AlexisTM is NOT using the loaned messages API extension. So I would assume the increasing delay comes from the serialization, which takes longer the bigger the payload is. Our serialization in rmw_iceoryx is currently quite slow, I guess :-(.
In rmw_iceoryx we take the queue size from the provided history qos. @Karsten1987 is there another source for the queue size? Should the "10" that @AlexisTM provides in the create_publisher call be propagated down to the rmw layer as history qos? Maybe the warning comes from another (built-in) subscriber that is using a queue size of 1000?
The warning we see comes from a history qos which is 1000 but the maximum constant in iceoryx is set to 256. So I'm wondering if this is your subscriber and we are not using the right parameter or if this is another subscriber.
If you want a queue size of 10 and can live with losing older chunks when the queue overflows, then this issue can be solved by increasing the number of 32 MB chunks (e.g. to 20). If we then ensure that your desired queue size of 10 is used on the iceoryx side (and not the 1000 coming from your subscriber), it should no longer crash. Currently we have a fail-fast strategy: if your memory pool configuration is not sufficient to handle all the chunks that are held by queues and on the user side, allocation fails and the publisher terminates.
Having the memory pool configuration as a config file and not only as compile time setting is a feature that is quite on top of the stack.
From not using loaned messages, I expect the delays to come from serialization. The typical delays with other middlewares are (18 MBytes):

- rmw_fastrtps_cpp: 25 ms
- rmw_cyclonedds_cpp: 18 ms
- rmw_iceoryx_cpp with fixed sizes and loaned messages: 0.1 ms (Awesome!)
- rmw_iceoryx_cpp with dynamic sizes and without loaned messages: > 1 second

The reason it crashes is a lack of memory: the requested history depth keeps buffering messages because the listener doesn't receive the data fast enough (too high delay).
For the queue size of 1000, the subscriptions are using the depth: https://github.com/ros2/rmw_iceoryx/blob/a8c95d42de562ecab12f0173e9ea34a694521b66/rmw_iceoryx_cpp/src/rmw_subscription.cpp#L98
But there is no mention of it on the publisher side: https://github.com/ros2/rmw_iceoryx/blob/master/rmw_iceoryx_cpp/src/rmw_publisher.cpp
So the good news is that we are 100 times faster with loaned messages. The bad news is that our "hack a thing to support non-memcpy-able messages" serialization is 100 times slower.
I'll check with @Karsten1987 if we can find another solution there by reusing things that are already available in ROS 2.
We currently have no use for the queue size on the publisher side. We plan to support a history QoS there in the future, but this is not a queue but rather a cache for messages. Currently we only support caching one message on the subscriber side, which corresponds to a latched topic in ROS 1.
Could you check whether it no longer crashes when you increase the number of chunks in the 32 MB mempool? https://github.com/eclipse/iceoryx/blob/master/iceoryx_posh/source/mepoo/mepoo_config.cpp#L44-L54
NOTE: 0.1 ms is the fastest we could get on a non-RT-patched Linux.
@AlexisTM Could you share some code you used for benchmarking? I am trying to take a shot at this. I'd love to have a similar setup as yours to see how you'd produced these numbers to come up with comparable ones on my end.
I am out of office (and don't have the code with me). It basically was: subscriber and publisher with both a queue of 10, sending a struct as:
```
struct BigData {
    uint8[33000000];
}
```
This was using the ROS2 API (no loaned messages)
@AlexisTM we modified our ROSCon demo a little bit to cope with loaned messages as well as the "classic" transport method. In neither case were we able to reproduce the behavior you describe.
It would be great if you could give that demo a shot and post some of the results you get here.
To give you an idea on what we see on our machines:
When sending 4k images at 15 Hz with loaned messages:

```
[INFO] [image_transport_subscriber]: Received 75 messages
[INFO] [image_transport_subscriber]: Average round time 0.124256 milliseconds
```

When sending 4k images at 15 Hz using the classic approach:

```
[INFO] [image_transport_subscriber]: Received 104 messages
[INFO] [image_transport_subscriber]: Average round time 2.615024 milliseconds
```
Even when adding a string field to the 4k fixed-size messages to force serialization, we get round trip times of about 20 milliseconds. Can you try to reproduce this?
@AlexisTM I am going to close this issue because I consider this problem addressed. Please feel free to re-open this ticket if you have further questions about it.
I sent messages of 33MBytes but the publishers/subscribers were set with a queue size of 10 without loaned messages, and the publisher crashed.
This means: if we are not using the zero-copy capability (the loaned-message methodology) in all nodes, the nodes will crash; I would therefore expect the global/local planner and the default mapping nodes to have problems when running over iceoryx.
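For contrast, the zero-copy path on the publisher side looks roughly like the sketch below in rclcpp. This is a fragment, not a full node: it requires a ROS 2 distribution with loaned-message support, a fixed-size POD message type, and `publisher_` / `fillImage` are assumed names for illustration.

```cpp
// Sketch: zero-copy publishing via rclcpp loaned messages.
// Assumes publisher_ is an rclcpp::Publisher for a fixed-size POD message.
auto loaned = publisher_->borrow_loaned_message(); // chunk comes from iceoryx SHM
fillImage(loaned.get());                           // hypothetical helper writing the payload in place
publisher_->publish(std::move(loaned));            // hands the chunk over, no copy
```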
Publisher was:
Subscriber was:
When starting, it says the following but I expect the queue size to be 10.
After a few messages, the node crashes due to lack of memory to be allocated.
This last error is (for me) due to delays on the subscriber side that prevent RouDi from repurposing memory, making the publisher die even though it is the subscriber's fault. Note that when I was doing tests on raw iceoryx, the typical delay was 50-150 μs (18 MB messages), but using rmw_iceoryx_cpp I get a steadily increasing delay, up to being a few messages late (at 1 Hz, without any processing, on an i9 machine).