SteveMacenski opened 3 years ago
@SteveMacenski
I know this does not answer your request, but how about using LoanedMessage for intra-process communication? Can this not cover the requirement because of performance or something else? Just curious.
There are times we loop over messages to publish a bunch of them, or need to store them longer term. That's not to say things couldn't be massaged into a shape that could use loaned messages, but that's just not how Nav2 is structured today. Plus, I don't think many of the RMWs actually implement this in a way that makes proper use of it, and we need to support all major configurations at least "reasonably" well -- though that's a floating definition. That said, if someone came along and changed all of our usage to loaned messages, I certainly wouldn't stop them!
While we could go through all of Nav2 and replace things with that, I think a better solution is making IPC actually compatible with the QoS settings that ROS 2 exposes. It seems like a bit of an oversight that IPC doesn't work with most QoS settings, and I'm not sure what the technical limitation is there.
Thanks for sharing your thoughts. The problem is that QoS is provided by the RMW implementation. Since rclcpp intra-process communication is implemented only in rclcpp, the QoS behavior needs to be re-implemented in rclcpp. There is a related discussion here: https://github.com/ros2/rclcpp/issues/1750.
If ROS 2 style in general moved to loaned messages, I'd be happy to migrate Nav2 there, but while it's a super niche use case that only 1 (maybe 2?) RMWs leverage, it doesn't seem like a viable solution for the time being. I wouldn't object if, in the future, our publishers had a getMessage() that just hands back a message object for us to populate and use. But that's a big change that would take some non-trivial restructuring.
We're also seeing that when we do enable IPC, Nav2's CPU usage nearly doubles while RAM drops by about 70%. I love that memory drop, but the CPU increase is startling. It's unrelated, but it's something else we're characterizing and trying to work around as we attempt to build a composition solution for Nav2. We're hitting a lot of blockers: IPC QoS, the launch system with lifecycle components, and CPU spikes, to name the big ones.
Note, please use "intra-process communication" instead of IPC, because IPC is very overloaded. I'm gonna edit the OP to use the full term instead.
As @fujitatomoya said, the only reason this QoS isn't supported is that it's complicated to re-implement in rclcpp. It's something we're working on, but it's a big effort.
Is this a place where you need transient-local and intra-process comm? Or are you just trying to turn intra-process comm on everywhere and it's annoying that a few places don't just work? What I mean is, can you just disable intra-process comm for this topic and still get a big improvement? You can enable it globally for the node but then explicitly disable it elsewhere, like on topics with transient-local durability.
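A minimal plain-C++ model of that suggestion, assuming (as this issue reports) that creating a publisher throws when intra-process communication is enabled together with non-volatile durability. `Durability`, `check_publisher_qos`, `use_intra_process_for` and `qos_check_passes` are hypothetical stand-ins written for illustration, not rclcpp API; in real rclcpp code the per-topic opt-out is made by setting the `use_intra_process_comm` field of the publisher or subscription options to `rclcpp::IntraProcessSetting::Disable`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the durability part of a QoS profile.
enum class Durability { Volatile, TransientLocal };

// Models the restriction discussed in this issue: intra-process
// communication combined with non-volatile durability is rejected.
void check_publisher_qos(bool use_intra_process, Durability durability)
{
  if (use_intra_process && durability != Durability::Volatile) {
    throw std::invalid_argument(
      "intra-process communication requires volatile durability");
  }
}

// Hypothetical per-topic policy: intra-process is enabled node-wide, but
// transient-local topics (e.g. costmaps) opt out and fall back to
// regular inter-process communication.
bool use_intra_process_for(bool node_default, Durability durability)
{
  return node_default && durability == Durability::Volatile;
}

// Returns true if the modeled QoS check passes without throwing.
bool qos_check_passes(bool use_intra_process, Durability durability)
{
  try {
    check_publisher_qos(use_intra_process, durability);
    return true;
  } catch (const std::invalid_argument &) {
    return false;
  }
}
```

With the node default left on, only the transient-local topic is opted out, so it passes the check while volatile topics keep intra-process delivery.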
@gezp please follow up on the questions.
I'll let Zhenpeng give you an account of all of the QoS profiles we're using, but transient-local is definitely one of them for things like costmaps and other similar safety information.
> are you just trying to turn intra-process comm on everywhere and it's annoying that a few places don't just work
We're trying to create a composition-based navigation package for embedded systems users who need to make optimizations due to harsh resource constraints. So this uses the same software as usual, but manually composed so it can leverage intra-process communication and shared resources (rather than being brought up with launch as separate processes).
When the exception saying intra-process comms isn't available for a certain QoS is thrown, will the node continue to operate properly over inter-process communication if we just catch that exception and continue on? Otherwise, I don't know how we would "disable intra-process comm for this topic and still get a big improvement", since this is a node-based setting.
Relatedly:
Zhenpeng shared the following results from his tests with me. The data was collected with Python's OS library, measuring the CPU and memory used by Nav2 under normal bringup, under manual composition, and under manual composition with use_intra_process_comms(true).
As you can see, normal bringup and manual composition have roughly comparable CPU usage, but manual composition uses ~70% less memory. When we turn on intra-process communication, CPU usage spikes by ~50%, while memory stays consistent.
Hmm, I'm not sure why that (increased CPU usage) happens, but it could be related to the double delivery of data that occurs with intra-process communication. That's another thing we're working on, and another reason we're not turning it on by default yet.
Got it -- that's a separate problem, but to refocus the ticket on the core request: intra-process communication should handle the full range of QoS profiles (if ROS 2 isn't going to fully support QoS settings, perhaps that's an argument that ROS 2 shouldn't expose the full range of QoS settings in the API and should offer simpler base profiles only).
How we use it is a separate topic, and there's still some work to be done on our side before I could even rule out that we're doing something wrong.
> (if ROS 2 isn't going to fully support QoS settings, perhaps that's an argument that ROS 2 shouldn't be exposing the full range of QoS settings in the API and have simpler base profiles only).
Eh, in general I agree with exposing a limited set of QoS features, but in this case I don't think the issue is too many QoS features (durability is needed for feature parity with ROS 1); the issue is that intra-process comms are still somewhat under development. The core APIs (not the ones users use) are still in the rclcpp::experimental namespace, and though you can enable it via a Node option, it isn't on by default. It's there mostly to help latency in cases where that makes sense. One day we want to support all QoS settings and turn it on everywhere by default, but we're not there yet.
Certainly can understand that :+1:. This ticket is now a placeholder noting that this is missing and would be useful to get to eventually. I haven't tracked down the intra-process parts of rclcpp to gauge the relative difficulty. That said, I think Cyclone has (and Fast-DDS is planning?) a shared-memory intra-process implementation internally, so it may become less and less important for this feature to be exposed this way; we could simply leverage theirs.
Yeah, I don't mean removing durability in particular -- it's more of a general comment that I've heard, and to some degree feel myself, that QoS could be simplified at the ROS 2 level. Exposing DDS's complexity to ROS users is both powerful and a lot of mental overhead for people who aren't looking to optimize their code at that point. This is a topic for another time and place (and I myself am not completely sold on it). I think there could be 5-6 commonly used profiles (like the sensor data QoS), with raw DDS QoS settings only required for later optimizations that need something different, abstracted away from the initial development phase. But that kind of already exists, so it's not immediately clear to me what I would tangibly want changed. Perhaps just a change in best practices and a few more pre-configured options?
This issue has been mentioned on ROS Discourse. There might be relevant details there:
Question about QoS for intra-process comms: I understand that not all QoS policies have been implemented, but QoS policies are typically applied per topic, whereas intra-process is set per node. Why wouldn't we specify the use of intra-process on the topic instead of the whole node, thereby allowing full QoS support on some topics and intra-process on others (those that don't need the unsupported QoS)?
Intra-process can both be specified per node and per topic. The node value is a default for every topic, but you can override it when creating a publisher/subscription.
I guess I missed that from the demo. I'll look into that. Thank you.
For example, here is the setting for the subscription:
And the publisher:
And it is a tri-state option:
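The tri-state behavior can be sketched as a plain C++ model. The enum mirrors the three values of `rclcpp::IntraProcessSetting` (`Enable`, `Disable`, `NodeDefault`), but it is a local stand-in, and `resolve_use_intra_process` is a hypothetical helper written for illustration rather than the rclcpp implementation.

```cpp
#include <cassert>

// Local stand-in mirroring the three values of rclcpp::IntraProcessSetting.
enum class IntraProcessSetting { Enable, Disable, NodeDefault };

// The per-topic value wins; NodeDefault falls back to the node-wide
// use_intra_process_comms option.
bool resolve_use_intra_process(IntraProcessSetting topic_setting, bool node_default)
{
  switch (topic_setting) {
    case IntraProcessSetting::Enable:
      return true;
    case IntraProcessSetting::Disable:
      return false;
    case IntraProcessSetting::NodeDefault:
      break;
  }
  return node_default;
}
```

Because `NodeDefault` is the default, a node with intra-process enabled only needs explicit overrides on the few topics whose QoS it can't handle.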
I saw some examples in the tests using this, but I'm still having problems. I'll post on ROS Answers.
Update: here's the one https://answers.ros.org/question/398651/ros2-per-topic-intra-process-communications-setup/
Feature description
We'd like to make Nav2 use composition, but a number of the QoS settings aren't supported by intra-process communication and cause exceptions to be thrown: https://github.com/ros2/rclcpp/blob/master/rclcpp/include/rclcpp/publisher.hpp#L199.
We'd like intra-process communication to support all QoS settings.