Closed fujitatomoya closed 2 months ago
CC: @MiguelCompany @clalancette @eboasson
And the answer is
NO
for both cases. currently ROS 2 does not support any cross-distro or cross-vendor communication officially. (which is my understanding, if i am mistaken, let me know) some of these cases are implemented in test code, but that does not mean that we support these compatibility.
So I guess it depends on what we mean by "support". We know (and test) that we can publish topics between all 3 of our in-core DDS vendors (Fast-DDS, Connext, and CycloneDDS). I also happened to test today that we can call services between Fast-DDS and Connext (but not with CycloneDDS) (this also implies that actions should work between Fast-DDS and Connext, though I did not test that).
So with the two exceptions of services Fast-DDS <-> CycloneDDS and services Connext <-> CycloneDDS, all of the rest of it should work. The question is whether we want to officially call that supported or not.
FYI, as of ROS 2 Galactic sending a service request using CycloneDDS to a Fast-DDS application results in a crash during deserialization of the packet. I think that @MiguelCompany is working on making it reject the data, rather than crash.
IMO unless we guarantee that everything works between all Tier 1 middlewares, we should discourage users to not even try to mix middlewares. The same applies for communication between distributions. For example, by default we could print very loud warnings if someone is mixing stuff.
An environment variable could be set by advanced users who are explicitly looking for cross-communication.
Trying to use different versions of ROS or different middlewares (by mistake!) are some of the most common causes of the issues reported by Create 3 (and Turtlebot 4) users.
So for the service interoperability, there is another issue open. I've understood from @gavanderhoorn (who BTW has rightfully been bugging me about it) that it really is necessary because of some devices only supporting middleware 1 and the other only middleware 2, and yet they have to be combined into a single system. The only way forward is therefore to agree on a portable way of doing service invocations.
I promised him that I'd try to come up with a proposal that might be agreeable to both @MiguelCompany and myself — the idea being that open source tier 1 parties ought to band together, and not let the commercial people try to bully us into something we don't want.
To me, the only sensible path forward is to do something that builds on top of the (spec'd) DDS API and respects the layering at the DDS level. Cyclone's RMW implementation currently meets this requirement, the only trickery it requires is encapsulating the ROS service payload in a slightly larger message. I believe the separate service type support exists precisely to make this possible — and doing it in a cleaner way than what the Cyclone RMW layer does.
The downside on the higher levels of the stack of doing the layering properly is (or might be) that the transformation between application data and DDS data becomes more involved and that zero-copy becomes impossible. I don't think that additional overhead would be a big deal for service invocations.
On the lower level of the stack, it adds some wire overhead in that it means you can't re-use the sequence number used by the reliable protocol for the request, and that you may decide to also add a unique client identifier to the header. For the reply, I don't think there is a material wire overhead.
About the identifier: you can basically chose to reuse the identifier of, e.g., the client's writer, or you can create a new unique id for that. Reusing means you already know the GUID from the low-level protocol machinery and it is accessible from the API, so it meets my layering constraint while avoiding a few bytes of overhead. Nonetheless, adding a new unique id is, in my view, a solution that more cleanly separates the concerns and avoids some order dependencies on creating the reader/writer pair for a client.
Since I have failed to do as I promised @gavanderhoorn ... perhaps we can just use this latest trigger to finally do something about it. Of course I'd be happiest if @MiguelCompany would be happy to rework the rmw_fastrtps_cpp
implementation to match Cyclone's, but that's probably a stretch and the way forward is to share the pain a bit.
I don't think this ticket is necessarily the place for that discussion. I did feel it unacceptable to not at least do something to make good on that promise of mine ...
it really is necessary because of some devices only supporting middleware 1 and the other only middleware 2, and yet they have to be combined into a single system.
@eboasson totally agree.
For example, by default we could print very loud warnings if someone is mixing stuff.
@alsora yeah, that was the idea. and i think documentation and warning would do some work, we user still can try with warning message if we want to.
So I guess it depends on what we mean by "support".
@clalancette i take support
is if issue happens that is the bug we must fix
? i think that it still makes sense to add doc like cross-distro and cross-vendor communication is not supported officially. some might work but it is anti-pattern and not recommended.
I do not have detailed implementation idea how to add the warning though...
@clalancette i take
support
isif issue happens that is the bug we must fix
? i think that it still makes sense to add doc likecross-distro and cross-vendor communication is not supported officially. some might work but it is anti-pattern and not recommended.
See, I think that is the wrong way to go. We are essentially taking current bugs (inter-service operability between CycloneDDS and our other RMWs), and upgrading them to policy. I think we should be going the opposite way; attempting to make these interoperable (with tests), so that we can officially claim that cross-vendor compatibility is supported. If we don't allow this, then we are making it difficult to produce a robot running ROS 2 that can be communicated from outside the robot.
With that in mind, my proposal for this would be the following:
We are essentially taking current bugs (inter-service operability between CycloneDDS and our other RMWs), and upgrading them to policy.
ah, that is true, there is already issue and discussion taking place.
I think we should be going the opposite way; attempting to make these interoperable (with tests), so that we can officially claim that cross-vendor compatibility is supported.
totally good to go for cross-vendor communication. and your proposal makes sense to me to support cross-vender communication.
what about cross-distribution communication? according to current API/ABI management, i do not think this can be supported?
@clalancette i take
support
isif issue happens that is the bug we must fix
? i think that it still makes sense to add doc likecross-distro and cross-vendor communication is not supported officially. some might work but it is anti-pattern and not recommended.
See, I think that is the wrong way to go. We are essentially taking current bugs (inter-service operability between CycloneDDS and our other RMWs), and upgrading them to policy. I think we should be going the opposite way; attempting to make these interoperable (with tests), so that we can officially claim that cross-vendor compatibility is supported. If we don't allow this, then we are making it difficult to produce a robot running ROS 2 that can be communicated from outside the robot.
With that in mind, my proposal for this would be the following:
- Re-enable tests in https://github.com/ros2/system_tests/tree/rolling/test_communication that test cross-vendor service calls between Fast-DDS and Connext.
- In the special case of CycloneDDS, add a warning in rmw_cyclonedds_cpp when connecting to it from a non-CycloneDDS peer. That warning would say something like "services don't currently interoperate, this may or may not work".
- Update our documentation to point out that the special case of CycloneDDS doesn't work.
I do need to point out here that we (the Eclipse Cyclone DDS project) do not consider it a bug. There never was any expectation of service interoperability in the past, there never was a concrete proposal on how to implement it in an interoperable manner (other than some handwaving in some call that I recall), and we are the only ones who do it without requiring support for yet another bad specification (RPC over DDS). When all the pros and cons are considered of the various options, you'll find that that bad specification adds exactly nothing to the end result.
What we do consider a bug is that service interoperability was ignored in ROS 2 from day one. A proper path to addressing that is something we're happy to be involved in, but simply expecting us to abide by the technical choices of others where we strongly believe those choices to be bad is not the way.
If you wish to pin it on Cyclone DDS and state that it doesn’t work, i.e., “is broken”, that is your good right. It is not an attitude that would motivate us to “fix” things.
If you wish to pin it on Cyclone DDS and state that it doesn’t work, i.e., “is broken”, that is your good right. It is not an attitude that would motivate us to “fix” things.
Apologies, my intention was not to call out Cyclone DDS. From my perspective, I didn't really know that DDS didn't have a standard for a long time, and that there was contention about the standard that exists.
I'm honestly not sure of a way forward (which I guess is why we've been in this situation for a long time). It seems like Fast-DDS and Connext both implement the standard that exists (as flawed as it may be). Cyclone implements something potentially better, but which is not standardized. Some ways forward I see are:
I take it that Erik is in favor of 1. @eboasson just for my information, is there any potential for 2? Also do you have any other ideas on a way forward here?
we are the only ones who do it without requiring support for yet another bad specification (RPC over DDS). When all the pros and cons are considered of the various options, you'll find that that bad specification adds exactly nothing to the end result.
@eboasson I am trying to catch up with this. this has been explained in https://github.com/ros2/rmw_cyclonedds/issues/184? (which is almost 3 years old...) specifically on this comment https://github.com/ros2/rmw_cyclonedds/issues/184#issuecomment-630679396?
This issue has been mentioned on ROS Discourse. There might be relevant details there:
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-02-16/29927/1
https://github.com/ros2/ros2_documentation/pull/4736 is merged, i will close this issue.
There has been some questions and discussion related to cross-vendor and cross-distro communication support in ROS 2.
And the answer is
NO
for both cases. currently ROS 2 does not support any cross-distro or cross-vendor communication officially. (which is my understanding, if i am mistaken, let me know) some of these cases are implemented in test code, but that does not mean that we support these compatibility.No matter this is being current limitation or specification, it would be probably nice to describe clearly that is not supported in ROS 2 documentation officially? So that we can avoid potential unexpected problems and questions in the future.
Adding documentation would not be good enough to guarantee to avoid these cases, maybe we can add warning if the communication takes place with different distro or vendor implementation at runtime.
This topic was from today's MW WG meeting. (CC: @wjwwood @alsora @asorbini @ros2/middleware_working_group )
Related but out of scope from this issue: