ros2 / common_interfaces

A set of packages which contain common interface files (.msg and .srv).

Image with Camera Info composed message #227

Closed ahcorde closed 8 months ago

ahcorde commented 1 year ago

One of the major pain points in running image data over non-ideal links in ROS is that the CameraInfo and Image messages get published separately, and due to the vastly different message sizes they can fall victim to different retransmission/latency/packet-loss behavior (e.g. a CameraInfo almost certainly fits in one packet or datagram; an Image almost certainly does not). The traditional practical solutions to this are usually:

(2) is the most robust answer, since it ensures that synchronization between image and camera info is maintained. It would be nice to standardize a composite message type that could then be directly supported in image_transport and in visualization tools (e.g. rviz, rqt).
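To make the proposal concrete, here is a sketch of what such a composite type might look like. The name and layout are hypothetical, not an agreed-upon definition:

```
# ImageWithCameraInfo.msg (hypothetical sketch, not an accepted definition)
std_msgs/Header header
sensor_msgs/Image image
sensor_msgs/CameraInfo camera_info
```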

I believe this does not break ABI or API, and it could be backported to other versions too.

clalancette commented 1 year ago

While I agree that this is not an ABI or API break, I do think it has the potential to (further) split the community.

That is, when writing a node now, you can pretty much be guaranteed that the type of your topic needs to be Image (or CompressedImage, though I think that is a historical artifact that we should probably consider merging into Image at some point). If we add this in, then when you write a node, it might have to take an Image/CameraInfo pair, or it might have to take an ImageWithCameraInfo message. Inevitably what will happen is that some nodes will take one, and some the other, and the community will be split.

Additionally, this is kind of the purpose of message_filters, so that you will only get a callback once you have at least one of each. This is a little more heavyweight, but it avoids the problem above and is a generic solution.
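A toy sketch of the exact-time pairing that message_filters performs: buffer messages per input and fire the callback once every input has a message with the same timestamp. These are simplified stand-in objects, not the real rclpy message_filters API:

```python
from collections import defaultdict

class ExactTimeSync:
    """Toy exact-time policy: deliver a joint callback only when every
    input has produced a message with the same timestamp."""

    def __init__(self, n_inputs, callback):
        self.n_inputs = n_inputs
        self.callback = callback
        self.buffers = defaultdict(dict)  # stamp -> {input_index: msg}

    def add(self, input_index, stamp, msg):
        slot = self.buffers[stamp]
        slot[input_index] = msg
        if len(slot) == self.n_inputs:
            del self.buffers[stamp]
            # Deliver all messages for this stamp, in input order.
            self.callback(*(slot[i] for i in range(self.n_inputs)))

pairs = []
sync = ExactTimeSync(2, lambda image, info: pairs.append((image, info)))
sync.add(0, stamp=1.0, msg="image@1.0")
sync.add(1, stamp=2.0, msg="info@2.0")   # no image for stamp 2.0 yet: no callback
sync.add(1, stamp=1.0, msg="info@1.0")   # completes the 1.0 pair: callback fires
```

The real library adds queue-size limits and an approximate-time policy for streams whose stamps do not match exactly; this sketch only shows the core idea.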

Because of those reasons, my inclination is not to take this. But I'm curious whether @tfoote or the rest of the @ros2/team have thoughts.

codebot commented 1 year ago

This is tricky. Reasonable points can be made for each option.

However, I suggest keeping the messages separate:

calderpg-tri commented 1 year ago

Chiming in because @IanTheEngineer and I had the conversation with @ahcorde that got this started. To provide some history, we used an equivalent Image + CameraInfo message type in the DRC a decade ago, and I know we weren't the first or the only ones at the time to do so, let alone now.

> That is, when writing a node now, you can pretty much be guaranteed that the type of your topic needs to be Image (or CompressedImage, though I think that is a historical artifact that we should probably consider merging into Image at some point). If we add this in, then when you write a node, it might have to take an Image/CameraInfo pair, or it might have to take an ImageWithCameraInfo message. Inevitably what will happen is that some nodes will take one, and some the other, and the community will be split.

The solution to this is for the tooling (ImageTransport, Rviz, rqt image view, etc.) to be updated as well. In places that already publish/subscribe to an Image/CameraInfo pair, a combined message requires minimal change. Particularly in places using ImageTransport, a combined message could be implemented completely transparently to the user (the message on the wire already being significantly different from what the user's callback receives).

> CameraInfo fields are generally static. I agree the few-hundred bytes is trivial compared to the pixel block in Image, but I still think we want to be aware of serialization/deserialization cost of this structure, since it's likely in the hot path for many applications.

To take the case of an absurdly small image (100x100, 8-bit mono), the CameraInfo would be about 250 bytes and the image would be about 10000 bytes. If serializing/deserializing an included info that makes up such a small fraction of a message's size is a meaningful cost, something is terribly wrong with RMW performance.
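A rough back-of-the-envelope check of those numbers. The numeric field counts are read off the sensor_msgs/CameraInfo definition (d has 5 doubles for the common plumb_bob model); the header and distortion_model string overhead is a guess:

```python
# Serialized pixel payload of a 100x100 8-bit mono image.
image_bytes = 100 * 100 * 1  # width * height * bytes per pixel

# CameraInfo numeric fields: d (5 doubles for plumb_bob), k (9), r (9), p (12),
# plus a handful of uint32s (height, width, binning, ROI). Header and the
# distortion_model string are approximated as a flat 40 bytes.
camera_info_bytes = (5 + 9 + 9 + 12) * 8 + 7 * 4 + 40

# Fraction of a combined message taken up by the CameraInfo portion.
overhead = camera_info_bytes / (camera_info_bytes + image_bytes)
```

Even for this deliberately tiny image the info is only a few percent of the payload; for a realistic VGA or HD frame it drops well below one percent.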

> I'm not aware of applications that dynamically set CameraInfo fields. Although I suppose it's theoretically possible (perhaps some high-end auto-focus or variable-zoom cameras can stream their current focal length?), I've never personally seen a ROS application that intentionally varies the camera calibration at run-time and streams it at a high rate. But maybe I've missed something; it's a big world out there.

Zoom cameras, some autofocus cameras, and cameras that automatically calibrate are the most notable beneficiaries of a combined message. This is a chicken-and-egg issue - when fancier camera drivers want to support more complex behavior, they won't use the common messages, which means you won't see many public uses of more complex behavior, etc.

> as Chris said, user code tooling to "merge" static / very-slow topics with fast topics, like message_filters, seems like a nicely extensible solution to other similar scenarios. Anything we can do to make that easier and look nicer in user code (syntactic sugar, another library layer, etc.) feels worthwhile, as opposed to modifying N camera drivers and M image consumers to read the composite data structure.

message_filters is a bit of a crutch. Yes, it can be used to achieve synchronization, but it's better to avoid the need for additional synchronization in the first place. If you consider the message types in geometry_msgs, for example, there are AccelWithCovariance, PoseWithCovariance, and TwistWithCovariance. All of those could have been implemented as a pair of <message>/Covariance, yet obviously the benefit of synchronizing them outweighs the minimal cost of doing so.
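For reference, geometry_msgs/PoseWithCovariance is exactly such a composite, embedding the covariance alongside the pose rather than synchronizing two topics:

```
# geometry_msgs/msg/PoseWithCovariance.msg
Pose pose
# Row-major representation of the 6x6 covariance matrix.
float64[36] covariance
```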

The same problem of "two topics that we want to consume simultaneously" arises, for example, with stereo camera images published separately, or with depth cameras that output separate message streams for depth and RGB images. Bimanual robots, bipeds/quadrupeds, multi-camera setups, etc., face the same issue.

As an aside, this kind of structured synchronization is indeed desired, and publishing multiple images and/or pointclouds in a single message (either by combining them together, or by having a message array type) is not uncommon. Not all of these cases need to be handled by standard messages, but the fact that we can't handle them all doesn't mean we shouldn't handle some easy ones.

codebot commented 1 year ago

Thank you for the detailed and thoughtful comment! I really appreciate that :heart:

Perhaps like you said, it's a chicken-and-egg problem, and if we had better tooling, we'd see more use cases of vari-focal cameras and such. Maybe I have my head too much into traditional fixed-focus applications, where you calibrate the camera once and never touch the lens again.

The counter-examples of SomethingWithCovariance are also helpful. In those cases, I assume the intent was to capture things like pose estimators (EKFs and friends) where the covariance is all over the place and changes every cycle. But I agree that they are often used in scenarios where they are effectively hard-coded constants.

If the new composite message could be hidden behind ImageTransport, which "does the right thing" for each case (i.e. communicate with a peer publisher or subscriber with/without the new composite image+config type), then I would be onboard with this :+1: Do you have a feel for what fraction of "popular" camera drivers and image consumers currently use ImageTransport as opposed to directly using Image and CameraInfo messages? (I don't have a good feel for that.)

And yeah, my comment about the extra bytes and serialization time of adding the CameraInfo fields was silly to make without data proving or disproving it :smile:

clalancette commented 1 year ago

> The solution to this is for the tooling (ImageTransport, Rviz, rqt image view, etc.) to be updated as well. In places that already publish/subscribe to an Image/CameraInfo pair, a combined message requires minimal change. Particularly in places using ImageTransport, a combined message could be implemented completely transparently to the user (the message on the wire already being significantly different from what the user's callback receives).

We can certainly do things like that for the core, but it won't help the large ecosystem of packages that are out there. It just seems like we are going to be adding more burden on everyone out there to handle both, or risk splitting the community.

> message_filters is a bit of a crutch.

I very much disagree with this. I think it is an elegant way to take building blocks of different pieces of information and combine them together in arbitrary ways. Otherwise you end up with many variants of xWithY messages, and things end up being hard to interoperate.

calderpg-tri commented 1 year ago

> We can certainly do things like that for the core, but it won't help the large ecosystem of packages that are out there.

The core is the place these changes matter most. Projects can define and use (and have for years) their own Image + CameraInfo messages, but the one thing they can't do is use those messages with the core visualization tools without additional plugins or expensive republishers. Particularly if support is implemented transparently in image_transport, it would be entirely possible for existing code to benefit with little-to-no change.

> I very much disagree with this. I think it is an elegant way to take building blocks of different pieces of information and combine them together in arbitrary ways. Otherwise you end up with many variants of xWithY messages, and things end up being hard to interoperate.

message_filters in general is a useful tool for performing these operations, since obviously the range of operations users want to perform will exceed what is possible/reasonable/etc to include in the core. However, for a specific tightly coupled pair of messages, having to manually layer additional synchronization is a crutch versus simply using a composite message type. The fastest and simplest synchronization is synchronization you don't have to perform in the first place because the message is in the right form to start with.

Taking a step back, given that some obviously-paired messages already have essentially composite types (e.g. AccelWithCovariance, PoseWithCovariance, TwistWithCovariance), in my view the question is not whether XWithY messages should exist ("yes" having been decided a long time ago), but whether Image + CameraInfo are sufficiently tightly paired to justify the same level of treatment. As far as I am aware, Image + CameraInfo is the only pair so tightly coupled as to receive a paired publisher + subscriber in the core distribution, which clearly speaks to how common this need is.

tfoote commented 1 year ago

This is certainly not a black-and-white issue and there's no "right" answer, but I strongly believe that we should not do this bundling. There's always a trade-off in whether to bundle the data together or not: if you separate the messages, there's coordination overhead; if you bundle them together, there are potentially extra transportation costs.

There's often data that is correlated but not used exclusively together. In designing messages, one of the highest-value goals is to define semantically meaningful, self-contained units that can be reused throughout the system, including in logging and playback.

The example above of PoseStamped vs. PoseWithCovarianceStamped is, I agree, not our best set of messages, as they border on being too generic (but again, it's a trade-off: generic and reusable versus protected from misuse). The distinction between them is quite significant in how you semantically process them: one is a position in space, the other a probability cloud of positions in space. In general the covariance matrix doesn't stand well on its own. These messages were also created and standardized even earlier than message_filters, and could likewise use a review/suggestion to split them up, but they have enough inertia now that changing them would be prohibitively expensive compared to the gain from refactoring.

> I believe this does not break ABI or API, and it could be backported to other versions too.

Just adding the message definition won't break anything. But adding this message instead of modifying the existing one adds significant complication.

If we were to add this message, it would become necessary for every single consumer of an Image to also subscribe, in parallel, to the new type; otherwise a tool like image_view would be unable to view the new message, since the datatype is incompatible. So the image_view node, to stay generically applicable, would now have to subscribe to two different topics, know how to subscribe to the aggregate datatype and select the image sub-field, and then pass that along to the same callback as the other subscription.

And if you look at the other end: a generic camera driver would now publish both the existing /image and /camera_info topics and also a new standard topic such as /image_with_info. But this gets into a complicated challenge, because the downstream image_view, also written generically, will open two connections (one for each datatype) and get double delivery. With a couple of stages in an image pipeline, there is suddenly potentially a lot of extra data running around, or a lot of configuration to be done for the simple use case. It can't be auto-configured, because you don't know what node is on the other end of your anonymous publish/subscribe; and if the data is coming out of a bag file or a mux node, there might even be multiple publishers with different settings.

If it's possible at all, it would not be trivial to extend image_transport to do this automatically behind the scenes. And would that be better than image_transport just applying a message filter to do the aggregation? These transport changes would also all need to be backported and made ABI/API compatible to the level of whatever modules they're integrated in.

> message_filters is a bit of a crutch.

This is definitely not a crutch but a design decision from early on. ROS does not support subscriptions to subtopics, so we built message_filters to enable synchronizing data for delivery. If you couple the data together, you incur the cost of transporting all of it even if you only want part of it. In this case the data is very asymmetric and the overhead of packing the camera info in with the image is likely unnoticeable. But if you were to do this with, say, a stereo camera, you might want to subscribe to a single one of the images to do object detection, and not incur double the bandwidth to get both images only to discard one. In that case it's basically mandatory to be able to subscribe to the synchronized pairs, and this is what message_filters is designed to do: let you say "I want this set of information together in a callback when it's all ready."
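A quick sketch of the bandwidth asymmetry in that stereo example. The resolution and frame rate are assumed purely for illustration:

```python
# Illustrative numbers: uncompressed 1080p RGB stereo at 30 Hz (assumed).
frame_bytes = 1920 * 1080 * 3  # one image: width * height * 3 bytes/pixel
rate_hz = 30

one_image_bw = frame_bytes * rate_hz       # subscribe to the left image only
bundled_bw = 2 * frame_bytes * rate_hz     # forced to take the stereo bundle
```

A subscriber that only wants one of the two images pays exactly double under bundling, and then discards half of what it received.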

As such, message_filters is a core part of the architecture: it lets us decouple aspects of the system, gives greater granularity of access, and couples the information transport only when the subscriber requires it. The functionality of message_filters has worked well for a long time. If it has challenges on ROS 2, we need to make sure it works well, not just avoid it in the first cases we encounter.

If we were to bundle the camera info into the image, that would imply that someone who wants just the camera info (which is assumed to get through more reliably, at much lower bandwidth) would also have to subscribe to the bundled image, which could bog down a smaller-bandwidth link.

In the original proposal above it's stated that there's a large disparity in the potential delivery of the two messages, such that the CameraInfo will potentially get through more easily than the Image itself. In that case message_filters will hold a collection of those messages, and when and if an Image gets through, the associated CameraInfo should already be present, since it's smaller and likely delivered more reliably. Is that not the case? If not, why?
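That expected behavior, where the small CameraInfo arrives ahead of the Image it describes, can be sketched as a pairing cache. These are simplified stand-ins, not the real message_filters Cache API:

```python
class LatestInfoPairer:
    """Pair each incoming image with the newest CameraInfo whose stamp is
    at or before the image's stamp (a common caching pattern; stand-in,
    not real rclpy code)."""

    def __init__(self, callback):
        self.callback = callback
        self.infos = []  # (stamp, info) tuples in arrival order

    def on_info(self, stamp, info):
        self.infos.append((stamp, info))

    def on_image(self, stamp, image):
        # Infos not newer than the image; if none has arrived yet, drop.
        candidates = [(s, i) for s, i in self.infos if s <= stamp]
        if candidates:
            self.callback(image, max(candidates)[1])

delivered = []
pairer = LatestInfoPairer(lambda img, info: delivered.append((img, info)))
pairer.on_info(1.0, "info@1.0")    # small message: arrives first
pairer.on_image(1.0, "image@1.0")  # large message: arrives later, pairs up
```

If the info really does arrive first in practice, each image pairs immediately on arrival, which is the behavior the question above is probing for.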

So I'd like to take a more focused look at what the pain point/failure mode is here and make sure that we're not looking at an XY Problem before considering this proposal further.