nikolausWest opened this issue 1 year ago
Some of my thoughts about non-linear lens distortion in general
rectify = undistort, i.e. undoing the radial distortion
The most general way to implement this is to log or generate two lookup maps, i.e. two HxWx2 tensors: one maps each 2D position to its distorted counterpart, the other to its undistorted counterpart.
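For illustration (this is not the proposed Rerun API), such lookup maps can be produced with OpenCV; the camera matrix `K`, distortion coefficients `dist`, and resolution below are placeholder values:

```python
import cv2
import numpy as np

H, W = 480, 640
K = np.array([[500.0, 0.0, W / 2.0], [0.0, 500.0, H / 2.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3 (Brown-Conrady)

# For every pixel of the *rectified* image: where to sample in the *distorted* image.
map_x, map_y = cv2.initUndistortRectifyMap(K, dist, np.eye(3), K, (W, H), cv2.CV_32FC1)
rectified_to_distorted = np.stack([map_x, map_y], axis=-1)  # HxWx2

# The opposite direction: the rectified position of every *distorted* pixel.
xs, ys = np.meshgrid(np.arange(W, dtype=np.float32), np.arange(H, dtype=np.float32))
pixels = np.stack([xs, ys], axis=-1).reshape(-1, 1, 2)
distorted_to_rectified = cv2.undistortPoints(pixels, K, dist, P=K).reshape(H, W, 2)
```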
Implementing this will touch all of the following, and probably more:
We probably also want to support users logging things to both the distorted and rectified images. It is common for users to do their own rectification, and they might do object detection in either the distorted or undistorted image space. This suggests the radial distortion is a new type of transform component (alongside `Transform3D` and `Pinhole`). For instance:
```python
rr.log_bbox("world", object)
rr.log_transform("world/camera", extrinsics)
rr.log_pinhole("world/camera/pinhole", …)
rr.log_radial_distortion("world/camera/pinhole/distorted", …)

# Users often do their own rectification:
rectified_image = cv.undistort(raw_image, camera_matrix, distortion)

# Ideally you only need to log _one_ of the following, and Rerun will infer
# the other based on the logged distortion:
rr.log_image("world/camera/pinhole", rectified_image)
rr.log_image("world/camera/pinhole/distorted", raw_image)

# Maybe you do your object detection in the raw/distorted image:
rr.log_bbox("world/camera/pinhole/distorted/object", detect_object(raw_image))

# Or you do it in the rectified image:
rr.log_bbox("world/camera/pinhole/object", detect_object(rectified_image))
```
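To illustrate the kind of inference Rerun could do here (a sketch only; `K` and `dist` are placeholder values that in practice would come from the logged pinhole and distortion components), point data such as bbox corners logged in the rectified space can be mapped into the distorted image by re-applying the distortion model:

```python
import cv2
import numpy as np

# Hypothetical example values.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])

# Bbox corners detected in the *rectified* image:
corners_rect = np.array([[100.0, 80.0], [220.0, 80.0], [220.0, 180.0], [100.0, 180.0]])

# Back-project to normalized camera coordinates, then re-apply the distortion
# model (rvec = tvec = 0) to get the corresponding pixels in the *raw* image:
fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
normalized = np.concatenate(
    [(corners_rect - [cx, cy]) / [fx, fy], np.ones((len(corners_rect), 1))], axis=1
)
corners_raw, _ = cv2.projectPoints(normalized, np.zeros(3), np.zeros(3), K, dist)
corners_raw = corners_raw.reshape(-1, 2)
```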
In the viewer we need to aggregate a path through the entity hierarchy into a transform chain, something like:
```rust
enum TransformLink {
    ThreeD(Affine3A),
    Pinhole(Pinhole),
    RadialDistortion(RadialDistortion),
}

struct TransformChain(Vec<TransformLink>);

impl TransformChain {
    /// Collapse neighboring matrices by multiplying them with each other.
    #[must_use]
    pub fn simplify(&self) -> Self;
}
```
We can then pattern-match on that in order to figure out how to render something. This is also where it makes sense to add a pure 2D transform.
We need to do something like this no matter what rendering method we go with (see below).
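As a rough illustration of what `simplify()` would do (sketched here in Python/numpy rather than the viewer's Rust; the string links stand in for the non-linear variants), neighboring affine links multiply into a single matrix while non-linear links act as barriers:

```python
import numpy as np

def simplify(chain):
    """Collapse runs of adjacent 4x4 matrices; keep non-linear links as-is."""
    out = []
    for link in chain:
        if isinstance(link, np.ndarray) and out and isinstance(out[-1], np.ndarray):
            out[-1] = out[-1] @ link  # collapse neighboring matrices
        else:
            out.append(link)
    return out

chain = [np.eye(4), np.diag([2.0, 2.0, 2.0, 1.0]), "pinhole", np.eye(4)]
assert len(simplify(chain)) == 3  # matrix, "pinhole", matrix
```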
Rerun supports projecting 3D objects into a 2D view (3D-in-2D) and displaying 2D images in 3D space (2D-in-3D). This is currently implemented as a simple linear transform, performed on the GPU. If this transform is no longer linear, things will become a lot more complex.
The simplest first-step solution is to not implement the non-linear transforms at all. We would show the rectified space in the 3D view, but objects logged to the distorted space would not show up in the 3D view. Similarly, we would project 3D objects into the rectified 2D space, but not into the distorted 2D space. Objects logged in the rectified view would not show up in the distorted view, nor vice versa.
We can implement proper non-linear transforms in one of two ways:
We render to texture, and then apply distortion or undistortion in a post-process step. We would apply the distortion to both the RGB and picking buffers.
When the user has hundreds of cameras in a scene (`examples/python/open_photogrammetry_format/main.py --no-frames`), doing render-to-texture for all those cameras will become very expensive, both in GPU time and in VRAM usage, unless we are very careful.
This approach will also introduce blurring unless we render at super-resolution (i.e. super-sample).
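For reference, here is a CPU analog of that post-process step using the lookup-map idea from above (the real version would be a fragment-shader pass; all buffers and the map below are placeholders):

```python
import cv2
import numpy as np

# `rendered_rgb` stands in for the texture the 3D view was rendered to (in the
# rectified/pinhole space). `distorted_to_rectified` is an HxWx2 map that, for
# every pixel of the *distorted* output, holds the rectified position to sample
# (the convention cv2.remap expects). Both are placeholders here.
rendered_rgb = np.zeros((480, 640, 3), dtype=np.uint8)
distorted_to_rectified = np.zeros((480, 640, 2), dtype=np.float32)

# One remap per buffer: linear filtering for color (this is where the blur comes
# from), nearest-neighbor for the picking buffer so object IDs are not blended.
distorted_rgb = cv2.remap(rendered_rgb, distorted_to_rectified, None, cv2.INTER_LINEAR)
picking_ids = np.zeros((480, 640), dtype=np.uint16)
distorted_picking = cv2.remap(picking_ids, distorted_to_rectified, None, cv2.INTER_NEAREST)
```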
We can add a non-linear step in the vertex shaders for all our primitives. The transform would go from being a `Mat4` to being a `Mat4` + distortion-lookup + another `Mat4`.
This requires highly tessellated primitives to be correct. In other words, we would need to up-tessellate low-poly meshes etc., ideally progressively as we get closer to them.
We do all the projection in the vertex shader, so implementing a non-linear step here is quite simple.
One relaxation I'd like to add, if it helps, is that I don't think we need to bend any of the primitives. I think it's enough to only distort/undistort the data that is there (i.e. just the corners of a box).
As someone with some experience in various camera calibration stuff, including distortion, and some interest in using Rerun in my own software, I will be keenly following this issue. (Lack of support for lens distortion has so far been the blocker for me to start using Rerun.) I am the author of cam-geom and opencv-ros-camera, Rust crates for camera models including a pure-Rust OpenCV/ROS-compatible pinhole model with Brown-Conrady distortion. If desired, I'm happy to discuss potential changes to these crates to make them useful here. I would be happy to help to the degree I can (which, due to time constraints, is likely more limited than I'd like).
I agree the Brown-Conrady ("plumb bob") model should be the first distortion model, as it covers many (most?) use-cases, and I also agree with the general plan above. One minor question is whether "rectify" is the right word, as the usage I'm familiar with refers to stereo vision (e.g. Wikipedia).
Thanks @astraw, we really appreciate that. We'll definitely keep that in mind when we start implementing. We have a couple of other big infrastructure pieces in play before we're ready to start, but this is a very high-priority issue. We know a lot of folks are blocked on lens distortion support.
To reduce the scope of this issue, I'd like to add the following restrictions (I've also updated the main issue description):
We can get most of the value from just distorting images (rgb, gray, segmentation, depth) and points.
Most real-world 3D computer vision applications need to make use of some kind of lens distortion model. The most common for a pinhole camera is Brown-Conrady lens distortion.
See OpenCV's documentation for the most common parametrization, and this tutorial from Tangram on distortion models: https://www.tangramvision.com/blog/camera-modeling-exploring-distortion-and-distortion-models-part-ii
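For reference, a small sketch of the forward (distorting) direction of that parametrization, using the five-coefficient plumb-bob form (k1, k2, p1, p2, k3) on normalized image coordinates:

```python
def brown_conrady_distort(x, y, k1, k2, p1, p2, k3):
    """Map undistorted normalized coordinates (x, y) to distorted ones."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Pixel coordinates then follow from the pinhole intrinsics:
# u = fx * x_d + cx,  v = fy * y_d + cy
```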
Important scope restriction/relaxation: we can get most of the value from just distorting images (rgb, gray, segmentation, depth) and points.