nikolausWest commented 1 year ago

Most real world 3D computer vision applications need to make use of some kind of lens distortion model. The most common for a pinhole camera is Brown-Conrady lens distortion.

See OpenCV for the most common parametrization here

And this tutorial from Tangram on distortion models: https://www.tangramvision.com/blog/camera-modeling-exploring-distortion-and-distortion-models-part-ii

[ ] Users can log Brown-Conrady lens distorion models (at least radial + tangential)
[ ] When combined with a logged pinhole model, the lens distortion is automatically applied both between 3D and 2D views (spaces), as well as between 2D distorted and 2D undistorted views (spaces)
[ ] The presence and application of a lens distortion model is somehow indicated in the 3D view. Ideally these is an element in the 3D view that can be clicked such that the distortion model becomes selected

Important scope restriction/relaxation:

We do not need to visualize any bent lines or other shapes.
We can get most of the value from just distorting images (rgb, gray, segmentation, depth) and points.
- For all other primitives, we just won't support showing them in spaces that are separated by a non-linear transform (e.g. lens distortion) from the space they were logged to

emilk commented 1 year ago

Some of my thoughts about non-linear lens distortion in general

Nomenclature

rectify = undistort, i.e. undoing the radial distortion

The most general way to implement this is to log or generate two sets of lookup maps, i.e. two HxWx2 tensors: given a 2D position, it returns either the distorted or undistorted position.

Structure

Implementing this will touch all of the following, and probably more:

Ray projection/unprojection
Rendering of the camera frustum (we should render it with curved edges)
Depth cloud projection
2D-in-3D
3D-in-2D

We probably also want to support users logging things both to the distorted and rectified images. It is common for users to do their own rectifying and they might do object detection in either the distorted or undistorted image space. This suggests the radial distortion is a new type of transform component (alongside Transform3D and Pinhole). For instance:

rr.log_bbox("world", object)
rr.log_transform("world/camera", extrinsics)
rr.log_pinhole("world/camera/pinhole", …)
rr.log_radial_distortion("world/camera/pinhole/distorted", …)

# Uses often do their own rectification:
rectified_image = cv.rectify(raw_image, distortion)

# Ideally you only need to log _one_ of the following, and Rerun will infer the other based on the logged distortion:
rr.log_image("world/camera/pinhole", rectified_image)
rr.log_image("world/camera/pinhole/distorted", raw_image)

# Maybe you do your object detection in the raw/distorted image:
rr.log_bbox("world/camera/pinhole/distorted/object", detect_object(raw_image))

# Or you do it in the rectified image:
rr.log_bbox("world/camera/pinhole/object", detect_object(rectified_image))

Transform chain

In the viewer we need to aggregate a path through the entity hierarchy into a transform chain, something like:

enum TransformLink {
    ThreeD(Affine3A),
    Pinhole(Pinhole),
    RadialDistortion(RadialDistortion),
}

struct TransformChain(Vec<TransformLink>);

impl TransformChain {
    /// Collapse neighboring matrices by multiplying them with each other
    #[must_use]
    pub fn simplify(&self) -> Self;
}

We can then pattern-match on that in order to figure out how to render something. This is also where it makes sense to add a pure 2D transform.

We need to do something like this no matter what rendering method we go with (see below).

Non-linear transforms in the viewer

Rerun supports projecting 3D objects into a 2D view (3D-in-2D) and displaying 2D images in 3D space (3D-in-2D). This is currently implemented by a simple linear transform, performed by the GPU. If this transform is no longer linear, things will become a lot more complex.

The simplest fist-step solution is to not implement the non-linear transforms. We would show the rectified space in the 3D view, but objects logged to the distorted space will not show up in the 3D view. Similarly we would project 3D objects into the rectified 2D space, but not into the distorted 2D space. Objects logged in the rectified view would not show up in the distorted view, nor vice versa.

We can implement proper non-linear transforms in one of two ways:

Post-processing 2D distort/rectify

We render to texture, and then apply distortion or undistortion in a post-process-step. We would apply the distortion both the RGB and picking buffers.

When the user has hundreds of cameras in a scene (examples/python/open_photogrammetry_format/main.py --no-frames), doing render-to-texture for all those cameras will become very expensive, both in GPU time and in VRAM usage, unless we are very careful.

This approach will introduce blurring unless we super-resolution.

Vertex shader

We can add a non-linear step in the vertex shaders for all our primitives. The transform would go from being a Mat4 to being a Mat4 + distortion-lookup + another Mat4.

Requires highly tessellated primitives to become correct. In other words, we would need to up-tesselate low-poly meshes etc, ideally progressively as we get closer to them.

Depth cloud projection

We do all the projection in the vertex shader, so implementing a non-linear step here is quite simple

nikolausWest commented 1 year ago

One relaxation I'd like to add if it helps is that I don't think we need to bend any of the primitives. I think it's enough if we only distort/undistort the data that is there (i.e. just the corners of a box).

astraw commented 1 year ago

As someone with some experience in various camera calibration stuff, including distortion and some interest in using rerun in my own software, I will be keenly following this issue. (Lack of support for lens distortion has so far been the blocker for me to start using rerun.) I am the author of cam-geom and opencv-ros-camera, rust crates for camera models including a pure-rust OpenCV/ROS compatible pinhole model with Brown-Conrady distortion. If desired, I'm happy to discuss potential changes to these crates to make them useful here. I would happy to help to the degree I can (which, due to time constraints is likely more limited than I'd like).

I agree the Brown-Conrady ("plumb bob") model should be the first distortion model as it covers many (most?) use-cases, and I also agree with the general plan above. One minor question is whether "rectify" is the right word as to me the usage I'm familiar with refers to stereo vision (e.g. wikipedia).

nikolausWest commented 1 year ago

Thanks @astraw, we really appreciate that. We'll definitely keep that in mind when we start implementing. We have a couple other big infrastructure pieces in play right now before we're ready to start though but this is a very high priority issue. We know a lot of folks are blocked on lens distortion support.

nikolausWest commented 1 year ago

To reduce the scope of this issue, I'd like to add the following restrictions (also updated the main issue description)

We do not need to visualize any bent lines or other shapes.
We can get most of the value from just distorting images (rgb, gray, segmentation, depth) and points.
- For all other primitives, we just won't support showing them in spaces that are separated by a non-linear transform (e.g. lens distortion) from the space they were logged to.
Motivation
Applying non-linear transforms to geometric objects other than points and regular grids (images) is complex. Because of that, it's very uncommon to do so in user's algorithm code.
In the rare cases that it is done, users are likely to make explicit choices about potential approximations etc. That means in general it will be up to the user to do the modeling of the before and after themselves, in which case they can just log both pieces of data. In that case the real user need is really just being able to indicate that the distorted and undistorted variants of e.g. a box represents the same "thing".
While it would be cool, and potentially useful, to see how e.g. a box in one space gets bent in another due to lens distortion, I've never seen that in a visualization tool before so I doubt anyone will be expecting it. Most likely this is because the benefit doesn't usually motivate the cost.
- If it does show up as an important need for users once the basics are in place, we can consider adding support for visualizing bending shapes at that point.

rerun-io / rerun

Add support for Brown-Conrady Lens distortion #2499

Important scope restriction/relaxation:

Nomenclature

Structure

Transform chain

Non-linear transforms in the viewer

Post-processing 2D distort/rectify

Vertex shader

Depth cloud projection

Motivation