rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Support fisheye projection and distortion #2315

Open pablovela5620 opened 1 year ago

pablovela5620 commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently only a pinhole perspective camera is supported. To project points from 3D views into 2D in views with significant distortion, such as a fisheye lens, one has to do the projection manually to get accurate results.

Describe the solution you'd like

Addition of distortion parameters as described here (the same as what OpenCV uses).

Describe alternatives you've considered

The current alternative involves just doing the projection yourself.
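For anyone hitting this in the meantime, here is a minimal sketch of that manual workaround using OpenCV's fisheye model. All intrinsics and distortion coefficients below are placeholder values standing in for a real calibration:

```python
import cv2
import numpy as np

# Placeholder intrinsics and fisheye distortion coefficients (k1..k4,
# as used by cv2.fisheye) from a hypothetical calibration.
K = np.array([[380.0, 0.0, 320.0],
              [0.0, 380.0, 240.0],
              [0.0, 0.0, 1.0]])
D = np.array([0.1, -0.05, 0.01, -0.002])

# 3D points in the camera frame, shaped (N, 1, 3) as cv2.fisheye expects.
points_3d = np.array([[[0.5, 0.2, 2.0]],
                      [[-0.3, 0.1, 1.5]]])

rvec = np.zeros(3)  # identity rotation: points are already in the camera frame
tvec = np.zeros(3)

points_2d, _ = cv2.fisheye.projectPoints(points_3d, rvec, tvec, K, D)
# points_2d can then be logged as 2D points on the image entity.
```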

Additional context

[image: example of an inaccurate projection with the current pinhole-only implementation]

Here you can see how the current implementation leads to inaccurate projections

Here is the relevant PR for allowing 3D -> 2D projection https://github.com/rerun-io/rerun/pull/2008

nikolausWest commented 1 year ago

Also realized we're missing an issue for adding support for regular Brown-Conrady lens distortion

pablovela5620 commented 1 year ago

Wanted to add that this would also probably require an arctan projection (I haven't worked with fisheye cameras before, and only just realized that the arctan projection is an important component of making the projection correct).
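For reference, a sketch of what that arctan step looks like in the equidistant model that most fisheye formulations (including OpenCV's) build on; parameter names are illustrative:

```python
import numpy as np

def fisheye_project(point, fx, fy, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project a 3D camera-frame point with the equidistant fisheye model.

    Unlike pinhole projection, the image radius grows with the angle
    theta = arctan(r / z) rather than with r / z itself, optionally bent
    further by a polynomial in theta (k1..k4, as in OpenCV's fisheye model).
    """
    x, y, z = point
    r = np.hypot(x, y)
    theta = np.arctan2(r, z)  # the "arctan projection" step
    k1, k2, k3, k4 = k
    theta_d = theta * (1 + k1 * theta**2 + k2 * theta**4
                       + k3 * theta**6 + k4 * theta**8)
    # As r -> 0, theta_d / r -> 1 / z, so fall back to that on the optical axis.
    scale = theta_d / r if r > 1e-9 else 1.0 / z
    return fx * scale * x + cx, fy * scale * y + cy
```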

nikolausWest commented 1 year ago

Here is a good tutorial on computer vision with fisheye models for the eventual implementer: https://plaut.github.io/fisheye_tutorial/#the-fisheye-projection

And this one from Tangram is even better: https://www.tangramvision.com/blog/camera-modeling-exploring-distortion-and-distortion-models-part-ii

dgriffiths3 commented 7 months ago

Any update on this? Would love to see this feature.

nikolausWest commented 6 months ago

Hey @dgriffiths3 - great to hear this feature would be useful for you! Do you think you could expand a bit on exactly which parts you'd hope to see in support for fisheye models?

Regarding an update: we've been focused on all the work leading up to launching blueprint (slotted for release after 0.13). After that we'll shift back much more to adding more datatypes, models, etc. Currently, Brown-Conrady distortion models are probably slightly higher up in priority, but it's not impossible we do both together.

dgriffiths3 commented 6 months ago

Hi @nikolausWest, that's great to hear it's on the horizon! For my current use case we plot 3D boxes in world space, and then visualise them from the egocentric camera view. We use a combination of both approx. pinhole and high distortion (fisheye) lenses. Similar to the ARKitScenes example you have, but with additional non-pinhole cameras.

One would expect a straight box edge to look quite curved when viewed in the fisheye image, but currently this is not the case.
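A hedged note on why projecting boxes per-vertex can't produce this: if only the corner vertices go through the camera model, the edges between them stay straight regardless of distortion. A manual workaround is to densely sample each edge in 3D before projecting, e.g.:

```python
import numpy as np

def sample_edge(p0, p1, n=32):
    """Densely sample points along a 3D box edge so that, once each sample
    is pushed through a fisheye projection, the edge renders as a curve
    rather than the straight segment you get from projecting only the
    two endpoints."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1.0 - t) * np.asarray(p0) + t * np.asarray(p1)

# e.g. project sample_edge(corner_a, corner_b) with the fisheye model
# above and log the result as a 2D line strip on the image.
```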

I think it might even be more useful to initially just provide an API for users to easily add their own custom camera model class. No matter how many models you support, there will always be another. Maybe this already exists and I've missed it?
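For what it's worth, a purely hypothetical sketch of the shape such a user-facing interface could take; nothing like this exists in rerun today, and all names and signatures are invented for illustration:

```python
from abc import ABC, abstractmethod
import numpy as np

class CameraModel(ABC):
    """Hypothetical user-extensible camera model interface. This only
    illustrates the kind of contract a custom-model API might expose;
    it is not a real rerun API."""

    @abstractmethod
    def project(self, points_cam: np.ndarray) -> np.ndarray:
        """Map (N, 3) camera-frame points to (N, 2) pixel coordinates."""

    @abstractmethod
    def unproject(self, pixels: np.ndarray) -> np.ndarray:
        """Map (N, 2) pixel coordinates to (N, 3) unit rays in the camera frame."""
```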

changh95 commented 5 months ago

Would love to have fisheye projection as well!

Currently, I am using a mix of perspective cameras, fisheye cameras, and LiDARs in an autonomous driving setup. With the current implementation of the perspective pinhole archetype, it's easy to project 3D LiDAR points onto perspective camera images, but not so easy onto fisheye camera images.

Considering there are many camera models out there, I second @dgriffiths3's opinion on providing an API for users to define a custom camera model class (perhaps it would also open up more opportunities for contributions?).

jlazarow commented 4 months ago

@nikolausWest Would it make sense to add some sort of archetype (or indicator?) for Spatial2DViews that applies an additional fisheye distortion (i.e., a warping shader on the image itself) on top of the existing Pinhole rendering results? My understanding is that a general fisheye renderer is not always what people actually expect (i.e., it will probably work well on high-resolution meshes/point clouds but be less than desirable on, say, 3D boxes). For some other wgpu-based work (wgpu-py), I already did this and it is generally pretty satisfactory.

Wumpf commented 4 months ago

My current thinking is that we do it as a distortion, as you say, meaning it's essentially a postprocessing pass. The big disadvantage with that is that we may need to render at a higher resolution to begin with, otherwise we'll get sampling artifacts. Also, it's unclear whether we can easily do it for the camera visualization in 3D: I believe we'll probably limit it to being active when "taking the viewport of the camera", since that is much more straightforward. Doing it as a postprocessing step should make it invariant to whether we're dealing with high-resolution meshes or less tessellated things like boxes, so I'm not sure that is what you meant? 🤔
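To make that idea concrete, here is a hedged CPU sketch of such a postprocessing pass using OpenCV; a real implementation would presumably be a wgpu shader, and `K_pin`, `K_fish`, and `D_fish` are assumed calibration inputs:

```python
import cv2
import numpy as np

def fisheye_from_pinhole(pinhole_img, K_pin, K_fish, D_fish, out_size):
    """Warp a (super-sampled) pinhole render into a fisheye view.

    For every output pixel, invert the fisheye model to get a viewing ray,
    then project that ray with the pinhole intrinsics of the render to find
    where to sample.
    """
    w, h = out_size
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    pix = np.stack([u.ravel(), v.ravel()], axis=-1).reshape(-1, 1, 2)

    # Undistort fisheye pixels to normalized coordinates (x/z, y/z).
    norm = cv2.fisheye.undistortPoints(pix, K_fish, D_fish)

    # Project those rays through the pinhole intrinsics of the render.
    map_x = (K_pin[0, 0] * norm[:, 0, 0] + K_pin[0, 2]).reshape(h, w)
    map_y = (K_pin[1, 1] * norm[:, 0, 1] + K_pin[1, 2]).reshape(h, w)

    return cv2.remap(pinhole_img, map_x.astype(np.float32),
                     map_y.astype(np.float32), cv2.INTER_LINEAR)
```

Rendering the pinhole image at a higher resolution before calling this is what mitigates the sampling artifacts mentioned above.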

As to where to specify it: that all depends on the actual data modelling. It's a property of a camera, so I'd expect it to live with the camera, but I'm less sure whether that's a new archetype or whether it should be crammed into the Pinhole one as an optional "extension", so to speak.

jlazarow commented 4 months ago

Right, I think we're pretty aligned (post-processing). I agree on the downside being having to render at a higher resolution (although perhaps this factor would be a function of the degree of distortion). From a (personal) priority aspect, I was more indicating that:

  1. Assuming the 3D points are coming from an unprojection of a fisheye sensor at some instant of time, perhaps it's less interesting to expect to freely navigate the point cloud through the view of a fisheye sensor (but maybe this is important for self-driving applications?).
  2. Rather, the ability to view the world structures (e.g., 3D boxes) rendered against a set of actual camera poses collected from a device (e.g., the ARKitScenes example) and overlaid on the image itself (which presumably is already captured with distortion) might narrow the computational burden heavily and would be independent of what's actually being rendered.