lmb-freiburg / flownet2

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
https://lmb.informatik.uni-freiburg.de/Publications/2017/IMKDB17/

3D motion vectors #216

Closed: huatson closed this issue 4 years ago

huatson commented 4 years ago

Great source of information, congratulations! I have a question regarding 3D motion vector calculation. The paper A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation mentions:

three additional data passes per frame and view. These provide 3D positions of all visible surface points, as well as their future and past 3D positions. The pixelwise difference between two such data passes for a given camera view results in an "image" of 3D motion vectors

I would like to know how to get those 3D positions of all visible surface points. I want to generate my own dataset, and I wonder whether there are any available methods, directions, or advice on how to achieve that.
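
To make the quoted description concrete: once the two position passes exist, the 3D motion "image" is just a per-pixel subtraction. A minimal numpy sketch, assuming the passes are already available as float arrays (the file names and shapes are illustrative placeholders, not renderer outputs):

```python
import numpy as np

# Two rendered data passes for the same camera view and frame, each of shape
# (height, width, 3): the 3D position of every visible surface point in the
# current frame, and the 3D position of the same surface point in the next
# frame. The file names are hypothetical placeholders.
positions_now    = np.load("positions_frame_t.npy")
positions_future = np.load("future_positions_frame_t.npy")

# The pixelwise difference is the "image" of 3D motion vectors.
motion_3d = positions_future - positions_now  # (height, width, 3)
```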

thanks !!

nikolausmayer commented 4 years ago

Hi, good question, and I have thought about it a little. I see four general paths to get to 3D vectors:

  1. Use a renderer that produces 3D vectors (well, obviously). I don't have a great overview (it's been a while since we did our dataset) but I heard RenderMan may support this better than others.
  2. Do as we did and hack support for 3D position outputs into a renderer of your choice. We used Blender Internal which is not a great choice otherwise, but it was possible to add custom render passes and data storage without blowing up the entire engine. This renderer's ability to shift the scene time index independently of its own render data storage was key; we needed to be able to re-identify each vertex over time.
  3. You can bake a "3D position" texture for every surface, then render the different time frames with that texture. This approach has a ton of overhead because you basically need to re-bake every texture for every object in every time frame, and the texture resolution directly affects your data accuracy. On the upside, it is a universal concept that should be independent of the renderer.
  4. Use the built-in 2D speed vector and Z passes of a renderer, then postprocess to get 3D vectors. This is limited to surface points which are visible in both the source and target frames, and bears risks of "false positive" matches, but at least in Blender this works out of the box (given that the 2D pass output is accurate) and is fast (you don't need multiple renders). A rough sketch of this postprocessing idea follows at the end of this comment.

(2.) is a lot of effort and probably doesn't work for every renderer. (3.) is inefficient and numerically not perfect. (4.) is limited to non-occlusion areas.
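
To make option (4.) a bit more concrete, here is a rough numpy sketch of that postprocessing step. It assumes a simple pinhole camera (focal lengths fx, fy, principal point cx, cy), a Z pass that stores depth along the optical axis, and a 2D flow pass stored as (dx, dy) in pixels from frame t to t+1; none of these names or conventions come from a specific renderer, so adapt them to whatever your passes actually contain:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift every pixel (u, v) with depth z to a camera-space 3D point."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)  # (h, w, 3)

def flow_and_depth_to_3d_motion(flow, depth_t, depth_t1, fx, fy, cx, cy):
    """3D motion vectors for surface points visible in both frames."""
    h, w = depth_t.shape
    points_t = backproject(depth_t, fx, fy, cx, cy)

    # Follow the 2D flow to the nearest target pixel in frame t+1.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    u1 = np.clip(np.round(u + flow[..., 0]).astype(int), 0, w - 1)
    v1 = np.clip(np.round(v + flow[..., 1]).astype(int), 0, h - 1)

    # Back-project the target pixels with the depth of frame t+1.
    points_t1 = backproject(depth_t1, fx, fy, cx, cy)[v1, u1]

    # Pixelwise difference = the "image" of 3D motion vectors.
    return points_t1 - points_t  # (h, w, 3)
```

Keep in mind that this only yields motion for points visible in both frames, the nearest-pixel lookup is one source of the "false positive" matches mentioned above (bilinear interpolation of the target positions helps a bit), and if the camera moves between the frames you also have to account for the change of camera coordinates before taking the difference.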

huatson commented 4 years ago

Thank you very much for the detailed answer! I will definitely try those, at least the Blender options (2, 4), since that is the tool I know.

nikolausmayer commented 4 years ago

If you find other options or run into interesting Blender problems, I'm always interested in details! ;)