Ademord closed this issue 3 years ago
Hi Franco, thanks for your interest in our work. Similar to this reply, NeuralRecon requires camera poses with metric scale as input. Since it's not possible to compute those poses with only a monocular camera, a colab notebook that allows users to upload only the images will not work.
Instead, we provide a tutorial for users to capture the video and the camera poses with an iOS device. Since there is no easy way to connect an iOS device to a PC and transfer the camera feed and poses in real-time, this has to be an offline process.
Noooo... my world falls apart reading this reply. I am so excited about your research.
I could pass this info obtained from the Unity engine by reading the transform's quaternion. Could you help me understand how to call into your model's API?
I'd really love to use NeuralRecon for a real-time scan with my reinforcement learning agent in my Unity environment. It is a simulated environment, so I can get the camera info you mention.
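For context, feeding poses from Unity would mean converting each transform's position and rotation quaternion into a 4x4 camera-to-world matrix. A minimal NumPy sketch (the quaternion-to-matrix formula is standard; note that Unity uses a left-handed, y-up coordinate frame, so an axis/handedness conversion would likely be needed on top of this):

```python
import numpy as np

def quat_to_matrix(qx, qy, qz, qw):
    """Convert a unit quaternion (x, y, z, w) to a 3x3 rotation matrix."""
    n = np.sqrt(qx*qx + qy*qy + qz*qz + qw*qw)
    qx, qy, qz, qw = qx/n, qy/n, qz/n, qw/n
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

def pose_to_c2w(position, quaternion):
    """Build a 4x4 camera-to-world matrix from a position and quaternion."""
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(*quaternion)
    T[:3, 3] = position
    return T

# Identity rotation, camera at (1, 2, 3)
pose = pose_to_c2w([1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 1.0])
```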
Thanks again for your interest in our work. I'm not sure if NeuralRecon will generalize well to synthetic images. I think it would be a better idea to use the depth buffer from the rendering engine (maybe adding some noise) and do TSDF integration directly.
For the inference API of NeuralRecon, you may take a look at https://github.com/zju3dv/NeuralRecon/blob/master/demo.py and https://github.com/zju3dv/NeuralRecon/blob/master/tools/process_arkit_data.py, but again, I'm not sure it will be worth the effort to adapt NeuralRecon to reconstruct a synthetic scene.
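The depth-buffer suggestion above could start as simply as perturbing the rendered depth with distance-dependent Gaussian noise before TSDF integration. A sketch, with a made-up noise model (the parameters are illustrative, not from any sensor calibration):

```python
import numpy as np

def add_depth_noise(depth, sigma_base=0.005, sigma_scale=0.01, rng=None):
    """Add depth-dependent Gaussian noise to a rendered depth map (meters).

    Noise std grows with distance, loosely mimicking a consumer depth
    sensor. Zero-depth (invalid) pixels are left untouched.
    """
    rng = np.random.default_rng(rng)
    sigma = sigma_base + sigma_scale * depth
    noisy = depth + rng.normal(0.0, 1.0, depth.shape) * sigma
    # Keep invalid pixels at 0 and clamp negative depths
    return np.where(depth > 0, np.maximum(noisy, 0.0), 0.0)

clean = np.full((4, 4), 2.0)   # flat wall 2 m away
clean[0, 0] = 0.0              # one invalid pixel
noisy = add_depth_noise(clean, rng=0)
```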
Thank you so much for your feedback!! I will look into the TSDF integration that you mention.
I found something about ICP; I don't know if that could also work. I'm thinking I always have to subtract the new scan from the old one to find which points are actually new, and then feed a reward proportional to this to the agent.
Do you have any experience with TSDF, or some code tutorial you could point me to?
Again, I am so happy I found your research and to be able to contribute to this branch of science.
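The "subtract the new scan from the old one" idea can be sketched without ICP: treat a point as new if no previously seen point lies within some radius, and make the reward proportional to the count of new points. A minimal brute-force NumPy sketch (the radius is an arbitrary choice; a KD-tree would be needed at scale):

```python
import numpy as np

def novelty_reward(old_pts, new_pts, radius=0.05):
    """Reward proportional to the number of genuinely new points.

    A point in the new scan is 'novel' if its nearest neighbour in the
    old map is farther away than `radius` (brute force for clarity).
    Returns (reward, merged_map).
    """
    if len(old_pts) == 0:
        return len(new_pts), new_pts
    # Pairwise distances, shape (N_new, N_old)
    d = np.linalg.norm(new_pts[:, None, :] - old_pts[None, :, :], axis=-1)
    novel = d.min(axis=1) > radius
    merged = np.vstack([old_pts, new_pts[novel]])
    return int(novel.sum()), merged

old = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
new = np.array([[0.0, 0.0, 0.01], [2.0, 0.0, 0.0]])  # one near-duplicate, one new
reward, merged = novelty_reward(old, new)
# reward == 1, merged has 3 points
```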
You may refer to this tutorial in Open3D for TSDF fusion.
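To complement the Open3D tutorial: the core of TSDF fusion is just a per-voxel weighted running average of truncated signed distances. A minimal dense-grid sketch (Open3D's `ScalableTSDFVolume` does the same with spatial hashing; the grid size, voxel size, and truncation distance below are illustrative):

```python
import numpy as np

class DenseTSDF:
    """Minimal dense TSDF volume: weighted average of truncated distances."""

    def __init__(self, shape=(32, 32, 32), voxel_size=0.05, trunc=0.15):
        self.tsdf = np.ones(shape, dtype=np.float32)    # 1 = empty/unseen
        self.weight = np.zeros(shape, dtype=np.float32)
        self.voxel_size = voxel_size
        self.trunc = trunc

    def integrate(self, sdf, obs_weight=1.0):
        """Fuse one observation, given the signed distance per voxel."""
        d = np.clip(sdf / self.trunc, -1.0, 1.0)        # truncate
        w_new = self.weight + obs_weight
        self.tsdf = (self.tsdf * self.weight + d * obs_weight) / w_new
        self.weight = w_new

vol = DenseTSDF(shape=(8, 8, 8))
# Synthetic observation: a plane at z = 4 voxels (positive in front of it)
z = np.arange(8).reshape(1, 1, 8) * vol.voxel_size
obs = z - 4 * vol.voxel_size
vol.integrate(obs)
vol.integrate(obs)  # a second identical view leaves the average unchanged
```

The zero-crossing of `vol.tsdf` along z marks the recovered surface; marching cubes would extract the mesh from exactly this field.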
@JiamingSuen thank you for your reply! This is perfect. Two questions though:
real-time implementation: The crucial part for a real-time implementation with arbitrary scene size is the underlying scene representation. Ideally, you store the TSDF voxels in a hash map (voxel hashing, https://niessnerlab.org/papers/2013/4hashing/niessner2013hashing.pdf). However, if you would like to backpropagate errors in your real-time setting, you have to pay attention to differentiability. It probably takes some engineering effort to make this work.
why would I have to backpropagate errors in any case?
RL for point cloud aggregation: I would start with a very basic pipeline (e.g. just aggregating individual point clouds), as you would like to have easy access to the point cloud, which is not that easy if you use TSDF Fusion, RoutedFusion, or NeuralRecon. If you use one of the first two methods, you would always need to extract the mesh using marching cubes, sample a point cloud from the mesh (very expensive, depending on the mesh size), and compute the reward from it. If you just aggregate point clouds, you have direct access to your reward.
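The "just aggregate point clouds" suggestion can stay cheap if scans are quantized into a voxel occupancy set: the reward is simply the number of newly occupied voxels, and no meshing or marching cubes is ever needed. A sketch (the voxel size is an arbitrary choice):

```python
import numpy as np

class VoxelMap:
    """Aggregate point clouds into an occupancy set of voxel indices."""

    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size
        self.occupied = set()

    def add_scan(self, points):
        """Insert a scan (N, 3); return the number of newly occupied voxels."""
        idx = np.floor(points / self.voxel_size).astype(np.int64)
        before = len(self.occupied)
        self.occupied.update(map(tuple, idx))
        return len(self.occupied) - before

vmap = VoxelMap(voxel_size=0.1)
r1 = vmap.add_scan(np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]))   # 2 new voxels
r2 = vmap.add_scan(np.array([[0.01, 0.0, 0.0], [1.0, 0.0, 0.0]]))  # 1 new voxel
```

Using a set of integer indices makes the "which points are actually new" check O(1) per point, instead of a nearest-neighbour search over the whole map.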
So here he proposes simplifying the pipeline to point cloud integration and getting a reward from that. So do you think I should manipulate the TSDFs directly, or could something like ICP also work?
I'm not an expert in RL and I'm not sure I have fully understood the context to make a comment. In general, BP through point clouds will be easier than TSDF volumes.
I'm closing this issue for now, since the content we are discussing is beyond the questions of NeuralRecon.
@JiamingSuen, I have a question about what you discussed above. You mentioned that
> NeuralRecon requires camera poses with metric scale as input. Since it's not possible to compute those poses with only a monocular camera.
However, I read your paper and found that
> Given a sequence of monocular images and camera pose trajectory provided by a SLAM system...
I am a bit confused about these two statements. To the best of my knowledge, a SLAM system (or even SfM in COLMAP) can estimate the camera poses from a monocular camera. Why can't I run SLAM to produce the camera poses and then apply your pre-trained NeuralRecon to produce the real-time reconstruction?
A SLAM/SfM system with only monocular images cannot recover the metric scale of the camera poses. On the other hand, a visual-inertial SLAM system (e.g. ARKit) can provide poses with metric scale, since the additional IMU measurements are used.
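The scale ambiguity can be verified numerically from the pinhole projection equation: scaling every camera translation and every 3D point by the same factor s leaves all pixel observations unchanged, so monocular images alone cannot determine metric scale. A small check with a made-up camera:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (N, 3) to pixel coordinates (N, 2)."""
    x_cam = (R @ X.T + t[:, None]).T   # world -> camera
    x_img = (K @ x_cam.T).T            # camera -> image plane
    return x_img[:, :2] / x_img[:, 2:3]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
X = np.array([[0.0, 0.0, 2.0],
              [0.5, -0.2, 3.0]])

s = 7.3  # any global scale factor
uv = project(K, R, t, X)
uv_scaled = project(K, R, s * t, s * X)  # identical pixel observations
```

Since the scaled scene produces exactly the same pixels, only an extra metric sensor (IMU, depth, known object size) can resolve s.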
Hello, is there a way to use NeuralRecon with my camera input (using Unity)? I would highly appreciate it if you could share a Colab on how to use NeuralRecon's API (how to pass images?).
I was planning to work with Atlas, but I just found out about your work today and was hoping I could make my research work with NeuralRecon to show what your research is capable of!