Ademord closed this issue 3 years ago
Hi Franco, thanks for your interest in our work. Similar to this reply, NeuralRecon requires camera poses with metric scale as input. Since it's not possible to compute those poses with only a monocular camera, a colab notebook that allows users to upload only the images will not work.
Instead, we provide a tutorial for users to capture the video and the camera poses with an iOS device. Since there is no easy way to connect an iOS device to a PC and transfer the camera feed and poses in real-time, this has to be an offline process.
Noooo... my world falls apart reading this reply. I am so excited about your research.
I could pass this info obtained from the Unity engine by reading the transform's quaternion. Could you help me understand how to call into your model's API?
I'd really love to use NeuralRecon for a real-time scan with my reinforcement learning agent in my Unity environment. It is a simulated environment, so I can get the camera info you mention.
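For context, feeding poses from Unity would mean converting each transform's position and rotation quaternion into a 4x4 camera-to-world matrix. A minimal NumPy sketch (the quaternion-to-matrix formula is standard; note that Unity uses a left-handed, y-up coordinate frame, so an axis/handedness conversion would likely be needed on top of this):

```python
import numpy as np

def quat_to_matrix(qx, qy, qz, qw):
    """Convert a unit quaternion (x, y, z, w) to a 3x3 rotation matrix."""
    n = np.sqrt(qx*qx + qy*qy + qz*qz + qw*qw)
    qx, qy, qz, qw = qx/n, qy/n, qz/n, qw/n
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

def pose_to_c2w(position, quaternion):
    """Build a 4x4 camera-to-world matrix from a position and quaternion."""
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(*quaternion)
    T[:3, 3] = position
    return T

# Identity rotation, camera at (1, 2, 3)
pose = pose_to_c2w([1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 1.0])
```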
Thanks again for your interest in our work. I'm not sure if NeuralRecon will generalize well to synthetic images. I think it would be a better idea to use the depth buffer from the rendering engine (maybe adding some noise) and do TSDF integration directly.
For the inference API of NeuralRecon, you may take a look at https://github.com/zju3dv/NeuralRecon/blob/master/demo.py and https://github.com/zju3dv/NeuralRecon/blob/master/tools/process_arkit_data.py, but again, I'm not sure it will be worth the effort to adapt NeuralRecon to reconstruct a synthetic scene.
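The depth-buffer suggestion above could start as simply as perturbing the rendered depth with distance-dependent Gaussian noise before TSDF integration. A sketch, with a made-up noise model (the parameters are illustrative, not from any sensor calibration):

```python
import numpy as np

def add_depth_noise(depth, sigma_base=0.005, sigma_scale=0.01, rng=None):
    """Add depth-dependent Gaussian noise to a rendered depth map (meters).

    Noise std grows with distance, loosely mimicking a consumer depth
    sensor. Zero-depth (invalid) pixels are left untouched.
    """
    rng = np.random.default_rng(rng)
    sigma = sigma_base + sigma_scale * depth
    noisy = depth + rng.normal(0.0, 1.0, depth.shape) * sigma
    # Keep invalid pixels at 0 and clamp negative depths
    return np.where(depth > 0, np.maximum(noisy, 0.0), 0.0)

clean = np.full((4, 4), 2.0)   # flat wall 2 m away
clean[0, 0] = 0.0              # one invalid pixel
noisy = add_depth_noise(clean, rng=0)
```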
Thank you so much for your feedback!! I will look into the TSDF integration that you mention.
I found something about ICP; I don't know if that could also work. I'm thinking I always have to subtract the new scan from the old one to find which points are actually new, and then feed a reward proportional to this to the agent.
Do you have any experience with TSDF, or some code tutorial you could point me to?
Again, I am so happy I found your research and to be able to contribute to this branch of science.
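The "subtract the new scan from the old one" idea can be sketched without ICP: treat a point as new if no previously seen point lies within some radius, and make the reward proportional to the count of new points. A minimal brute-force NumPy sketch (the radius is an arbitrary choice; a KD-tree would be needed at scale):

```python
import numpy as np

def novelty_reward(old_pts, new_pts, radius=0.05):
    """Reward proportional to the number of genuinely new points.

    A point in the new scan is 'novel' if its nearest neighbour in the
    old map is farther away than `radius` (brute force for clarity).
    Returns (reward, merged_map).
    """
    if len(old_pts) == 0:
        return len(new_pts), new_pts
    # Pairwise distances, shape (N_new, N_old)
    d = np.linalg.norm(new_pts[:, None, :] - old_pts[None, :, :], axis=-1)
    novel = d.min(axis=1) > radius
    merged = np.vstack([old_pts, new_pts[novel]])
    return int(novel.sum()), merged

old = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
new = np.array([[0.0, 0.0, 0.01], [2.0, 0.0, 0.0]])  # one near-duplicate, one new
reward, merged = novelty_reward(old, new)
# reward == 1, merged has 3 points
```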
You may refer to this tutorial in Open3D for TSDF fusion.
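To complement the Open3D tutorial: the core of TSDF fusion is just a per-voxel weighted running average of truncated signed distances. A minimal dense-grid sketch (Open3D's `ScalableTSDFVolume` does the same with spatial hashing; the grid size, voxel size, and truncation distance below are illustrative):

```python
import numpy as np

class DenseTSDF:
    """Minimal dense TSDF volume: weighted average of truncated distances."""

    def __init__(self, shape=(32, 32, 32), voxel_size=0.05, trunc=0.15):
        self.tsdf = np.ones(shape, dtype=np.float32)    # 1 = empty/unseen
        self.weight = np.zeros(shape, dtype=np.float32)
        self.voxel_size = voxel_size
        self.trunc = trunc

    def integrate(self, sdf, obs_weight=1.0):
        """Fuse one observation, given the signed distance per voxel."""
        d = np.clip(sdf / self.trunc, -1.0, 1.0)        # truncate
        w_new = self.weight + obs_weight
        self.tsdf = (self.tsdf * self.weight + d * obs_weight) / w_new
        self.weight = w_new

vol = DenseTSDF(shape=(8, 8, 8))
# Synthetic observation: a plane at z = 4 voxels (positive in front of it)
z = np.arange(8).reshape(1, 1, 8) * vol.voxel_size
obs = z - 4 * vol.voxel_size
vol.integrate(obs)
vol.integrate(obs)  # a second identical view leaves the average unchanged
```

The zero-crossing of `vol.tsdf` along z marks the recovered surface; marching cubes would extract the mesh from exactly this field.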
@JiamingSuen thank you for your reply! This is perfect. Two questions though:
real-time implementation: The crucial part for a real-time implementation with arbitrary scene size is the underlying scene representation. Ideally, you store the TSDF voxels in a hash map (voxel hashing, https://niessnerlab.org/papers/2013/4hashing/niessner2013hashing.pdf). However, if you would like to backpropagate errors in your real-time setting, you have to pay attention to differentiability. It probably takes some engineering effort to make this work.
why would I have to backpropagate errors in any case?
RL for point cloud aggregation: I would start with a very basic pipeline (e.g. just aggregating individual point clouds), as you would like to have easy access to the point cloud, which is not that easy if you use TSDF Fusion, RoutedFusion, or NeuralRecon. If you use one of the first two methods, you would always need to extract the mesh using marching cubes, sample a point cloud from the mesh (very expensive, depending on the mesh size), and compute the reward from it. If you just aggregate point clouds, you have direct access to your reward.
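The "just aggregate point clouds" suggestion can stay cheap if scans are quantized into a voxel occupancy set: the reward is simply the number of newly occupied voxels, and no meshing or marching cubes is ever needed. A sketch (the voxel size is an arbitrary choice):

```python
import numpy as np

class VoxelMap:
    """Aggregate point clouds into an occupancy set of voxel indices."""

    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size
        self.occupied = set()

    def add_scan(self, points):
        """Insert a scan (N, 3); return the number of newly occupied voxels."""
        idx = np.floor(points / self.voxel_size).astype(np.int64)
        before = len(self.occupied)
        self.occupied.update(map(tuple, idx))
        return len(self.occupied) - before

vmap = VoxelMap(voxel_size=0.1)
r1 = vmap.add_scan(np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]))   # 2 new voxels
r2 = vmap.add_scan(np.array([[0.01, 0.0, 0.0], [1.0, 0.0, 0.0]]))  # 1 new voxel
```

Using a set of integer indices makes the "which points are actually new" check O(1) per point, instead of a nearest-neighbour search over the whole map.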
So here he proposes simplifying the pipeline to point cloud integration and getting a reward from that. So do you think I should manipulate the TSDFs directly, or could something like ICP also work?
I'm not an expert in RL and I'm not sure I have fully understood the context to make a comment. In general, BP through point clouds will be easier than TSDF volumes.
I'm closing this issue for now, since the content we are discussing is beyond the questions of NeuralRecon.
@JiamingSuen, I have a question about what you discussed above. You mentioned that
> NeuralRecon requires camera poses with metric scale as input. Since it's not possible to compute those poses with only a monocular camera.
However, I read your paper and found that
> Given a sequence of monocular images and camera pose trajectory provided by a SLAM system...
I am a bit confused about these two statements. To the best of my knowledge, a SLAM system (or even SfM in COLMAP) can estimate the camera poses from a monocular camera. Why can't I run SLAM to produce the camera poses and then apply your pre-trained NeuralRecon to produce the real-time reconstruction?
A SLAM/SfM system with only monocular images cannot recover the metric scale of the camera poses. On the other hand, a visual-inertial SLAM system (e.g. ARKit) can provide poses with metric scale, since the additional IMU measurements are used.
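The scale ambiguity can be verified numerically from the pinhole projection equation: scaling every camera translation and every 3D point by the same factor s leaves all pixel observations unchanged, so monocular images alone cannot determine metric scale. A small check with a made-up camera:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (N, 3) to pixel coordinates (N, 2)."""
    x_cam = (R @ X.T + t[:, None]).T   # world -> camera
    x_img = (K @ x_cam.T).T            # camera -> image plane
    return x_img[:, :2] / x_img[:, 2:3]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
X = np.array([[0.0, 0.0, 2.0],
              [0.5, -0.2, 3.0]])

s = 7.3  # any global scale factor
uv = project(K, R, t, X)
uv_scaled = project(K, R, s * t, s * X)  # identical pixel observations
```

Since the scaled scene produces exactly the same pixels, only an extra metric sensor (IMU, depth, known object size) can resolve s.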
Hello, is there a way to use NeuralRecon with my camera input (using Unity)? I would highly appreciate it if you could share a Colab on how to use NeuralRecon's API (how to pass images?).
I was planning to work with Atlas, but I just found out about your work today and was hoping I could make my research work with NeuralRecon to show what your research is capable of!