torch-points3d / torch-points3d

Pytorch framework for doing deep learning on point clouds.
https://torch-points3d.readthedocs.io/en/latest/

limited examples on jupyter notebooks #624

Closed Ademord closed 3 years ago

Ademord commented 3 years ago

Could we get an example of how to load the Detection or Registration networks, pass in a depth file or a mesh file, and run prediction?
I am struggling to see how to perform inference on a file.

humanpose1 commented 3 years ago

Registration and detection work with point clouds, so if you want to run a prediction on a depth file or a mesh, you first need to convert it into a point cloud. Please have a look at this function to convert a depth image into a point cloud; K is the camera intrinsic matrix. If you want to apply it for registration, you would do:

import torch
from torch_geometric.data import Data

pos = torch.from_numpy(extract_pcd(depth_image)[0])
data = Data(pos=pos, batch=torch.zeros(len(pos)).float())
data = transform(data)
model.set_input(data, device="cuda")
output = model.forward()

For a mesh, you can extract the vertices (which form a point cloud) and apply the model.
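
For reference, here is a minimal sketch of that depth-to-point-cloud conversion (the linked extract_pcd helper is not reproduced here, so the function below is an illustrative stand-in assuming a pinhole camera with intrinsic matrix K and a depth image stored in millimetres):

import imageio
import numpy as np

def depth_to_pointcloud(depth_path, K, depth_scale=1000.0):
    """Back-project a depth image into an N x 3 point cloud (camera frame)."""
    depth = imageio.imread(depth_path).astype(np.float32) / depth_scale
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0                               # keep only pixels with a depth reading
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)[valid]   # N x 3 array, ready for Data(pos=...)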

Ademord commented 3 years ago

Thank you for your reply. I have the following questions:

# load a point cloud from depth image
# question: here i would then do numpy([extract_pcd(depth_image) for depth_image in depth_images_in_folder]) for example?
# question: why is this called pos and not pcd? pos makes me think of position or pose
pos = torch.from_numpy(extract_pcd(depth_image)[0])
# we load the data with the len of examples as batch size
data = Data(pos=pos, batch=torch.zeros(len(pos)).float())
# question: what does transform do?
data = transform(data)
# set model input and GPU if available
model.set_input(data, device="cuda")
# use the model for prediction
# question: why are we not calling model.predict(input)?
output = model.forward()
# question: what is the output format and how do i save the output as a mesh?
# question: is there any kind of evaluation metric for how "cohesive" or "complete" a point cloud is?
# my use case: i have depth images from multiple shots and want to merge them together
# (i just found out about tsdf-fusion: https://github.com/andyzeng/tsdf-fusion-python)
# how could i say: after N pictures taken from different perspectives, N+M pictures
# do not change the point cloud, since all possible points have been "scanned"?

humanpose1 commented 3 years ago

1) Not exactly: here depth_image is a matrix of shape N x M, and you first need to open it with a library such as imageio or OpenCV.
2) It is called pos because we adopt the naming convention of torch_geometric, a library for deep learning on graphs. Here pos means position.
3) Actually, my example is specific, but it depends on the model. If you want to apply a model, have a look at the notebook.
4) The transformation is defined in the notebook. It contains preprocessing transforms such as GridSampling3D, which computes coordinates for sparse convolution (in the notebook example, we use MinkowskiEngine for the convolution). It can contain other transforms as well.
5) Yes, the GPU needs to be available, but it works on CPU too.
6) set_input is important and depends on the data: it is where the data is transferred from CPU to GPU and, depending on the model (PointNet, KPConv, RSConv...), where other preprocessing that cannot be applied beforehand is applied.
7) For registration, the output is a feature matrix of size N x D, where N is the size of the point cloud and D is the dimension of the features. You need to match the features and apply a robust algorithm such as RANSAC, TEASER, or Fast Global Registration. You can use the Open3D library for RANSAC and Fast Global Registration.
8) I've also used tsdf-fusion to build training data on 3DMatch; the code is here. Personally, I've just checked qualitatively whether the merge was good, but for a more rigorous 3D reconstruction method you can have a look at this paper: http://redwood-data.org/indoor/data/choi2015cvpr.pdf. To apply tsdf-fusion, you will need the poses.
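
To make point 7 concrete, here is a rough sketch of feature matching plus RANSAC with Open3D (assuming a recent Open3D where registration lives under o3d.pipelines.registration; xyz_src/xyz_tgt are the two N x 3 point clouds and feat_src/feat_tgt are the N x D feature matrices produced by the model, all illustrative names):

import numpy as np
import open3d as o3d

def ransac_register(xyz_src, xyz_tgt, feat_src, feat_tgt, voxel_size=0.05):
    """Estimate the rigid transform from source to target via feature matching + RANSAC."""
    pcd_src = o3d.geometry.PointCloud()
    pcd_src.points = o3d.utility.Vector3dVector(xyz_src)
    pcd_tgt = o3d.geometry.PointCloud()
    pcd_tgt.points = o3d.utility.Vector3dVector(xyz_tgt)

    # Open3D expects features as a D x N matrix wrapped in a Feature object
    f_src = o3d.pipelines.registration.Feature()
    f_src.data = feat_src.T.astype(np.float64)
    f_tgt = o3d.pipelines.registration.Feature()
    f_tgt.data = feat_tgt.T.astype(np.float64)

    result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        pcd_src, pcd_tgt, f_src, f_tgt,
        True,                         # mutual_filter (argument present in Open3D >= 0.13)
        voxel_size * 1.5,             # max correspondence distance
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        3,                            # points sampled per RANSAC iteration
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel_size * 1.5)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999),
    )
    return result.transformation      # 4 x 4 transform mapping source onto target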

Ademord commented 3 years ago

Hi @humanpose1, thanks for your feedback

Just to recap, I am trying to find a way to motivate an RL agent (Unity) to discover a point cloud (or scan it). So I need a way to store the point cloud, accumulate/aggregate it as new scans arrive, and then reward the agent the more "new points" arrive. This is the part where I need to figure out how to store a point cloud and augment it with new scans.
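
One simple way to sketch that part (illustrative names, not torch-points3d API): keep a global set of occupied voxels and reward the agent for voxels it has not observed before:

import numpy as np

class CoverageReward:
    """Accumulate scans into a voxel set and count newly discovered voxels."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self.seen = set()                    # voxel indices observed so far

    def update(self, points):
        """points: N x 3 array from the latest scan. Returns the number of new voxels."""
        keys = map(tuple, np.floor(points / self.voxel_size).astype(np.int64))
        new = {k for k in keys if k not in self.seen}
        self.seen |= new
        return len(new)                      # use as the step reward for the agent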

So I am still super confused by all these methods that exist. I found tsdf-fusion but have yet to try it in my "real-time setup". Still on my list of things to try:

I tried OpenSfM and OpenMVG, but they didn't give results, and they are offline methods anyway, as far as I understood...

Unfortunately I have very limited time, so I need to settle on a library/method and stick to it :(

I tried running the demo notebook you linked and it gave me an installation error, which I made a new issue for.

humanpose1 commented 3 years ago

I'm not sure I clearly understand, but one of your problems looks like a SLAM problem (simultaneous localization and mapping) with depth images. I'm not an expert in SLAM, but I think you should have a look at KinectFusion. It's real-time and it will allow you to store a 3D model of the scene as a TSDF and augment it with new scans. tsdf-fusion is for offline 3D reconstruction from RGB-D frames, given the poses and the intrinsic matrix.
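
For illustration, Open3D ships a ScalableTSDFVolume that follows this idea: you integrate RGB-D frames one by one (you need the intrinsic matrix and a camera pose per frame) and can extract a mesh or point cloud at any time. A minimal sketch, with placeholder intrinsics and a hypothetical frames iterable:

import numpy as np
import open3d as o3d

# Placeholder pinhole intrinsics; replace with your camera's values
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 525.0, 525.0, 319.5, 239.5)
volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01, sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

for color_img, depth_img, pose in frames:    # o3d.geometry.Image pairs + 4x4 camera-to-world pose
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color_img, depth_img, depth_trunc=4.0, convert_rgb_to_intensity=False)
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))   # integrate() expects world-to-camera

mesh = volume.extract_triangle_mesh()        # or volume.extract_point_cloud()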

I'll solve the error in the notebook.

Hope it can help !

Ademord commented 3 years ago

@humanpose1 thanks for your feedback! I got the notebook to run: torch-geometric pushed 1.7.1 an hour ago, which fixed an issue they had on their side, and I also learned that I need to set an ENV or ARG in my Dockerfile. For posterity I leave this section of my Dockerfile here:

FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel  # << as of 17.06.2021 this worked (my previous pip install mlagents downgraded my torch to '1.8.1+cu102')
# torch-points3d
ARG TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX" 
RUN pip install torch-points-kernels==0.6.6 torch-points3d 
RUN pip install pyvista panel

# MINKOWSKI ENGINE
RUN sudo apt install -y libopenblas-dev
RUN pip install ninja
RUN pip install -U MinkowskiEngine --install-option="--blas=openblas" -v --no-deps

Thanks for pointing me in the right direction! Yes, I also think this goes in the direction of SLAM.

Just to have it clear: why would torch-points3d not be useful for my problem?

i will look into kinectfusion today 👍🏻

ttsesm commented 3 years ago

@Ademord unfortunately this is the reality (an advantage and a disadvantage at the same time) of open source: you have the option to choose what fits your needs best, but at the same time it becomes kind of cluttered and confusing (or time-consuming, since there is usually a lot of trial and error) to choose the proper one.

To be honest, I also did not clearly understand your actual problem. If I understood correctly (please correct me if I got it wrong), you have a bunch of depth images from which you want to retrieve point clouds and then fuse (register) them all into one? If this is the case, then I also do not believe that torch-points3d is the correct library to use. There are a bunch of papers (with code) out there for registration, https://paperswithcode.com/task/point-cloud-registration, and there are already libraries that can do this offline, e.g. open3d, vedo, trimesh, etc.

If you also have to predict part of, or all of, the scene, then you could possibly have a look at the pytorch3d fit-to-mesh tutorial https://pytorch3d.org/tutorials/deform_source_mesh_to_target_mesh or at AtlasNet (to be honest, I am not that familiar with this topic though).

Ademord commented 3 years ago

@ttsesm thanks for your reply. Similarly to what this person is trying to do here (I just found out about it), I am trying to:

I have a question about continuous environment mesh building - could you point me to a way to implement an environment "scanner"? I want to create an approximate model of the surrounding environment - I see this as: every new frame fills in and deforms the already existing "scanned" mesh and makes it more accurate (in the future, with environment texture mapping). I think the simplest way is to subtract the newly scanned mesh from the already scanned one.

So yes, as @humanpose1 described: I am trying to do something online in the direction of KinectFusion, not offline. But my question was more about having it written down what torch-points3d can and cannot do in this use case.

Thanks again to all of you for taking the time to understand the scenario. Let me generate a gif so you can see what I mean, since images sometimes say more than words: unity_scene

Context:

ttsesm commented 3 years ago

Then I think what you need is more related to the following papers/projects:

DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors (CVPR2021): https://github.com/huangjh-pub/di-fusion

RetrievalFuse: Neural 3D Scene Reconstruction with a Database (CVPR2021): https://www.youtube.com/watch?v=HbsUU0YODqE / https://nihalsid.github.io/retrieval-fuse/ (I guess code will be available soon when CVPR 2021 is online)

ATLAS: End-to-End 3D Scene Reconstruction from Posed Images (ECCV 2020): https://www.youtube.com/watch?v=9NOPcOGV6nU (code and paper links in the descriptions of the video)

SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes (ISMAR 2018): https://github.com/torrvision/spaint (https://www.youtube.com/watch?v=PAmcJ5_cruY)

BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration (ACM TOG 2017): https://github.com/niessner/BundleFusion / https://www.youtube.com/watch?v=keIirXrRb1k / https://github.com/FangGet/BundleFusion_Ubuntu_Pangolin / https://github.com/nonlinear1/BundleFusion_Ubuntu_V0

and there are quite a few others if you search around. Which one will work best for you is, I think, unfortunately a trial-and-error procedure. I would start with the most recent ones, though.

humanpose1 commented 3 years ago

For panoptic segmentation on 3D point clouds, tp3d is the right library (you can ask @nicolas-chaulet for more information). As for registration, tp3d is useful for the following case: you want to perform global registration on two point clouds (find the rotation and translation between them). For SLAM or 3D reconstruction, it will not be enough (see the references from @ttsesm, or KinectFusion). But I believe it can still be useful in a SLAM pipeline (for loop closure, for example).

Ademord commented 3 years ago

Thank you both for your replies! This weekend I will find a way to export depth info from Unity cameras, then I'll implement a simple prototype with ICP (http://www.open3d.org/docs/0.7.0/tutorial/Basic/icp_registration.html), and then with KinectFusion (my next thesis sync meeting is on Monday, so I will keep you posted).
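
For what it's worth, a minimal point-to-point ICP sketch with a recent Open3D (the linked tutorial is for 0.7, where the module was o3d.registration rather than o3d.pipelines.registration; the file names below are placeholders):

import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("scan_000.ply")     # placeholder file names
target = o3d.io.read_point_cloud("scan_001.ply")

result = o3d.pipelines.registration.registration_icp(
    source, target,
    0.05,                                            # max correspondence distance (scene units)
    np.identity(4),                                  # rough initial alignment
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

print(result.fitness, result.inlier_rmse)
print(result.transformation)                         # 4x4 transform from source to target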

nicolas-chaulet commented 3 years ago

Closing for now. Good luck!