zju3dv / OnePose

Code for "OnePose: One-Shot Object Pose Estimation without CAD Models", CVPR 2022
Apache License 2.0

Questions on Online-tracking and Custom dataset preparation #19

Closed bibekyess closed 2 years ago

bibekyess commented 2 years ago

Hello, thanks for the awesome work. I have a couple of questions:

(1) I want to train and test this on a custom dataset, but I couldn't find sufficient information on how to make one myself. Can anyone who has done it before give me some hints?

(2) I ran the code on the 'sample_data' and found that we first do pose estimation, save the results to a folder, and then run another script for the visualization. Can't I do pose estimation and visualization concurrently? I saw that "Demo pipeline for running OnePose with custom-captured data including the online tracking module" will be updated soon, but I am wondering if the code for online tracking is already available in this repo?

(3) When viewing the results, I get 3D bounding boxes but I couldn't see any orientation information. The paper says 6D pose estimation, which means 3 translation and 3 orientation components, doesn't it? Does the existing code give information about orientation or not?

Thank you for your time and help!! 🙂

siatheindochinese commented 2 years ago

Personally, I collected my data synthetically (because manually annotating poses on real images is troublesome, even in AR with toolkits like ARCore), using Blender to get the object poses and intrinsics. You can also wait for the original authors to release their OnePose Cap app if you're patient enough.

For dataset collection:

For (2), it seems that you want to do real-time inference. You can try ripping the pipeline out of inference.py or inference_demo.py and running it inside a cv2.VideoCapture while loop.
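A minimal sketch of that loop, assuming you have factored the repo's pipeline into two hypothetical helpers, `load_pipeline` and `estimate_pose` (these are not functions in this repo; they stand in for whatever you rip out of inference.py):

```python
import cv2

pipeline = load_pipeline()     # hypothetical: model weights + SfM database from inference.py
cap = cv2.VideoCapture(0)      # webcam; pass a video path instead for offline clips

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # hypothetical wrapper around the OnePose matching + PnP steps
    pose_pred_homo, bbox_2d = estimate_pose(pipeline, frame)
    if bbox_2d is not None:
        # draw the projected 3D bounding box, so estimation and visualization happen together
        cv2.polylines(frame, [bbox_2d], isClosed=True, color=(0, 255, 0), thickness=2)
    cv2.imshow("OnePose demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```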

For (3), the paper obtains the object pose by running solvePnP on matched 2D-3D correspondences. Just look for the line with eval_utils.ransac_PnP(...). I believe line 155 of inference.py gives you the object pose as pose_pred_homo.
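To make the orientation part of (3) explicit: pose_pred_homo is a 4x4 homogeneous matrix, so the 3 orientation DoF are already there as its top-left 3x3 rotation block. A small sketch of pulling them out (SciPy is my choice for the Euler conversion, not something the repo itself uses here):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# pose_pred_homo: the 4x4 homogeneous pose returned by the PnP step in inference.py
R = pose_pred_homo[:3, :3]   # 3x3 rotation matrix: the 3 orientation DoF
t = pose_pred_homo[:3, 3]    # translation vector: the 3 translation DoF

euler_deg = Rotation.from_matrix(R).as_euler("xyz", degrees=True)
print("translation:", t)                # in the SfM reconstruction's scale/units
print("orientation (xyz, deg):", euler_deg)
```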

Edit: I must also add that there is no need to train the GATsSPG model any further; it is ready to be used with your point-cloud 3D model.

DeriZSY commented 2 years ago

> (1) I want to train and test this on a custom dataset, but I couldn't find sufficient information on how to make one myself. Can anyone who has done it before give me some hints?

Hi, thanks for your interest in our work. For Q(1) and Q(2), please stay tuned for the code for custom dataset preparation and online tracking. For Q(3), please refer to the reply from @siatheindochinese.

aditya1709 commented 2 years ago

@siatheindochinese Can you please elaborate on how you collected the data synthetically? Any particular library you used? And where did you find the 3D assets?

siatheindochinese commented 2 years ago

@aditya1709 I generated all my images, poses, and re-projected bounding box coordinates in Blender. It allows Python scripting, so a lot of the work can be automated.
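To give a flavour of that automation, here is a rough sketch of the kind of script I mean (it assumes a .blend scene with a camera already aimed at the object at the origin, e.g. via a Track To constraint; the orbit radius, frame count, and paths are placeholders):

```python
import math
import os

import bpy
import numpy as np

scene = bpy.context.scene
cam = scene.camera  # assumes the scene camera is tracking the object at the origin

out_dir = bpy.path.abspath("//poses")  # '//' = relative to the .blend file
os.makedirs(out_dir, exist_ok=True)

for i in range(36):
    # orbit the camera around the object to cover different viewpoints
    angle = 2 * math.pi * i / 36
    cam.location = (2.0 * math.cos(angle), 2.0 * math.sin(angle), 1.0)
    bpy.context.view_layer.update()

    scene.render.filepath = f"//renders/{i:04d}.png"
    bpy.ops.render.render(write_still=True)

    # 4x4 camera-to-world pose; invert it if your pipeline expects world-to-camera
    np.savetxt(os.path.join(out_dir, f"{i:04d}.txt"), np.array(cam.matrix_world))
```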

For 3D models, you can either scan your objects with a 3D scanner (or an equivalent photogrammetry tool, e.g. NVIDIA MoMa) or manually model them yourself.

bibekyess commented 2 years ago

Hi @siatheindochinese Thank you for your detailed response! :) I have one more quick question: how well does the model perform when trained only on the synthetic dataset and tested on real pictures?

siatheindochinese commented 2 years ago

@bibekyess as long as the synthetic images are photorealistic enough, the results should be sufficient.

I must reiterate that the synthetic images are only used to construct the SfM model, not used to train the GATsSPG.

You can check out my real-time result here: https://www.linkedin.com/posts/sia-zhen-hao_opencv-computervision-machinelearning-activity-6965918457116733440-IAXE

For the video above, I did not implement optical flow tracking or object detection, so the result is a bit shaky. Object detection should help you crop out unnecessary SuperPoint keypoints; just use an off-the-shelf detector like YOLOv5. The feature-matching object detector provided in this repo is too slow for real-time inference.
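As an illustration, the cropping step could look something like this (a sketch; `crop_object` and the padding are my own, and it assumes the top-scoring detection is the target object):

```python
import torch

# off-the-shelf YOLOv5-small from the Ultralytics hub
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

def crop_object(frame_bgr, pad=20):
    """Crop around the highest-confidence detection; also return the crop's
    top-left corner so keypoints can be mapped back to full-frame coordinates."""
    results = model(frame_bgr[..., ::-1])       # YOLOv5 expects RGB input
    det = results.xyxy[0]                       # (N, 6): x1, y1, x2, y2, conf, cls
    if len(det) == 0:
        return frame_bgr, (0, 0)                # no detection: fall back to the full frame
    x1, y1, x2, y2 = det[0, :4].int().tolist()  # detections come sorted by confidence
    h, w = frame_bgr.shape[:2]
    x1, y1 = max(x1 - pad, 0), max(y1 - pad, 0)
    x2, y2 = min(x2 + pad, w), min(y2 + pad, h)
    return frame_bgr[y1:y2, x1:x2], (x1, y1)
```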

bibekyess commented 2 years ago

@siatheindochinese Wow! Thanks for sharing the demo link and your explanation! I was wondering if you have tried OnePose for detecting the 6D pose of multiple objects in the same frame? Say I want to show 3D bounding boxes for 5 objects in a single video; is that reasonably feasible with this framework?

aditya1709 commented 1 year ago

> @aditya1709 I generated all my images, poses, and re-projected bounding box coordinates in Blender. It allows Python scripting, so a lot of the work can be automated.

@siatheindochinese Have you by any chance put the Blender scripts on your GitHub where I can take a look? They might be a good starting point for me.

siatheindochinese commented 1 year ago

@aditya1709 you can check out BlenderProc2; it is tailored to generating synthetic data for computer vision tasks. I would not recommend manually writing out the entire rendering pipeline in vanilla Blender.
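For a sense of how little code that takes, a minimal script along the lines of the BlenderProc2 quickstart (the object path, light, and camera pose below are placeholders; it is run with `blenderproc run script.py`):

```python
import blenderproc as bproc  # must be imported before other modules
import numpy as np

bproc.init()

# load your scanned or hand-modelled object
objs = bproc.loader.load_obj("path/to/model.obj")

# a simple point light so the renders are not black
light = bproc.types.Light()
light.set_location([2, -2, 2])
light.set_energy(300)

# register a camera pose; OnePose needs the pose of every rendered frame
bproc.camera.set_resolution(512, 512)
cam2world = bproc.math.build_transformation_mat([0, -3, 1], [np.pi / 3, 0, 0])
bproc.camera.add_camera_pose(cam2world)

# render and write images to disk; the poses are the matrices you registered
data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)
```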

EvdoTheo commented 1 year ago

Hello @aditya1709, besides the folders you mentioned above, there is also a file called "box3d_corners.txt" for every object. What parameters does it contain?

AnukritiSinghh commented 1 year ago

> @siatheindochinese Wow! Thanks for sharing the demo link and your explanation! I was wondering if you have tried OnePose for detecting the 6D pose of multiple objects in the same frame? Say I want to show 3D bounding boxes for 5 objects in a single video; is that reasonably feasible with this framework?

Hi @bibekyess, did you get an answer for that? I want to run it for multiple objects in a single frame as well. Would be super helpful to know if you tried it. Thanks!