zubair-irshad / CenterSnap

PyTorch code for the ICRA'22 paper: "Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation"
https://zubair-irshad.github.io/projects/CenterSnap.html
286 stars, 47 forks

Feasibility of Training by Yourself #25

Status: Open. DavidYaonanZhu opened this issue 1 year ago.

DavidYaonanZhu commented 1 year ago

Hi @zubair-irshad, thanks for the excellent work.

We are currently working on robotic grasping and are particularly interested in your state-of-the-art (SOTA) shape reconstruction.

  1. I have one NVIDIA RTX 3090 card (24 GB memory). Would it be feasible to train your model on my PC? Since the dataset is 800 GB, I expect training to take a long time.

  2. Alternatively, do you provide a pre-trained model that can be used out of the box?

  3. Do you have documentation on how to use your model in real time with an arbitrary depth camera connected to a PC?

Waiting for your reply. Thanks in advance.

DavidYaonanZhu commented 1 year ago

Hi @zubair-irshad, can you briefly describe how to use the model with live-streamed data from an RGB-D camera?

DavidYaonanZhu commented 1 year ago

Tested with a custom image; it looks like the network needs to be fine-tuned.


zubair-irshad commented 1 year ago

Hi @DavidYaonanZhu,

Please find my answers here:

  1. Yes, totally. Our model can be trained in around a day and a half using about 13 GB of GPU memory. If you have a larger GPU, you can also increase the batch size to make it train faster.

  2. It looks like you have already tried our pretrained model. All the details are in our README and our Google Colab notebook; please feel free to check those out as well.

  3. Please check my comment here for answers to both of your questions: (a) fine-tuning the model and (b) running in real time on an RGB-D camera stream. In short, real-time inference from a camera is possible; this is what our work promises, i.e. inference at around 40 frames per second. However, we have not released support for integrating our model with camera hardware. Feel free to open a PR for this, and also take a look at the OAK-D/RealSense CenterSnap implementation I linked in my comment. In my comment here, I have also linked a way to get better results on other cameras without fine-tuning the model. You can always fine-tune if you have additional data, though such data, especially 3D data, can be hard to obtain.
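Since the repository does not ship camera integration, anyone wiring it up will need their own per-frame loop. Below is a minimal sketch of such a loop, assuming a camera driver that yields aligned RGB and depth frames and a model wrapped as a `predict(rgb, depth)` callable. `RGBDFrame`, `synthetic_camera`, `run_stream`, and `predict` are all hypothetical names for illustration, not part of CenterSnap's API. The loop also checks each frame against the roughly 40 FPS figure mentioned above (1/40 s = 25 ms per frame).

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

import numpy as np


@dataclass
class RGBDFrame:
    """Hypothetical frame type: aligned RGB (H x W x 3, uint8) and depth (H x W, uint16)."""
    rgb: np.ndarray
    depth: np.ndarray


def synthetic_camera(num_frames: int, height: int = 480, width: int = 640) -> Iterator[RGBDFrame]:
    """Stand-in for a real camera driver (e.g. a RealSense or OAK-D wrapper); yields blank frames."""
    for _ in range(num_frames):
        yield RGBDFrame(
            rgb=np.zeros((height, width, 3), dtype=np.uint8),
            depth=np.zeros((height, width), dtype=np.uint16),
        )


def run_stream(
    frames: Iterable[RGBDFrame],
    predict: Callable[[np.ndarray, np.ndarray], object],
    target_fps: float = 40.0,
) -> Tuple[List[object], float]:
    """Feed each frame to `predict`; return all predictions and the measured mean FPS."""
    budget_s = 1.0 / target_fps  # 40 FPS -> 25 ms per frame
    predictions: List[object] = []
    n = 0
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        predictions.append(predict(frame.rgb, frame.depth))
        elapsed = time.perf_counter() - t0
        # Report frames whose inference time alone exceeds the real-time budget.
        if elapsed > budget_s:
            print(f"frame {n}: {elapsed * 1000:.1f} ms exceeds {budget_s * 1000:.1f} ms budget")
        n += 1
    total = time.perf_counter() - start
    fps = n / total if total > 0 else float("inf")
    return predictions, fps
```

For a real deployment, `synthetic_camera` would be replaced by a generator that reads aligned color and depth frames from the device SDK, and `predict` by a wrapper around the pretrained model. Keeping the camera behind a plain iterator lets the same loop be tested offline on recorded frames.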

DavidYaonanZhu commented 1 year ago

Thanks for the great reply!

I will try it.