zc-alexfan / arctic

[CVPR 2023] Official repository for downloading, processing, visualizing, and training models on the ARCTIC dataset.
https://arctic.is.tue.mpg.de

Inference on Custom Data / Images / Video #7

Closed relh closed 1 year ago

relh commented 1 year ago

Can you provide a simple outline for how you would run inference using ArcticNet-SF/LSTM on custom data? I didn't see it in the DOCS but I might have missed it!

I'm assuming I might want to modify extract_predicts.py

zc-alexfan commented 1 year ago

Hi, we don't have a custom-data inference script, but I think it can be done fairly easily by modifying extract_predicts.py. In particular, the inference is done with this line.

This is traced back to the forward function here. It takes inputs (which contains the input image) and meta_info (which contains the intrinsics for the weak-perspective camera and the object name for each image).

So I think if you prepare these two objects, the code should run fine.
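A minimal sketch of what that could look like; the dict keys ("img", "intrinsics", "query_names") are assumptions here, so check them against the actual forward function before using this:

```python
import torch

# Hypothetical sketch: the dict keys ("img", "intrinsics", "query_names")
# are assumptions; verify them against the actual forward() before use.
def build_batch(img, K, obj_name):
    """Pack one custom image into the (inputs, meta_info) pair that
    the model's forward() expects."""
    inputs = {"img": img.unsqueeze(0)}  # (1, 3, H, W) normalized image tensor
    meta_info = {
        "intrinsics": torch.as_tensor(K, dtype=torch.float32)[None],  # (1, 3, 3)
        "query_names": [obj_name],  # object name for each image in the batch
    }
    return inputs, meta_info

# usage (model loaded from an ArcticNet-SF checkpoint):
# with torch.no_grad():
#     out = model(inputs, meta_info)
```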

For dataset classes without ground truth, I think this can serve as an example.

relh commented 1 year ago

Thanks so much! If I make a simple inference script, I'll make a PR in case you guys want to use it :).

relh commented 1 year ago

Almost done! I basically just have a short script that loads a checkpoint, loads a mini version of the dataset, runs on a few images from it, and then generates all the .pt files that wrapper.inference normally produces.
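Roughly, the core of it looks like this (just a sketch of my script; the checkpoint loading and mini dataset are specific to my setup, and the exact wrapper.inference signature should be taken from the repo):

```python
import torch

def run_inference(wrapper, loader, n_images=5, out_dir="preds"):
    """Run a loaded model wrapper on a few batches from a small loader
    and dump each result to a .pt file, mirroring what happens after
    wrapper.inference during normal extraction."""
    wrapper.eval()
    with torch.no_grad():
        for i, (inputs, meta_info) in enumerate(loader):
            if i >= n_images:  # only run on a few images
                break
            out = wrapper.inference(inputs, meta_info)  # signature assumed; see repo
            torch.save(out, f"{out_dir}/preds_{i:04d}.pt")
```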

There are a bunch of utils in common, and I bet some of them are more useful than others. If I want to turn my .pt files into pretty graphics like the ones on comet.ml, is there a best choice?

I'm currently using KEYS from submit_pose, since that seems to be the best fit for what can be run as "inference"; let me know if a different set of KEYS seems better.


Separately, do you think the model would "just work" on new 3D object models? I was thinking of trying to get it running on DexYCB, which also includes 3D models for objects!

I'd expect that, with only 11 objects seen during training, the "object decoder" (even though it uses image-only CNN features) would mostly just learn object-class-specific articulated pose.

zc-alexfan commented 1 year ago

Thanks. I think a better way to visualize is to use our viewer. See Visualization. There is an extraction mode vis_pose, which dumps the predictions to disk; the viewer can then visualize them with the pred_mesh flag to show the predictions in AITViewer. I think one can just extend the vis_pose extraction to a vis_pose_pred extraction and visualize with pred_mesh. The viewer supports offline rendering, so every frame of the prediction can be rendered as an image. It also looks better than the graphics on Comet; the Comet visualization is only there so users can check whether training looks reasonable.
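As a rough illustration of extending the extraction (this assumes KEYS is the mapping from extraction mode to the prediction keys dumped to disk; the "pred.*" names below are made up and would need to match the model's actual output dict):

```python
# Rough illustration only: KEYS is assumed to map each extraction mode to
# the output keys saved to disk; the "pred.*" names are hypothetical and
# must match the keys the model actually produces.
KEYS["vis_pose_pred"] = KEYS["vis_pose"] + [
    "pred.mano.v3d.cam.r",  # hypothetical: predicted right-hand vertices
    "pred.mano.v3d.cam.l",  # hypothetical: predicted left-hand vertices
    "pred.object.v.cam",    # hypothetical: predicted object vertices
]
```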

I think if you pre-train on ARCTIC and test on DexYCB directly, it won't work, because there is no canonical way to define the 6D pose of an object (e.g., a coffee mug is symmetric under rotation about its height axis). However, it would be interesting to see whether pretraining on ARCTIC and finetuning on DexYCB improves YCB object estimation. We experimented with this on HO3D, and the model converges faster after pretraining.

zc-alexfan commented 1 year ago

Closing stale issue. Please reopen for further discussion.