remyxai / VQASynth

Compose multimodal datasets 🎹

Create a Gradio App to showcase & test the VQASynth Pipeline #24

Closed by harshbhatt7585 1 week ago

harshbhatt7585 commented 4 weeks ago

The Gradio app should take an image and produce the following outputs (a minimal sketch follows the list):

  1. 3D models of the objects being compared
  2. A caption describing the scene and the relationship between the objects
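
A minimal sketch of such an app, assuming a hypothetical `run_vqasynth_pipeline` helper as a stand-in for the actual pipeline stages (the Gradio components are real; the helper is a placeholder to be wired up):

```python
import gradio as gr


def run_vqasynth_pipeline(image_path):
    """Hypothetical placeholder for the actual VQASynth stages
    (depth estimation, segmentation, captioning)."""
    # Should return a path to a 3D model file plus a caption string.
    return None, "Pipeline not wired up yet."


def process_image(image_path):
    model_path, caption = run_vqasynth_pipeline(image_path)
    return model_path, caption


demo = gr.Interface(
    fn=process_image,
    inputs=gr.Image(type="filepath", label="Input scene"),
    outputs=[
        gr.Model3D(label="3D models of the compared objects"),
        gr.Textbox(label="Scene caption"),
    ],
    title="VQASynth Pipeline Demo",
)

if __name__ == "__main__":
    demo.launch()
```
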
salma-remyx commented 1 week ago

@harshbhatt7585 I updated the Gradio app to use the .run() methods of the classes instead of the apply_transform() methods for HF datasets, since it reduces overhead (see the sketch below). Also, PR #32 improved the depth estimates, so we're able to use the .pcd segmented point clouds without having to recreate them.
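
A hedged sketch of the difference, using a stand-in `CaptionStage` class whose `run()` and `apply_transform()` signatures are illustrative rather than the actual VQASynth API: the `.run()` path processes a single image directly, while `apply_transform()` is mapped over an HF dataset, which adds dataset construction and serialization overhead for a one-image demo.

```python
from datasets import Dataset


class CaptionStage:
    """Illustrative placeholder for a VQASynth pipeline stage."""

    def run(self, image_path: str) -> str:
        # Single-image path used by the Gradio app: no dataset wrapping.
        return f"caption for {image_path}"

    def apply_transform(self, batch: dict) -> dict:
        # Dataset-oriented path: operates on batched columns of an HF dataset.
        batch["caption"] = [f"caption for {p}" for p in batch["image"]]
        return batch


stage = CaptionStage()

# Gradio app: one image in, one result out.
print(stage.run("scene.jpg"))

# Offline dataset generation: map the transform over an HF dataset.
ds = Dataset.from_dict({"image": ["scene.jpg"]})
ds = ds.map(stage.apply_transform, batched=True)
print(ds[0]["caption"])
```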

Example screenshot of a test:

[Screenshot: 2024-11-16, 8:34 PM]