remyxai / VQASynth

Compose multimodal datasets 🎹
https://twitter.com/smellslikeml/status/1756723056675094726
216 stars 13 forks

Adding more prompts #2

Closed smellslikeml closed 8 months ago

smellslikeml commented 8 months ago

Adding more of the prompts described in the SpatialVLM paper, including the distinction between prompts for canonicalized and non-canonicalized point clouds.
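A rough sketch of how a canonicalization flag could gate which prompts get generated. It assumes that vertical questions (above/below, taller) only make sense once the point cloud is aligned to a horizontal surface, while distance questions work either way. The template wording, the `build_prompts` helper, and the `canonicalized` flag are all hypothetical and not taken from this repo or the paper.

```python
import random

# Hypothetical template pools; wording is illustrative only.
DISTANCE_TEMPLATES = [
    "How far apart are [A] and [B]?",
    "What is the distance between [A] and [B]?",
]
# Assumption: these depend on a meaningful "up" axis, so they are only
# used when the point cloud has been canonicalized.
VERTICAL_TEMPLATES = [
    "Is [A] above [B]?",
    "Which is taller, [A] or [B]?",
]


def build_prompts(obj_a, obj_b, canonicalized, rng=None):
    """Return one filled-in prompt per applicable template pool."""
    rng = rng or random.Random(0)
    pools = [DISTANCE_TEMPLATES]
    if canonicalized:
        pools.append(VERTICAL_TEMPLATES)
    prompts = []
    for pool in pools:
        template = rng.choice(pool)
        prompts.append(template.replace("[A]", obj_a).replace("[B]", obj_b))
    return prompts


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, build_prompts("the chair", "the table", canonicalized=flag))
```

With `canonicalized=False` only a distance prompt comes back; with `canonicalized=True` a vertical prompt is added as well.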

Also improved depth estimation, which now runs on the GPU.
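For reference, a minimal sketch of the usual device-selection pattern, assuming the depth model is a PyTorch module (an assumption; the actual model and loading code are not shown here). The `pick_device` helper is hypothetical and falls back to CPU when PyTorch or CUDA is unavailable.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"depth estimation would run on: {device}")
    # With a PyTorch model, one would then call model.to(device) and move
    # the input tensors to the same device before inference.
```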