nv-tlabs / GET3D

Text-Guided 3D Synthesis #20

Closed · AmanKishore closed this 1 year ago

AmanKishore commented 1 year ago

Hi @SteveJunGao thank you for all your help! When you release the pretrained model (Issue #16) will you also be releasing the weights for the fine-tuned model on CLIP? Excited to continue working on this!

lalalune commented 1 year ago

Hi @AmanKishore, we're working on this too, but we've opted to generate a 2D side view with standard Stable Diffusion (using prompt tags to ensure a white background), which then gets passed into GET3D.

We are also experimenting with Stable Dreamfusion (https://github.com/ashawkey/stable-dreamfusion), but it is much slower and gives lower-quality results. I believe that using SD to generate your image and then running it through GET3D will be 10-20x faster and produce a higher-quality result, provided you have sufficiently trained the model on similar object classes.
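For reference, a minimal sketch of the 2D generation step described above, assuming the Hugging Face diffusers library; the model ID, prompt tags, and filename are illustrative assumptions, not details from this thread:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a standard Stable Diffusion checkpoint (model ID is illustrative).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompt tags like "side view" and "plain white background" steer the
# output toward the kind of single-object image GET3D expects.
prompt = "a sports car, side view, plain white background, studio lighting"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("car_side_view.png")
```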

SteveJunGao commented 1 year ago

Hi, @AmanKishore ,

That's definitely on my plan! Unfortunately, I may not have enough time to release the CLIP fine-tuned model for text-guided 3D synthesis before CVPR; I expect this part will be released after CVPR, when I have more free time to clean up the code.

Hi @lalalune ,

It would be awesome to try this direction! Please let me know if you have any cool results on it!

AmanKishore commented 1 year ago

That's great! I'd love to chat, @lalalune; sounds like an interesting use case! DM'd you on Twitter. @SteveJunGao, quick question: are you planning to release the CLIP fine-tuning for text-guided 3D synthesis after June (post-CVPR)?

SteveJunGao commented 1 year ago

Hi @AmanKishore,

I don't have an exact timeline for this part right now. If you urgently need it, I recommend looking at this codebase; our implementation of text-guided 3D synthesis is based on that code.
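As a rough illustration of what CLIP-based text guidance for a generator involves, here is a minimal sketch assuming OpenAI's CLIP package; the function name and loss formulation are assumptions for illustration, not GET3D's actual implementation:

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _preprocess = clip.load("ViT-B/32", device=device)

def clip_text_guidance_loss(rendered_images: torch.Tensor,
                            text_prompt: str) -> torch.Tensor:
    """Return 1 - cosine similarity between rendered views and a text prompt.

    `rendered_images` is assumed to already be resized to 224x224 and
    normalized with CLIP's image statistics.
    """
    tokens = clip.tokenize([text_prompt]).to(device)
    text_features = clip_model.encode_text(tokens)
    image_features = clip_model.encode_image(rendered_images)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    # Minimizing this pushes the generator's renders toward the prompt.
    return (1.0 - (image_features * text_features).sum(dim=-1)).mean()
```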

AmanKishore commented 1 year ago

Thank you! And was the training data mostly ShapeNet?

SteveJunGao commented 1 year ago

Yes, the training data is mostly ShapeNet!