I'd like to use the CLIP-related features of this model, such as single-view reconstruction. Looking through the original paper, I see that this feature requires training the latent diffusion models conditioned on images. The paper says:
> We rendered 2D images from the 3D ShapeNet shapes, extracted the images' CLIP [105] image embeddings, and trained LION's latent diffusion models while conditioning on the shapes' CLIP image embeddings.
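For context, my understanding of the pipeline the paper describes is roughly the following. This is only a sketch of the data flow, not LION's actual code: `render_view` and `dummy_clip_encoder` are placeholders I made up for a real renderer and for CLIP's image encoder (which in a real run would be something like `model.encode_image` from the OpenAI `clip` package).

```python
# Sketch of the conditioning-data pipeline described in the quoted passage:
# render views of each shape, embed them with a CLIP image encoder, and pair
# each shape with its embedding so the prior can be trained conditioned on it.
# Everything below is a stand-in for illustration only.
from typing import Callable, Dict, List

EMBED_DIM = 512  # CLIP ViT-B/32 image embeddings are 512-dimensional


def render_view(shape_id: str) -> List[float]:
    # Placeholder for rendering a 2D image of a ShapeNet shape.
    return [float(ord(c)) / 255.0 for c in shape_id]


def dummy_clip_encoder(image: List[float]) -> List[float]:
    # Stand-in for a CLIP image encoder. A real pipeline would do e.g.:
    #   model, preprocess = clip.load("ViT-B/32")
    #   emb = model.encode_image(preprocess(img).unsqueeze(0))
    s = sum(image)
    return [(s * (i + 1)) % 1.0 for i in range(EMBED_DIM)]


def build_conditioning_table(
    shape_ids: List[str],
    encoder: Callable[[List[float]], List[float]],
) -> Dict[str, List[float]]:
    # One embedding per shape; the prior would then be trained with this
    # embedding as the conditioning input (presumably the clip_forge_enable
    # path in train_prior).
    return {sid: encoder(render_view(sid)) for sid in shape_ids}


table = build_conditioning_table(["shape_a", "shape_b"], dummy_clip_encoder)
```

If this matches what the paper means, my question is really about where in the training code these embeddings get precomputed and fed to the prior.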
I guess I need to set `clip_forge_enable = 1` when training with `train_prior`, but I'm not sure how to do this properly. Could you explain the correct procedure?
Thanks for your excellent work, @ZENGXH, and thank you in advance for any guidance!