yccyenchicheng / SDFusion


about text-to-shape synthesis #5

Closed peterjohnsonhuang closed 1 year ago

peterjohnsonhuang commented 1 year ago

Hi, thank you for this great work! I have a question about the training process for text-to-shape. According to your paper, the model is trained to generate only the geometry of objects, so how do you handle the color/texture-related text in the text-to-shape dataset? Did you just let the conditioned diffusion model learn to ignore such information and capture only the geometry-related information? Or did you apply some other preprocessing to remove color-related words from a given text?

yccyenchicheng commented 1 year ago

Hi,

Yes! Currently we ignore such information and do not apply any filtering to the text dataset.
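
For anyone curious what such a filtering step could look like, here is a minimal, purely hypothetical sketch (not part of this repo) that strips color-related words from a Text2Shape caption before it reaches the text encoder; the `COLOR_WORDS` list and `strip_color_words` helper are illustrative only:

```python
# Hypothetical preprocessing step (not used in SDFusion): strip color-related
# words from Text2Shape captions so only geometry-related text remains.

# Small illustrative color vocabulary; a real filter would need a larger list.
COLOR_WORDS = {
    "red", "green", "blue", "yellow", "black", "white",
    "brown", "gray", "grey", "purple", "orange", "pink",
}

def strip_color_words(caption: str) -> str:
    """Drop tokens that name colors, keeping the rest of the caption."""
    kept = [w for w in caption.split() if w.lower().strip(".,") not in COLOR_WORDS]
    return " ".join(kept)

print(strip_color_words("a red wooden chair with four legs"))
# -> "a wooden chair with four legs"
```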

peterjohnsonhuang commented 1 year ago

Got it! Thank you for the reply!

peterjohnsonhuang commented 1 year ago

Sorry, I have another related question. Did you try using the phrases in Text2Shape to colorize the text-guided generations from the same text, using the texturization method described in the main paper? I am wondering whether such a texturization method is able to correctly handle the fine-grained descriptions in Text2Shape.

yccyenchicheng commented 1 year ago

I think we have tried some phrases. The results mainly depend on the 2D diffusion model.

peterjohnsonhuang commented 1 year ago

I see, so that means the generated textures might not align with the data distribution of the textures in the Text2Shape dataset (e.g., in terms of FID or IS), but could still be somewhat reasonable, right? Hope I didn't misunderstand your reply 😂
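
For context, one way to quantify the gap being discussed would be to render views of the textured generations and of the Text2Shape ground truth and compare them with FID. A minimal sketch using torchmetrics is below; the image tensors are random placeholders, and this is not an evaluation reported in the paper:

```python
# Illustrative sketch only: compare rendered views of textured generations
# against Text2Shape renderings with FID via torchmetrics.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Placeholder uint8 image batches of shape (N, 3, H, W); in practice these
# would be renderings of Text2Shape shapes and of the textured generations.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(float(fid.compute()))
```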

yccyenchicheng commented 1 year ago

Yes, the distribution of textures in Text2Shape might not align with what Stable Diffusion produces.

peterjohnsonhuang commented 1 year ago

Thanks!