Closed by rfeinman 5 months ago
Hi, during inference, if no camera is provided, we fall back to the default parameters, as in: https://github.com/wyysf-98/CraftsMan/blob/9be2729bc3564f1ecc165171b4205313f7ace7b1/craftsman/models/conditional_encoders/clip_encoder.py#L107-L113 The default camera is defined in https://github.com/wyysf-98/CraftsMan/blob/9be2729bc3564f1ecc165171b4205313f7ace7b1/craftsman/models/conditional_encoders/base.py#L42-L67
This is admittedly somewhat complicated, as we wanted to simplify the inference code. Hope this helps.
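For anyone reading along, the fallback described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual CraftsMan code: `DEFAULT_CAMERA`, `resolve_cameras`, and the identity-matrix placeholder values are all hypothetical, and the real default camera is whatever the linked `base.py` defines.

```python
from typing import List, Optional, Sequence

# Hypothetical placeholder: a flattened 4x4 identity matrix stands in for
# the default camera. The real values live in the linked base.py.
DEFAULT_CAMERA: List[float] = [
    1.0, 0.0, 0.0, 0.0,
    0.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    0.0, 0.0, 0.0, 1.0,
]


def resolve_cameras(
    cameras: Optional[Sequence[Sequence[float]]],
    batch_size: int,
) -> List[List[float]]:
    """Return one camera per image, using the default when none is given.

    Hypothetical helper: it mirrors the idea that the caller may pass only
    images, and the encoder fills in the camera condition itself.
    """
    if cameras is None:
        # No camera supplied: repeat the default once per image in the batch.
        return [list(DEFAULT_CAMERA) for _ in range(batch_size)]
    if len(cameras) != batch_size:
        raise ValueError(
            f"expected {batch_size} cameras, got {len(cameras)}"
        )
    # Cameras supplied: copy them so the caller's data is not mutated later.
    return [list(camera) for camera in cameras]
```

Under this sketch, calling `resolve_cameras(None, batch_size=2)` yields two copies of the default, which is why an inference call that passes only images can still be conditioned on a camera.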
Ah yes I see this now, that makes sense. Thanks for clarifying!
Thanks for the great paper and code!
The paper says that your shape diffusion model conditions on camera embeddings in addition to images. But in the code, it looks like you are only inputting the images (see the snippet below). Am I missing something? Does your model use the cameras or not? Thanks for clarifying!
https://github.com/wyysf-98/CraftsMan/blob/9be2729bc3564f1ecc165171b4205313f7ace7b1/craftsman/systems/shape_diffusion.py#L330