Closed Superbia-zyb closed 10 months ago
From our experience, we found that the elevation of the conditioning image is a strong condition for novel view synthesis. To simplify things for ourselves, we dropped the relative distance, so all novel views are generated at the same camera distance as the conditioning image. Hope this helps!
Hi @voletiv ,
Thanks for your helpful information!
If the relative distance was dropped, could you tell me how the camera distance was determined when rendering the training data?
Were all training images rendered at a fixed camera distance, or did the distance differ between objects while remaining constant across the viewpoints of a single object?
Additionally, I'm curious about the default rendering camera distance.
Happy Holidays!
Best regards, Haian
Hi guys!
Firstly, thanks to Stability AI for open-sourcing their Stable Zero123 pre-trained model, and to the threestudio team for integrating it into this framework!
I noticed that the camera-condition calculation in `stable_zero123_guidance.py` differs considerably from the one in `zero123_guidance.py` (diff: https://www.diffchecker.com/07mHBhYI/). In short, when computing `T_cond`, the new version replaces

`camera_distances - self.cfg.cond_camera_distance`

with

`torch.deg2rad(90 - torch.full_like(elevation, self.cfg.cond_elevation_deg))`

This seems to drop the relative-distance guidance. Is there any theory that supports this change? And can the Stable Zero123 model still sample results whose camera distance differs from that of the input image? Please allow me to get to the bottom of this: I need to understand the details so that I can present them accurately when using the model in my work.
Sincerely.