Open ziye3001 opened 10 months ago
In LoRA training, we only drop the camera condition (which is fed into the network via `class_labels`):
https://github.com/threestudio-project/threestudio/blob/8a51c37317b6f7cd74bb3cb24c975b56d0a96703/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L570-L571
So at inference we do the same and apply CFG only on the camera condition.
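The camera-only CFG described above can be sketched as follows. This is a minimal illustration, not the repo's exact code: the function name, the dummy `unet` signature, and the use of an all-zeros tensor as the null camera embedding are assumptions for clarity. The key point is that `encoder_hidden_states` is duplicated unchanged, while only the condition passed via `class_labels` is zeroed in the unconditional branch, mirroring what was dropped during LoRA training.

```python
import torch


def camera_cfg_noise(unet, noisy_latents, t, camera_embeddings, guidance_scale):
    # Batch the conditional and unconditional branches together.
    # The hidden states are duplicated unchanged; only the camera
    # condition fed through `class_labels` differs between branches.
    latents_in = torch.cat([noisy_latents] * 2, dim=0)
    hidden_in = torch.cat([camera_embeddings] * 2, dim=0)
    class_in = torch.cat(
        [camera_embeddings, torch.zeros_like(camera_embeddings)], dim=0
    )
    noise_pred = unet(
        latents_in, t, encoder_hidden_states=hidden_in, class_labels=class_in
    ).sample
    noise_cond, noise_uncond = noise_pred.chunk(2)
    # Usual CFG combination of the two branches.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```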
Hi, I'm trying to understand your implementation of the VSD loss and have a question. To get the noise prediction with CFG, one should compute both the conditioned and the unconditioned noise. So why do you use
`encoder_hidden_states=torch.cat([image_camera_embeddings] * 2, dim=0),`
in the following two links? Shouldn't it be something like `encoder_hidden_states=torch.cat([image_camera_embeddings, torch.zeros_like(image_camera_embeddings)], dim=0),`?
https://github.com/threestudio-project/threestudio/blob/8a51c37317b6f7cd74bb3cb24c975b56d0a96703/threestudio/models/guidance/stable_diffusion_vsd_guidance.py#L492C6-L492C6
https://github.com/threestudio-project/threestudio/blob/8a51c37317b6f7cd74bb3cb24c975b56d0a96703/threestudio/models/guidance/zero123_unified_guidance.py#L435C24-L435C24
Thank you very much!
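For contrast, the standard CFG pattern the question expects can be sketched as below. This is a generic illustration with hypothetical names (the function, the dummy `unet` signature, and the all-zeros null embedding are assumptions): here the conditioning passed through `encoder_hidden_states` itself is replaced by a null embedding in the unconditional branch, rather than being duplicated.

```python
import torch


def standard_cfg_noise(unet, noisy_latents, t, cond_embeddings, guidance_scale):
    # Standard classifier-free guidance: the unconditional branch
    # replaces the `encoder_hidden_states` conditioning with a null
    # embedding (all-zeros here for illustration).
    latents_in = torch.cat([noisy_latents] * 2, dim=0)
    hidden_in = torch.cat(
        [cond_embeddings, torch.zeros_like(cond_embeddings)], dim=0
    )
    noise_pred = unet(latents_in, t, encoder_hidden_states=hidden_in).sample
    noise_cond, noise_uncond = noise_pred.chunk(2)
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```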