liuyuan-pal / SyncDreamer

[ICLR 2024 Spotlight] SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
https://liuyuan-pal.github.io/SyncDreamer/
MIT License
906 stars 39 forks

Where is the image to be embedded? #55

digbangbang closed 8 months ago

liuyuan-pal commented 11 months ago

Hi, I don't quite understand the problem. Can you provide a clearer explanation?

digbangbang commented 11 months ago

Hi, I have read the code in SyncDreamer/ldm/models/diffusion/sync_dreamer.py and have two questions.

First, in training_step the batch input is probably something like {target_image, input_image, elevation}. What happens if I don't have a target_image? I see that https://github.com/liuyuan-pal/SyncDreamer/blob/bcad0f11b72c027cc4d0464918a7f6bb07ac26e5/ldm/models/diffusion/sync_dreamer.py#L380 allows target_image to be None. I ask because I am working on something like https://github.com/dreamgaussian/dreamgaussian/blob/59f46d372f2448274a70624797678b1620f7faab/guidance/sd_utils.py#L137, which takes only an image, runs a training step, and returns the loss. The input image is encoded by the VAE into a latent, the UNet predicts the noise, and the loss is computed from the latent and the noise prediction. I think I could use SyncDreamer in the same way, because I want to test Gaussian splatting with SyncDreamer.

Second, does the model have a VAE-like encoding step like the one I described above?
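For readers unfamiliar with the flow described in this comment (encode to a latent, add noise, predict the noise, build a loss from the two), here is a minimal, dependency-free sketch. It is not code from SyncDreamer or DreamGaussian: `sds_style_gradient`, `predict_noise`, `alpha_bar`, and `weight` are hypothetical names, plain Python lists stand in for latent tensors, and the denoiser is any callable rather than a real UNet.

```python
import math
import random


def sds_style_gradient(latent, predict_noise, alpha_bar, weight=1.0, rng=None):
    """Toy score-distillation-style gradient on a flat list of floats.

    latent        -- clean latent (stand-in for a VAE-encoded image)
    predict_noise -- hypothetical denoiser: (noisy_latent, alpha_bar) -> noise estimate
    alpha_bar     -- cumulative noise-schedule term in (0, 1) for the sampled timestep
    weight        -- timestep weighting factor (assumed constant here)
    """
    rng = rng or random.Random()
    # Sample the Gaussian noise that is added to the clean latent.
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    # Forward diffusion: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    noisy = [a * z + b * e for z, e in zip(latent, noise)]
    # The denoiser (a UNet in a real pipeline) estimates the added noise.
    predicted = predict_noise(noisy, alpha_bar)
    # The difference between predicted and true noise is used as the
    # gradient with respect to the latent; no backprop through the denoiser.
    return [weight * (p - e) for p, e in zip(predicted, noise)]
```

In a real pipeline this gradient would be pushed back through the renderer (for example a Gaussian splatting model) that produced the image. Whether SyncDreamer's multiview denoiser can be used as `predict_noise` in this way is exactly the open question in this thread.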

liuyuan-pal commented 11 months ago

Hi,

  1. The target image is None during inference, but training requires target images.
  2. This is the encoding step: https://github.com/liuyuan-pal/SyncDreamer/blob/bcad0f11b72c027cc4d0464918a7f6bb07ac26e5/ldm/models/diffusion/sync_dreamer.py#L385C34-L385C34 Actually, we fixed the viewpoints, so it may not be feasible to apply SDS to SyncDreamer again.
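
The rule in point 1 can be sketched as a small guard. This is an illustration only, not code from the repository: `resolve_mode` is a hypothetical helper, and the batch keys follow the guess made earlier in this thread ({target_image, input_image, elevation}) rather than the actual data loader.

```python
def resolve_mode(batch, training):
    """Hypothetical check: targets are optional at inference, required in training."""
    if batch.get("input_image") is None:
        # Every mode is conditioned on the single input view.
        raise ValueError("input_image is required")
    if batch.get("target_image") is None:
        if training:
            # Training needs ground-truth target views to compute a loss.
            raise ValueError("target_image is required during training")
        return "inference"
    return "training" if training else "inference"
```

Calling `resolve_mode({"input_image": img, "target_image": None}, training=True)` raises a `ValueError`, which mirrors the answer above: a batch without target images can only be used for inference.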