Closed: fkcptlst closed this issue 4 months ago
In practice, unet_phi can be a totally different UNet (e.g., the SD2 model) or just the base UNet plus a set of LoRA parameters (as in our case). In my implementation, unet_phi is the frozen unet plus trainable LoRA parameters. You can refer to the original ProlificDreamer paper for this; they also open-sourced their code recently :)
Thanks for replying.
Yes, I get that unet_phi can be a different model. In this implementation, however, extract_lora_diffusers is an in-place operation that also alters the original unet. I don't know whether this is by design, but unet and unet_phi end up being the identical object here.
I've also checked the threestudio implementation; there, unet and unet_phi are different instances with different attn_processor settings.
As a matter of fact, I'm also trying to implement 2D VSD from scratch. A curious observation: when I use the same model for unet and unet_phi (as in this implementation), it leads to poor results (200 steps).
When I instantiate unet_phi as a separate model (i.e., a different instance from the original unet, similar to the threestudio implementation) with no other config changes, the results look normal.
I noticed that your implementation also yields normal samples with unet = unet_phi, but I'm not able to replicate that. Did I miss anything important? What could be the reason that VSD continues to work normally even when unet = unet_phi?
Thanks!
Hi @fkcptlst, did you put the whole unet_phi model into the optimizer? You should only optimize the LoRA parameters.
Hi, I didn't put the whole model into the optimizer. I did almost exactly the same thing as you did.
To my understanding, the 'a' model should be frozen, with no tunable LoRA parameters. The 'b' model should have tunable LoRA parameters, with the rest of its UNet parameters frozen. They should be different instances.
In your implementation, however, they are the same instance with the same tunable LoRA parameters.
You are right. However, there is a hyperparameter 'scale' that controls the strength of the LoRA. With the same frozen part, setting 'scale' to 0 gives us the $a$ model, as mentioned here: https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L245
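To make the role of 'scale' concrete, here is a minimal NumPy sketch of a LoRA-augmented layer (the function and variable names are mine for illustration, not the repo's or the diffusers API): with the frozen base weight shared, scale = 0 reproduces the pretrained layer exactly.

```python
import numpy as np

def lora_forward(x, W, A, B, scale):
    # frozen base weight W plus the scaled low-rank LoRA update (B @ A)
    return x @ W.T + scale * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # frozen pretrained weight
A = rng.standard_normal((4, 8))   # trainable lora_down
B = rng.standard_normal((8, 4))   # trainable lora_up
x = rng.standard_normal((2, 8))

# scale = 0 recovers the frozen pretrained layer exactly (the 'a' model)
assert np.allclose(lora_forward(x, W, A, B, 0.0), x @ W.T)
```

So the 'a' and 'b' models differ only in whether the LoRA branch contributes, which is why one set of weights can play both roles.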
unet_cross_attention_kwargs in https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L249 and cross_attention_kwargs in https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L258 are different.
The reason for using only one UNet is to save GPU memory...
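The calling pattern can be sketched with a toy stand-in (hypothetical class and argument names, not the diffusers API): the same instance is queried twice per step, once with the LoRA switched off for the pretrained score and once with it active as unet_phi.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyLoRAUNet:
    """Toy stand-in for the single LoRA-augmented UNet (illustrative only)."""
    def __init__(self, dim=8, rank=4):
        self.W = rng.standard_normal((dim, dim))   # frozen pretrained weight
        self.A = rng.standard_normal((rank, dim))  # trainable lora_down
        self.B = rng.standard_normal((dim, rank))  # trainable lora_up

    def __call__(self, x, cross_attention_kwargs):
        scale = cross_attention_kwargs["scale"]
        return x @ self.W.T + scale * (x @ self.A.T) @ self.B.T

unet = ToyLoRAUNet()
x = rng.standard_normal((2, 8))

# pretrained score (the 'a' model): LoRA switched off via scale = 0
score_pretrained = unet(x, cross_attention_kwargs={"scale": 0.0})
# unet_phi score (the 'b' model): same instance, LoRA branch active
score_phi = unet(x, cross_attention_kwargs={"scale": 1.0})

assert np.allclose(score_pretrained, x @ unet.W.T)
```

Only self.A and self.B would go into the optimizer, so training unet_phi never changes what the scale = 0 call returns.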
Thank you so much!!
Now I get it. I had set "scale" incorrectly because I didn't understand its purpose and overlooked it. After setting it correctly, I'm able to produce normal results.
Thank you so much for your help, closing issue.
In this implementation of VSD, unet is the same object as unet_phi. In threestudio's implementation, however, the UNets are different instances.
According to the paper, shouldn't the UNets be different? The phi model is initialized from unet, but after optimization they should be quite different.