Closed: fkcptlst closed this issue 4 months ago
In practice, unet_phi can be a totally different UNet (e.g., the SD2 model) or just the base UNet plus a set of LoRA parameters (as in our case). In my implementation, unet_phi is the frozen unet plus trainable LoRA parameters. You can refer to the original ProlificDreamer paper for this; they also open-sourced their code recently :)
Thanks for replying.
Yes, I get that unet_phi can be a different model. In this implementation, however, extract_lora_diffusers is an in-place operation that also alters the original unet. I don't know whether this is by design, but unet and unet_phi end up being the identical object here.
I've also checked the threestudio implementation; there, unet and unet_phi are different instances with different attn_processor settings.
As a matter of fact, I'm also trying to implement 2D VSD from scratch. A curious observation: when I use the same model for unet and unet_phi (as in this implementation), it leads to poor results (200 steps).
When I instantiate unet_phi as a separate model (i.e., a different instance from the original unet, similar to the threestudio implementation) with no other config changes, the results look normal.
I noticed that your implementation also yields normal samples with unet = unet_phi, but I'm not able to replicate that. Did I miss anything important? What could be the reason that VSD continues to work normally even when unet = unet_phi?
Thanks!
Hi @fkcptlst, did you put the whole unet_phi model into the optimizer? You should only optimize the LoRA parameters.
Hi, I didn't put the whole model into the optimizer. I did almost exactly the same thing as you did.
To my understanding, the 'a' model should be frozen, with no tunable LoRA parameters. The 'b' model should have tunable LoRA parameters, with the rest of its UNet parameters frozen. They should be different instances.
In your implementation, however, they are the same instance with the same tunable LoRA parameters.
You are right. However, there is a hyperparameter 'scale' that controls the strength of the LoRA. With the same frozen part, setting 'scale' to 0 gives us the $a$ model, as mentioned here: https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L245
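To make the role of 'scale' concrete, here is a minimal NumPy sketch of a LoRA-augmented layer (the function and variable names are mine for illustration, not the repo's or the diffusers API): with the frozen base weight shared, scale = 0 reproduces the pretrained layer exactly.

```python
import numpy as np

def lora_forward(x, W, A, B, scale):
    # frozen base weight W plus the scaled low-rank LoRA update (B @ A)
    return x @ W.T + scale * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # frozen pretrained weight
A = rng.standard_normal((4, 8))   # trainable lora_down
B = rng.standard_normal((8, 4))   # trainable lora_up
x = rng.standard_normal((2, 8))

# scale = 0 recovers the frozen pretrained layer exactly (the 'a' model)
assert np.allclose(lora_forward(x, W, A, B, 0.0), x @ W.T)
```

So the 'a' and 'b' models differ only in whether the LoRA branch contributes, which is why one set of weights can play both roles.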
unet_cross_attention_kwargs in https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L249 and cross_attention_kwargs in https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L258 are different.
The reason for using only one UNet is to save GPU memory...
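The calling pattern can be sketched with a toy stand-in (hypothetical class and argument names, not the diffusers API): the same instance is queried twice per step, once with the LoRA switched off for the pretrained score and once with it active as unet_phi.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyLoRAUNet:
    """Toy stand-in for the single LoRA-augmented UNet (illustrative only)."""
    def __init__(self, dim=8, rank=4):
        self.W = rng.standard_normal((dim, dim))   # frozen pretrained weight
        self.A = rng.standard_normal((rank, dim))  # trainable lora_down
        self.B = rng.standard_normal((dim, rank))  # trainable lora_up

    def __call__(self, x, cross_attention_kwargs):
        scale = cross_attention_kwargs["scale"]
        return x @ self.W.T + scale * (x @ self.A.T) @ self.B.T

unet = ToyLoRAUNet()
x = rng.standard_normal((2, 8))

# pretrained score (the 'a' model): LoRA switched off via scale = 0
score_pretrained = unet(x, cross_attention_kwargs={"scale": 0.0})
# unet_phi score (the 'b' model): same instance, LoRA branch active
score_phi = unet(x, cross_attention_kwargs={"scale": 1.0})

assert np.allclose(score_pretrained, x @ unet.W.T)
```

Only self.A and self.B would go into the optimizer, so training unet_phi never changes what the scale = 0 call returns.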
Thank you so much!!
Now I get it. I had set "scale" incorrectly because I didn't understand its purpose and overlooked it. After setting it correctly, I'm able to produce normal results.
Thank you so much for your help, closing issue.
In this implementation of VSD, unet is the same object as unet_phi. In threestudio's implementation, however, the UNets are different instances.
According to the paper, shouldn't the UNets be different? The phi model is initialized from unet, but after optimization they should be quite different.