[Draft] Controlnet vsd guidance

bennyguo commented 1 year ago

This is great! Do you have some visualization results? BTW, I just merged your last PR, so you might want to update your code if necessary.

yankeesong commented 1 year ago

Sure. I'm still tuning the parameters for a good result. Will post them and update the code soon!

yankeesong commented 1 year ago

Prompt: a girl with undercut blue hair

ControlNet with SDS:

https://github.com/threestudio-project/threestudio/assets/36912641/24cef76f-fac4-4990-ac3b-dc594a172f25

ControlNet with VSD:

https://github.com/threestudio-project/threestudio/assets/36912641/52340ebe-7dd4-4648-a5b7-9b1b352ec9df

https://github.com/threestudio-project/threestudio/assets/36912641/df811506-bd90-4404-8853-bbf6ccaf6957

It's hard to say that the VSD results are necessarily of higher quality, but they do have higher diversity (the results from the two runs are different).

yankeesong commented 1 year ago

I also added a feature: if the input control image is already a normal map (as is the case in texture training), it is passed directly to ControlNet instead of going through the NormalBae estimator (which yields a worse normal map than the input).

bennyguo commented 1 year ago

Very interesting. Please resolve the conflicts, then we can merge this PR.

yankeesong commented 1 year ago

Someone pointed out that, as in #189, ControlNet takes view-space normals whereas comp_normal yields world-space normals. I need to update the code.
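For readers following along, the conversion in question can be sketched roughly as below. This is a minimal NumPy illustration, not the code in this PR: the function names are hypothetical, `c2w` is assumed to be a 4x4 camera-to-world matrix with an orthonormal rotation block, and any axis flips needed to match the normal convention ControlNet was trained on are deliberately left out.

```python
import numpy as np


def world_to_view_normals(normals_world: np.ndarray, c2w: np.ndarray) -> np.ndarray:
    """Rotate world-space normals of shape (..., 3) into the camera frame.

    Assumption: c2w[:3, :3] is an orthonormal rotation R that maps
    camera-space directions to world space, so the inverse mapping is R^T.
    """
    R = c2w[:3, :3]
    # Right-multiplying row vectors by R applies R^T to each normal.
    normals_view = normals_world @ R
    length = np.linalg.norm(normals_view, axis=-1, keepdims=True)
    return normals_view / np.clip(length, 1e-8, None)


def encode_normals(normals: np.ndarray) -> np.ndarray:
    """Map unit normals from [-1, 1] to [0, 1] so they can be stored as an image."""
    return (normals + 1.0) / 2.0


# Hypothetical camera rotated 90 degrees about the world z-axis.
c2w = np.eye(4)
c2w[:3, :3] = np.array([[0.0, -1.0, 0.0],
                        [1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0]])
world_normals = np.array([[[0.0, 1.0, 0.0]]])  # a 1x1 "image" of normals
view_normals = world_to_view_normals(world_normals, c2w)
control_image = encode_normals(view_normals)
```

With this camera, the world +y direction coincides with the camera's +x axis, so the single normal becomes (1, 0, 0) in view space.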

zbllbz6 commented 1 year ago

Regarding the difference between VSD and SDS in the code, I only see whether LoRA is used or not. Is there any other difference?

yankeesong commented 1 year ago

Yes. If we only consider the 1-particle case for VSD, then LoRA is basically the only difference. It would be good to write a general guidance that incorporates both in the future.
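To make the comparison concrete, here is a small sketch of the two per-pixel gradient terms as they are usually written for the 1-particle case. It is an illustration only, not the guidance code in this PR: the function names and the scalar weight `w` are placeholders, and the noise predictions are plain arrays standing in for UNet outputs.

```python
import numpy as np


def sds_grad(eps_pretrained: np.ndarray, noise: np.ndarray, w: float = 1.0) -> np.ndarray:
    """SDS: compare the pretrained model's noise prediction with the sampled noise."""
    return w * (eps_pretrained - noise)


def vsd_grad(eps_pretrained: np.ndarray, eps_lora: np.ndarray, w: float = 1.0) -> np.ndarray:
    """VSD: replace the sampled noise with the prediction of the LoRA-tuned model."""
    return w * (eps_pretrained - eps_lora)


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8, 8))           # noise added to the rendered latent
eps_pretrained = rng.standard_normal((4, 8, 8))  # stand-in for the frozen model's output
eps_lora = noise.copy()                          # a LoRA model that predicts exactly the noise

# In this degenerate case the two gradients coincide.
same = np.allclose(sds_grad(eps_pretrained, noise), vsd_grad(eps_pretrained, eps_lora))
```

The only change between the two functions is the subtracted term, which matches the observation that LoRA is the main difference in the code.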

yankeesong commented 1 year ago

Finished. No obvious difference in generation results though. @bennyguo please take a look when you get time.

DSaurus commented 1 year ago

Hi, @yankeesong. I believe we need to conduct more experiments to verify the ControlNet+VSD implementation. During my experiments, I encountered the following issues:

1. The guidance scale in the ControlNet+VSD setting appears to be too large. It is set to 7.5 in the original VSD implementation.
2. The shape_init_params is not defined in fantasia3d-texture.yaml, leading to a configuration parsing error.
3. Presently, the LoRA training in ControlNet+VSD does not utilize the camera embeddings, which differs from the original VSD implementation.
4. Interestingly, the results of ControlNet+VSD are very similar to ControlNet+SDS.

To address these, I suggest checking the implementation of VSD first. One straightforward way is to run ControlNet+VSD with condition_scale=0. In this setting, the results of ControlNet+VSD should be similar to the original VSD. You can make some modifications to prolificdreamer-texture.yaml and obtain the results of the original VSD.
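The idea behind this sanity check can be sketched with a toy model. The code below is only an illustration under simplifying assumptions: the control signal is modelled as an additive residual on the noise prediction (a real ControlNet injects residuals into intermediate UNet features), and all names are hypothetical rather than taken from threestudio.

```python
import numpy as np


def classifier_free_guidance(eps_uncond: np.ndarray, eps_text: np.ndarray,
                             guidance_scale: float) -> np.ndarray:
    """Standard classifier-free guidance: extrapolate from the unconditional prediction."""
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)


def controlled_prediction(base_eps: np.ndarray, control_residual: np.ndarray,
                          condition_scale: float) -> np.ndarray:
    """Toy stand-in for ControlNet: the control contribution is scaled by condition_scale."""
    return base_eps + condition_scale * control_residual


rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 8, 8))
eps_text = rng.standard_normal((4, 8, 8))
residual = rng.standard_normal((4, 8, 8))

# With condition_scale=0 the control term vanishes, so the guided prediction
# should match plain text-conditioned guidance at the same guidance scale.
plain = classifier_free_guidance(eps_uncond, eps_text, guidance_scale=7.5)
with_control_off = classifier_free_guidance(
    controlled_prediction(eps_uncond, residual, condition_scale=0.0),
    controlled_prediction(eps_text, residual, condition_scale=0.0),
    guidance_scale=7.5,
)
```

If the real implementation does not reduce to the uncontrolled case in this way, that would point to a bug in the VSD path rather than an effect of the control signal.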

yankeesong commented 1 year ago

I will fix 2 and 3 soon and let you know.

As for 1 and 4, it's an interesting phenomenon. Since we already have strong control from the fixed mesh + ControlNet, the guidance scale (7.5) is not guaranteed to work off the shelf. What's more, for texture training, VSD was designed to solve the over-saturation problem of SDS, which is already (maybe partially) addressed by ControlNet. So I think it's hard to imagine how VSD can further improve SDS results with ControlNet. One possible route is to look at diversity, though, as I mentioned above.

dunbar12138 commented 1 year ago

Hi @yankeesong, thanks for your contribution! Could I ask a question: how does ControlNet solve the over-saturation problem? Do you have results somewhere that showcase this?

Sorry for interrupting your discussion!

bennyguo commented 1 year ago

@yankeesong I agree with @DSaurus that we probably need a small guidance scale to achieve good results with VSD :)

yankeesong commented 1 year ago

Hi @dunbar12138! I don't really have an answer for this. I don't think the cause of over-saturation is thoroughly explained in the literature (VSD addresses the problem, but the authors didn't say why it occurs). However, as you can see from the videos above and in #240, there is no obvious saturation problem with ControlNet guidance. My intuition (which may not be correct) is that fixed mesh + ControlNet encourages very fine details, which somehow discourages over-saturation.

yankeesong commented 1 year ago

Closing this PR as now we have a controlnet_vsd branch on the main repo.

threestudio-project / threestudio

[Draft] Controlnet vsd guidance #244