Closed yankeesong closed 1 year ago
This is great! Do you have some visualization results? BTW, I just merged your last PR, so you might want to update your code if necessary.
Sure. I'm still tuning the parameters for a good result. Will post them and update the code soon!
Prompt: a girl with undercut blue hair
ControlNet with SDS:
ControlNet with VSD:
It's hard to say that the VSD results are necessarily of higher quality, but they do have higher diversity (the results from two runs differ).
I also added the functionality that if the input control image is already a normal map (as is the case in texture training), it is passed directly to ControlNet instead of being passed through the NormalBae estimator (which would yield a worse normal map than the input).
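The passthrough logic can be sketched as follows. This is a minimal illustration, not the actual PR code; the function name, the `input_is_normal` flag, and the injected `estimator` callable are all hypothetical:

```python
import numpy as np

def prepare_control_image(cond_rgb: np.ndarray, input_is_normal: bool, estimator) -> np.ndarray:
    """Return the conditioning image for ControlNet.

    If the input is already a normal map (e.g. rendered from the fixed
    mesh during texture training), pass it through unchanged; otherwise
    run a NormalBae-style estimator on it.
    """
    if input_is_normal:
        # Skip the estimator: re-estimating would degrade the input.
        return cond_rgb
    return estimator(cond_rgb)
```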
Very interesting. Please resolve the conflicts then we can merge this PR.
Someone pointed out, as in #189, that ControlNet expects view-space normals whereas comp_normal yields world-space normals. Need to update the code.
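The world-to-view conversion amounts to rotating the normals by the inverse of the camera-to-world rotation. A minimal numpy sketch, assuming a pure-rotation camera matrix (sign conventions of the ControlNet normal condition, e.g. OpenGL vs. OpenCV cameras, would still need checking separately):

```python
import numpy as np

def world_to_view_normal(n_world: np.ndarray, c2w: np.ndarray) -> np.ndarray:
    """Rotate world-space normals (..., 3) into camera/view space.

    Only the rotation block of the camera-to-world matrix is used;
    normals transform by its inverse, which for a pure rotation is
    the transpose. In row-vector form: n_view = n_world @ R_c2w.
    """
    R = c2w[:3, :3]
    return n_world @ R  # equivalent to (R.T @ n.T).T
```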
Regarding the difference between VSD and SDS in the code, I only see whether LoRA is used or not. Is there any other difference?
Yes. If we only consider the 1-particle case for VSD, then LoRA is basically the only difference. It would be good to write a general guidance module that incorporates these in the future.
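That difference can be sketched in a framework-agnostic way (function names are mine, not the repo's): in SDS the update direction is the pretrained score minus the injected Gaussian noise, while 1-particle VSD replaces the noise term with the prediction of a LoRA-finetuned copy of the model:

```python
import numpy as np

def sds_grad(eps_pretrained: np.ndarray, noise: np.ndarray, w: float = 1.0) -> np.ndarray:
    # SDS: direction = (pretrained noise prediction) - (injected Gaussian noise)
    return w * (eps_pretrained - noise)

def vsd_grad(eps_pretrained: np.ndarray, eps_lora: np.ndarray, w: float = 1.0) -> np.ndarray:
    # VSD (1-particle): the Gaussian noise is replaced by the prediction of a
    # LoRA-finetuned copy tracking the current rendering distribution; the
    # update otherwise has the same form as SDS.
    return w * (eps_pretrained - eps_lora)
```

If the LoRA prediction were identical to the injected noise, the two updates would coincide, which is why LoRA is essentially the only code-level difference in the 1-particle case.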
Finished. No obvious difference in generation results though. @bennyguo please take a look when you get time.
Hi, @yankeesong. I believe we need to conduct more experiments to verify the `ControlNet+VSD` implementation. During my experiments, I encountered the following issues:

- The guidance scale in the `ControlNet+VSD` setting appears to be too large. It is set to `7.5` in the original VSD implementation.
- `shape_init_params` is not defined in `fantasia3d-texture.yaml`, leading to a configuration parsing error.
- Presently, the LoRA training in `ControlNet+VSD` does not utilize the camera embeddings, which differs from the original VSD implementation.
- Interestingly, the results of `ControlNet+VSD` are very similar to `ControlNet+SDS`.

To address these, I suggest checking the implementation of VSD first. One straightforward way is to run `ControlNet+VSD` with `condition_scale=0`. In this setting, the results of `ControlNet+VSD` should be similar to the original VSD. You can make some modifications to `prolificdreamer-texture.yaml` and obtain the results of the original VSD.
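The suggested sanity check could look roughly like the following config override. This is only a sketch: the key names follow the discussion above (`condition_scale`, guidance scale) and are not a verified schema for the actual YAML files:

```yaml
# Hypothetical override sketch: zero out the ControlNet condition so the
# guidance reduces to plain VSD; key names are illustrative only.
system:
  guidance:
    condition_scale: 0.0
    guidance_scale: 7.5   # value used by the original VSD implementation
```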
I will fix 2 and 3 soon and let you know.
As for 1 and 4, it's an interesting phenomenon. Since we already have strong control from the fixed mesh + ControlNet, the guidance parameter (7.5) is not guaranteed to work off-the-shelf. What's more, for texture training, VSD was designed to solve the over-saturation problem of SDS, which is already (maybe partially) addressed by ControlNet. So I think it's hard to imagine how VSD can further improve on SDS results with ControlNet. One possible angle is diversity though, as I mentioned above.
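For context, the guidance-scale discussion comes down to the classifier-free guidance extrapolation, sketched below (function name is mine). Large scales amplify the text-conditioned direction, which is commonly linked to over-saturation; SDS setups often use very large scales (e.g. around 100), while VSD works at around 7.5:

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    # Classifier-free guidance: extrapolate from the unconditional noise
    # prediction toward the conditional one. scale=1 is plain conditional
    # sampling; larger scales push samples further along the text direction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With a fixed mesh and ControlNet already constraining the output, the extrapolation strength that works for plain VSD need not transfer unchanged.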
Hi @yankeesong, thanks for your contribution! Could I ask a question: how does ControlNet solve the over-saturation problem? Do you have some results somewhere to showcase this?
Sorry for interrupting your discussion!
@yankeesong I agree with @DSaurus that we probably need small guidance scale to achieve good results with VSD :)
Hi! I don't really have an answer for this. The reason for over-saturation is not even thoroughly explained in the literature, I think (i.e. VSD addresses the problem, but the authors don't say why). However, as you can see from the videos above and in #240, there is no obvious saturation problem with ControlNet guidance. My intuition (which may not be correct) is that the fixed mesh + ControlNet encourages very fine details, which somehow discourages over-saturation.
Closing this PR as now we have a controlnet_vsd branch on the main repo.