haofanwang / ControlNet-for-Diffusers

Transfer the ControlNet with any basemodel in diffusers🔥
MIT License

[Discussion] Diffuser Framework vs WebUI Extension #10

Open paulo-coronado opened 1 year ago

paulo-coronado commented 1 year ago

Hello @haofanwang,

I am trying to replicate Mikubill's Transfer Control. Initially I followed this guide, and after comparing the output of the AnyV3_Canny_Model.pth it creates against the WebUI (using the same prompt, seed, etc.), I realized that they are not the same... is that normal? Please check the differences below:

[comparison image: two generation grids]

In addition, I am trying to save the merged weights when using the Mikubill repo. I tried adding the following line at the end of PlugableControlModel's init() in cldm.py:

torch.save(final_state_dict, './control_any3_openpose.pth')

But I don't think it is correct... do you have any thoughts on this?

haofanwang commented 1 year ago

(1) How was AnyV3_Canny_Model.pth created in the WebUI? Isn't it using the method described in ControlNet?

(2) What do you mean by "not the same"? Do the weights have different keys, different values, or both?

(3) Can you provide more details about what you want to do? What is your base model and ControlNet (which condition)? If possible, could you attach the input and control image (pose, depth, or else) so that I can have a try?

@plcpinho
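
To answer (2) concretely, the two checkpoints can be diffed directly. Below is a minimal sketch, assuming both files are plain PyTorch state dicts (the paths are placeholders):

import torch

# Paths are placeholders for the two checkpoints being compared.
sd_a = torch.load("AnyV3_Canny_Model.pth", map_location="cpu")
sd_b = torch.load("control_any3_canny_webui.pth", map_location="cpu")

# Some checkpoints wrap the weights in a "state_dict" entry.
sd_a = sd_a.get("state_dict", sd_a)
sd_b = sd_b.get("state_dict", sd_b)

# (a) Compare the key sets.
keys_a, keys_b = set(sd_a), set(sd_b)
print("only in A:", sorted(keys_a - keys_b)[:10])
print("only in B:", sorted(keys_b - keys_a)[:10])

# (b) For shared keys, compare the tensor values.
for k in sorted(keys_a & keys_b):
    a, b = sd_a[k].float(), sd_b[k].float()
    if a.shape != b.shape:
        print(f"{k}: shape {tuple(a.shape)} vs {tuple(b.shape)}")
    elif not torch.allclose(a, b, atol=1e-6):
        print(f"{k}: max abs diff {(a - b).abs().max().item():.3e}")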

paulo-coronado commented 1 year ago

(1) I didn't create AnyV3_Canny_Model.pth via the WebUI. That is one of my goals, but I don't think it is possible because, as I understand it, Mikubill is not merging the models like you are doing; he is doing it "on the fly" (I didn't understand exactly how). The comparison above is the AnyV3_Canny_Model.pth created via this guide vs. Mikubill WebUI image generation;

(2) Since both methods (tool_transfer_control.py and the Mikubill WebUI) try to achieve the same result (CustomSD15+ControlNet), I thought about comparing them and checking whether they generate the same or similar images (using the same prompt, seed, etc.). The result was the above (different images).

(3) To be clearer, my goal is to create the best CustomSD15+ControlNet model (e.g. AnyV3_Canny_Model.pth) possible. I found that the Mikubill WebUI generates better images than the ControlNet base-model replacement done via tool_transfer_control.py. Does that make sense, or is it a completely nonsensical statement? 🤔

PS:
- By "CustomSD15 models" I mean any SD15 model, such as AnythingV3, RealisticVision, etc.
- By "ControlNet models" I mean SD15Canny, SD15OpenPose, etc.

@haofanwang Thank you very much for replying to the thread!
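
For reference, on the diffusers side a fixed torch.Generator makes such same-seed comparisons reproducible. A minimal sketch (the model path is a placeholder for whichever merged checkpoint is under test):

import torch
from diffusers import StableDiffusionPipeline

# Placeholder path: point this at the merged AnyV3+ControlNet model being tested.
pipe = StableDiffusionPipeline.from_pretrained("path/to/merged-model", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# A fixed Generator makes the initial latent noise reproducible, so any remaining
# difference between two pipelines comes from the weights, not the sampling.
generator = torch.Generator(device="cuda").manual_seed(12345)
image = pipe("a photo of a cat", num_inference_steps=20, generator=generator).images[0]
image.save("output.png")

Note that the A1111 WebUI and diffusers do not share a noise or sampler convention, so the same seed in both tools generally does not produce pixel-identical images even with identical weights; that alone can account for part of the visual difference.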

paulo-coronado commented 1 year ago

I am trying to analyze Mikubill's code in order to create an AnyV3_Canny_Model.pth file and then compare the weights from both methods (Mikubill vs. tool_transfer_control.py), but I am not succeeding because, as I said, I suspect there is no model merge. 😓 What do you think, @haofanwang?

haofanwang commented 1 year ago

I totally agree with you. I also failed to figure out how the WebUI works on the fly; that is why I made this project.

One possible reason is that tool_transfer_control.py may miss something, so the model is not fully converted. I'm not sure whether the released ControlNet weights are completely independent of the UNet, or whether, as mentioned in the ControlNet repo, some layers are also trained into the UNet. In that case, we would only be loading part of the weights.

Anyway, thanks for your finding. I'm also interested in converting on the fly; let's keep each other updated. @plcpinho
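
One way to check that uncertainty directly is to compare the released checkpoint's keys against a base SD 1.5 checkpoint. A hedged sketch (file names are placeholders, and the "state_dict" unwrapping is an assumption about how the files are packaged):

import torch

base = torch.load("v1-5-pruned.ckpt", map_location="cpu")
base = base.get("state_dict", base)
control = torch.load("control_sd15_canny.pth", map_location="cpu")
control = control.get("state_dict", control)

# Does the ControlNet checkpoint override any UNet weights, or does it only
# add its own "control_model." branch on top of the base model?
unet_keys = {k for k in base if k.startswith("model.diffusion_model.")}
overridden = unet_keys & set(control)
branch = [k for k in control if k.startswith("control_model.")]
print(f"{len(overridden)} of {len(unet_keys)} UNet keys also appear in the control checkpoint")
print(f"{len(branch)} keys belong to the control branch itself")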

paulo-coronado commented 1 year ago

@haofanwang I just found something very interesting! This guide was written by Illyasviel two weeks ago; last week this thread was created, in which Illyasviel, Mikubill, and Kohya-ss discussed Transfer Control implementations. At one point in the conversation, Kohya-ss made this pull request for a Transfer Control method (which is, by the way, the method currently used in Mikubill's repo). In Kohya-ss' words: _"extract_controlnet_diff.py makes the difference and save the state_dict with key difference as a marker, and cldm.py handles it on the fly."_

I believe the answer to our questions is in that thread and in the PR mentioned above!
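
Based only on that quoted description (the real code lives in the PR linked above), the mechanism presumably looks something like the sketch below; the function names and the exact marker key are illustrative assumptions, not Kohya-ss' actual code:

import torch

def extract_difference(base_path, full_path, out_path):
    """Save control = full - base, plus a marker key so the loader knows
    this file stores a diff rather than absolute weights (illustrative)."""
    base = torch.load(base_path, map_location="cpu")
    full = torch.load(full_path, map_location="cpu")
    base = base.get("state_dict", base)
    full = full.get("state_dict", full)

    diff = {"difference": torch.tensor(1.0)}  # marker key, per the PR description
    for k, v in full.items():
        if k in base and base[k].shape == v.shape:
            diff[k] = v - base[k]  # store only the delta from the base model
        else:
            diff[k] = v            # control-only layers are stored as-is
    torch.save(diff, out_path)

def apply_difference(base_path, diff_path):
    """Reconstruct merged weights on the fly: base + diff where a diff exists."""
    base = torch.load(base_path, map_location="cpu")
    base = base.get("state_dict", base)
    diff = torch.load(diff_path, map_location="cpu")

    merged = dict(base)
    for k, v in diff.items():
        if k == "difference":
            continue  # skip the marker
        merged[k] = base[k] + v if k in base else v
    return merged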

ghpkishore commented 1 year ago

@plcpinho For testing, can we use the Mikubill repo to get the canny model and then use that? Should that work? I didn't fully understand how to solve this, except that I too ran into an error when trying to create canny-edge-based inpainting output. I would appreciate it if you can give specific steps for using the new control models in our inpainting pipeline.

haofanwang commented 1 year ago

@plcpinho I'm glad to know this! I can work on it based on your info. If you are interested, a PR is very welcome.

haofanwang commented 1 year ago

Will dive into https://github.com/Mikubill/sd-webui-controlnet/pull/80 and https://github.com/Mikubill/sd-webui-controlnet/issues/73. If anyone is willing to help, please let me know.

haofanwang commented 1 year ago

I don't find any difference between merging beforehand and merging on the fly; they actually do the same thing. It is still not clear what leads to such a difference, so I will spend more time investigating this.

For now, I have no plan to add merging on the fly to this repo, as we care more about usage in diffusers. Our goal is to load the new model via the diffusers function from_pretrained().
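
As context, diffusers has since added native ControlNet support, so that goal now looks roughly like this (assuming a recent diffusers version; the two model IDs are public Hub repos and stand in for any base model/ControlNet pair):

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Any SD 1.5 base model works here, e.g. a converted AnythingV3 checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")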

paulo-coronado commented 1 year ago

Thank you for your reply, @haofanwang!

Technically, merging beforehand and merging on the fly do the same thing. However, I am not sure the models generate the same/similar images; I am going to do some more tests today! By the way, you said in this thread that the model is actually being merged in both methods. Do you know how to save the merged model in sd-webui-controlnet? If we have both models saved, it is easy to compare merging beforehand vs. on the fly. I tried adding the following line to cldm.py:

# Does not work... saves a ~700KB file
torch.save(state_dict, './merged_model.pth')

About cldm.py, I am also not sure this code actually runs, because if you look at the "if" statement (line 66), there is never a "k" item starting with "control_model.". So the operations you mentioned above never get called during my tests. You can see that by adding some print() calls and running ControlNet via the WebUI... 🤔
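
A quick way to see what that ~700KB file actually contains, and whether the "control_model." branch could ever fire, is to inspect the saved object. A sketch, reusing the path from the snippet above:

import torch

obj = torch.load("./merged_model.pth", map_location="cpu")
print(type(obj))

if isinstance(obj, dict):
    print(f"{len(obj)} entries")
    # A ~700KB file can only hold a handful of small tensors, so listing
    # keys and shapes shows what was actually captured at that point in cldm.py.
    for k, v in list(obj.items())[:20]:
        shape = tuple(v.shape) if torch.is_tensor(v) else type(v).__name__
        print(k, shape)
    hits = [k for k in obj if isinstance(k, str) and k.startswith("control_model.")]
    print(f"{len(hits)} keys start with 'control_model.'")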

haofanwang commented 1 year ago

It's kind of weird; let's follow up.

ghpkishore commented 1 year ago

@paulo-coronado Any updates on how the canny model is differing? Also, I would appreciate it if you can share whether it is only the canny model that differs from the Mikubill WebUI, or whether others differ as well.

paulo-coronado commented 1 year ago

Hey, @ghpkishore! I don't know why it differs, but I can tell you that merging using this guide works! Although it generates different results, the images are still great! I might have done something wrong to generate the example above...

ghpkishore commented 1 year ago

Thanks, @paulo-coronado. Do you know how I can convert a safetensors file into a normal PyTorch bin and use that? I am running into an error when I try to use the already-made ControlNet canny edge model, which says that the weights are not initialized.

I tried for a couple of hours and gave up.
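
For the safetensors-to-bin part, the conversion itself is short (file names are placeholders; requires the safetensors package):

import torch
from safetensors.torch import load_file

# Read the safetensors checkpoint into an ordinary state dict, then
# re-save it in PyTorch's native serialization format.
state_dict = load_file("control_canny.safetensors")
torch.save(state_dict, "control_canny.pth")

If the error persists after conversion, the "weights are not initialized" warning usually means the checkpoint's key names do not match what the model class expects, not that the file format is wrong.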

paulo-coronado commented 1 year ago

Do you have Discord, @ghpkishore? Send me your profile so we can chat there :)

ghpkishore commented 1 year ago

@paulo-coronado It is the same as my GitHub username: ghpkishore.

paulo-coronado commented 1 year ago

@ghpkishore The Discord username is something like @ghpkishore#0000

ghpkishore commented 1 year ago

@paulo-coronado ghpkishore#4438