lllyasviel / Fooocus


For inpaint_v26.fooocus.patch #1967

Open · peki12345 opened 6 months ago

peki12345 commented 6 months ago

Hi, I'm interested in learning about the training process for the "inpaint_v26.fooocus.patch" used in inpainting and outpainting tasks. Is there any training code or paper related to this?

andriyrizhiy commented 5 months ago

+1

zhangzhengyi12 commented 5 months ago

It's a LoRA model for XL, but I don't know the exact training method either.

zhangzhengyi12 commented 5 months ago

Fooocus's inpainting blends into the surrounding image better than anything else I've used, and I'm looking forward to more information from the author about the training!

andriyrizhiy commented 5 months ago

I am not sure that it is a LoRA. Usually a LoRA is much smaller than 1.32 GB.

peki12345 commented 5 months ago

> I am not sure that it is a LoRA. Usually a LoRA is much smaller than 1.32 GB.

Actually, it is a LoRA: it fits the definition of a LoRA, adjusting the network weights without altering the network structure. Before merging, the inpaint result is green; after merging, the correct result is produced. It's quite impressive that this one LoRA can turn any t2i XL model into an inpaint XL model. I'm curious how this LoRA was developed.
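
For illustration, a minimal sketch of what "adjusting the weights without altering the structure" means. It assumes the patch stores classic low-rank LoRA factors per layer, which is an assumption, not something confirmed about inpaint_v26.fooocus.patch:

```python
import torch

def merge_lora_layer(weight, lora_down, lora_up, alpha=1.0):
    # Classic LoRA merge for one layer: W' = W + alpha * (up @ down).
    # The shape of W is unchanged, so the network structure stays intact;
    # only the values move. Whether inpaint_v26.fooocus.patch really
    # stores low-rank factors like this is an assumption.
    delta = alpha * (lora_up @ lora_down)  # (out, r) @ (r, in) -> (out, in)
    return weight + delta

# Hypothetical usage on one projection weight of the UNet:
w = torch.randn(640, 640)      # base weight
down = torch.randn(16, 640)    # illustrative rank-16 factors
up = torch.randn(640, 16)
w_patched = merge_lora_layer(w, down, up, alpha=0.8)
assert w_patched.shape == w.shape  # structure unchanged, values adjusted
```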

zhangzhengyi12 commented 5 months ago

> > I am not sure that it is a LoRA. Usually a LoRA is much smaller than 1.32 GB.
>
> Actually, it is a LoRA: it fits the definition of a LoRA, adjusting the network weights without altering the network structure. Before merging, the inpaint result is green; after merging, the correct result is produced. It's quite impressive that this one LoRA can turn any t2i XL model into an inpaint XL model. I'm curious how this LoRA was developed.

I'm guessing that the conversion of an arbitrary model into an inpaint model stems in part from the fooocus_inpaint_head model, a small convolutional network that compresses 9 channels down to 4 (a standard model's UNet takes only 4 input channels, whereas an inpaint model takes 9).
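
Something like this minimal sketch, purely hypothetical and based only on the 9-to-4 description above; the real fooocus_inpaint_head may use different layers or kernel sizes:

```python
import torch
import torch.nn as nn

class InpaintHead(nn.Module):
    # Hypothetical 9 -> 4 channel compressor matching the description above.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=9, out_channels=4,
                              kernel_size=3, padding=1)

    def forward(self, noisy_latent, mask, masked_image_latent):
        # 4 (noisy latent) + 1 (mask) + 4 (masked-image latent) = 9 channels,
        # the usual conditioning layout of SD inpaint models.
        x = torch.cat([noisy_latent, mask, masked_image_latent], dim=1)
        return self.conv(x)  # back to the 4 channels a t2i UNet expects
```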

In terms of its weights, though, the inpaint model seems quite different from a normal LoRA.

zhangzhengyi12 commented 5 months ago

> > I am not sure that it is a LoRA. Usually a LoRA is much smaller than 1.32 GB.
>
> Actually, it is a LoRA: it fits the definition of a LoRA, adjusting the network weights without altering the network structure. Before merging, the inpaint result is green; after merging, the correct result is produced. It's quite impressive that this one LoRA can turn any t2i XL model into an inpaint XL model. I'm curious how this LoRA was developed.

I feel that the inpaint_head and the inpaint model work as a pair: the head is responsible for channel compression, and the latter for augmenting the inpainting ability of arbitrary models. Maybe the training method is based on copying: duplicate the original UNet, freeze the original, and train only the copy so that it fully learns the inpainting task, then merge the weights at inference time. A sketch of that guess is below.
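
Purely illustrative, not the author's confirmed method: freeze the base UNet, train a deep copy on the inpainting objective, and store only the weight difference as the patch:

```python
import copy
import torch

def make_trainable_copy(base_unet: torch.nn.Module) -> torch.nn.Module:
    # Freeze the original UNet; only the duplicated copy receives gradients.
    for p in base_unet.parameters():
        p.requires_grad_(False)
    trainable = copy.deepcopy(base_unet)
    for p in trainable.parameters():
        p.requires_grad_(True)
    return trainable

def extract_patch(base_unet, trained_unet):
    # After training on the inpainting objective, the "patch" is just the
    # per-tensor weight difference; adding it to any compatible t2i XL
    # checkpoint would reproduce the trained behaviour at inference time.
    base = base_unet.state_dict()
    return {k: v - base[k] for k, v in trained_unet.state_dict().items()}
```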

Hetaneko commented 3 months ago

Is there still no info about this?