lllyasviel / ControlNet

Let us control diffusion models!
Apache License 2.0

A model that reconstructs the same image. Is it a dumb idea? #131

Open · AbyszOne opened this issue 1 year ago

AbyszOne commented 1 year ago

First of all, I apologize if I'm asking nonsense. My question arises from the need to edit the same image, not a transformed one. Projects like the img2img alternative test (an automatic1111 script) have pointed in this direction for a while. So: couldn't you just make a model based on rebuilding the same image for inference? Maybe there's a simpler way, but over the last few days I've seen the possibility of using that to edit videos, with an img2img input acting as modifier and compositor. However it can be achieved, it would be an extremely powerful feature.
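In ControlNet terms, the proposal amounts to training the control branch on pairs where the conditioning image is the target image itself, so the network learns an identity mapping that editing can then perturb. A minimal toy sketch of that training setup, using stand-in PyTorch modules (`ToyUNet`, `ToyControlBranch`, and the crude noising step are all illustrative assumptions, not this repo's actual code):

```python
import torch
import torch.nn as nn

class ToyUNet(nn.Module):
    """Stand-in for the frozen diffusion UNet (the noise predictor)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, x, control):
        # ControlNet-style: the control branch output enters as a residual.
        return self.net(x + control)

class ToyControlBranch(nn.Module):
    """Stand-in for the trainable ControlNet copy."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, 3, 3, padding=1))
        nn.init.zeros_(self.net[-1].weight)  # zero conv: no effect at step 0
        nn.init.zeros_(self.net[-1].bias)
    def forward(self, cond):
        return self.net(cond)

unet, control = ToyUNet(), ToyControlBranch()
opt = torch.optim.Adam(control.parameters(), lr=1e-4)  # only the control branch trains

for step in range(10):                    # toy loop over random "images"
    image = torch.rand(4, 3, 64, 64)      # target image
    cond = image                          # the key point: conditioning == target
    noise = torch.randn_like(image)
    noisy = image + noise                 # crude stand-in for forward diffusion
    pred = unet(noisy, control(cond))     # predict the noise given the condition
    loss = nn.functional.mse_loss(pred, noise)
    opt.zero_grad(); loss.backward(); opt.step()
```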

lllyasviel commented 1 year ago

instruct pix2pix?
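For reference, instruct-pix2pix is available as a pipeline in diffusers; a minimal usage sketch, where the checkpoint name and parameter values are assumptions for illustration rather than anything prescribed in this thread:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB")
edited = pipe(
    "turn the scene into golden-hour lighting",  # edit instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher = output stays closer to the input image
).images[0]
edited.save("edited.png")
```

The `image_guidance_scale` term is what pulls the output back toward the input image, which relates to the "leaves the image intact" behaviour discussed below.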

AbyszOne commented 1 year ago

> instruct pix2pix?

I understand that p2p is trained on editing examples and effectively leaves the image intact without a prompt, although I'm not clear on the exact process. Could it be used as a model in ControlNet to achieve this type of effect? https://www.reddit.com/r/StableDiffusion/comments/1175id9/when_i_say_mindblowing_i_mean_it_new_experiments/ There, an image in img2img is used to influence the light and color composition. Is something like that possible? It would be really interesting.

AbyszOne commented 1 year ago

> instruct pix2pix?

I think I understand the problem. In the case mentioned, pix2pix is not the right tool, because it doesn't rebuild the image, it just edits it. Perhaps pix2pixZero could fulfill the blend function we are looking for. Unfortunately, it is not available yet.

tekakutli commented 1 year ago

@AbyszOne I too think it would be valuable; it may be easier to finetune and add instructions to it, who knows. Its dataset is available.

I tried to finetune instruct-pix2pix myself, but I failed, and I tried to train a ControlNet but my HDD size won't allow it.

I made a quick summary of what I learnt and what one would need to do, if anyone wants to give it a shot: https://github.com/tekakutli/controlnet_instruct


I just learnt that finetuning instruct-pix2pix requires at least 30 GB of VRAM per GPU.

AbyszOne commented 1 year ago

> @AbyszOne I too think it would be valuable; it may be easier to finetune and add instructions to it, who knows. Its dataset is available.
>
> I tried to finetune instruct-pix2pix myself, but I failed, and I tried to train a ControlNet but my HDD size won't allow it.
>
> I made a quick summary of what one would need to do if anyone wants to give it a shot: https://github.com/tekakutli/controlnet_instruct
>
> I just learnt that finetuning instruct-pix2pix requires at least 30 GB of VRAM per GPU.

🤕. Thanks for the effort. We are looking for an alternative with Mikubill. Hope we can find a way soon.

geroldmeisinger commented 11 months ago

see here https://github.com/lllyasviel/ControlNet/discussions/561 ;)