Open DN6 opened 8 months ago
Hey @DN6, can I please work on this?
@ihkap11 hey! sure!
Hi @yiyixuxu, is anyone working on this? Can I also contribute? Please let me know how I may proceed.
Hey @Bhavay-2001 I'm currently working on this. Will post the PR here soon. I can tag you on the PR if there is something I need help with :)
ok great. Pls let me know. Thanks
@ihkap11 how's it going 😁 I'd loooooove to have this
Hey @landmann I'll post the PR this weekend and tag you if you want to contribute to it :) apologies for the delay, it's my first new model implementation PR
You a real champ 🙌 Happy Friday, my gal/dude!
Initial Update:
Using `sd_xl_base_1.0_0.9vae.safetensors` as the base pre-trained generative prior. Probably, a dummy version of the code would look like this:
```python
import torch.nn as nn
from diffusers import StableDiffusionXLPipeline

# GLVControl and LightGLVUNet are the modules from the SUPIR codebase;
# they stand in here as placeholders for the components still to be ported.

class SUPIRModel(nn.Module):
    def __init__(self, sdxl_model_path):
        super().__init__()
        self.sdxl_pipeline = StableDiffusionXLPipeline.from_pretrained(sdxl_model_path)
        self.glv_control = GLVControl(in_channels=3, out_channels=64, context_dim=128)
        self.light_glv_unet = LightGLVUNet(in_channels=3, out_channels=3)

    def forward(self, lq_image, context, num_inference_steps=50):
        # Generate the control signal from the low-quality input using GLVControl
        control_signal = self.glv_control(lq_image, context)

        # Use the SDXL pipeline for guided diffusion (the image/control_image
        # kwargs assume a ControlNet-style img2img pipeline, not the base one)
        restored_image = self.sdxl_pipeline(
            prompt="",
            image=lq_image,
            control_image=control_signal,
            num_inference_steps=num_inference_steps,
            generator=None,
        ).images[0]

        # Refine the restored image using LightGLVUNet
        refined_image = self.light_glv_unet(restored_image, control_signal)
        return refined_image
```
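A hypothetical call of the sketch above (`lq_image` and `context` are assumed tensors; the checkpoint path is just illustrative):

```python
model = SUPIRModel("stabilityai/stable-diffusion-xl-base-1.0")
refined = model(lq_image, context, num_inference_steps=50)
```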
To cover later:
I'm currently in the process of breaking down SUPIR code into diffusers artefacts and figuring out optimization techniques to make it compatible with low-resource GPUs.
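For the low-resource GPU part, I expect to lean on the standard diffusers memory optimizations; a minimal sketch on the stock SDXL pipeline (nothing SUPIR-specific yet):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# Keep submodules on the CPU and move them to the GPU only while they run.
pipe.enable_model_cpu_offload()
# Decode large latents in tiles so the VAE fits in limited VRAM.
pipe.enable_vae_tiling()
```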
Feel free to correct me or start a discussion on this thread. Let me know if you wish to collaborate, I'm happy to set up discussions and work on it together :).
Looks fantastic! How far along did you get, @ihkap11 ?
Btw, a good reference for the input parameters is here: https://replicate.com/cjwbw/supir?prediction=32glqstbvpjjppxmvcge5gsncu
@ihkap11 how are you doing? Which part are you stuck on?
Hey @landmann, I'm finding it hard to map a few components from the paper's network architecture details to the codebase they provided.
Currently, I'm stuck on understanding, at the code level, how they trim the ViT blocks when using the modified ControlNet adapter with the ZeroSFT connector. They seem to use GLVControl, but I can't spot any ViT component or network trimming in the codebase.
I sent an email to one of the authors last week. If I don't hear back, I plan to follow up with more specific questions this week. (Also check this issue here.) I'm playing with the code in my repo atm here.
If interested, would you take a second look at their code and share your thoughts?
@ihkap11 are you following the paper and trying to code it from scratch? Why not just make a wrapper around what they have? It's PyTorch after all, no? I haven't read the paper much in depth, but I was able to run SUPIR locally!
@austinfujimori you should take a look if you're free 🙂
Hi, I tried not loading the ViT checkpoint, and it has no influence!
the "ZeroSFT" is to replace the concat[hidden_states, res_sample] in AttnUpBlock and CrossAttnUpBlock, so we can't use diffusers's sdxl pipeline to implement it.
Because of the differences between the sgm and diffusers architectures, it's difficult.
@CuddleSabe would you like to connect on discord to discuss SUPIR and possibly collaborate in figuring out how to support it in diffusers? You can find me by the username tortillachips11.
Well, I can't use Discord. I wrote a training script and the model file to train an SD 1.5 SUPIR; however, I can't publish it because of company confidentiality rules.
In sgm, it looks like this: <img width="398" alt="Screenshot 2024-04-10 15 34 02" src="https://github.com/huggingface/diffusers/assets/61224076/fb622363-7940-4f5c-91ed-29cf9072f21a">
the "hs" is the res_sample from the down_blocks. https://github.com/Stability-AI/generative-models/blob/fbdc58cab9f4ee2be7a5e1f2e2787ecd9311942f/sgm/modules/diffusionmodules/openaimodel.py#L849
but in diffusers, it like this the "res_sample" in diffusers equal the "hs" in sgm
so you need to rewrite the AttnUpBlock2D 、CrossAttnUpBlock2D and UNetMidBlock2DCrossAttn in diffusers
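Heavily simplified pseudocode of that difference, based on my reading of both codebases (not exact signatures):

```python
# sgm UNetModel.forward (simplified): the skip fusion sits in the outer loop,
# so a ZeroSFT connector could be swapped in at a single point.
for module in self.output_blocks:
    h = torch.cat([h, hs.pop()], dim=1)  # <- would become zero_sft(h, hs.pop())
    h = module(h, emb, context)

# diffusers UNet2DConditionModel (simplified): the residuals are handed to each
# up block, and the torch.cat happens inside the block's own forward(), which
# is why AttnUpBlock2D / CrossAttnUpBlock2D themselves need rewriting.
for upsample_block in self.up_blocks:
    res_samples = down_block_res_samples[-len(upsample_block.resnets):]
    down_block_res_samples = down_block_res_samples[:-len(upsample_block.resnets)]
    sample = upsample_block(sample, res_samples, emb)  # concat is internal
```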
Any progress👀?
Hi @ihkap11, any news?
Hey! I tried but couldn't get this working. Feel free to take over the implementation for this Issue.
But do you have a branch where we can continue where you left off? I might try this after I finish a project I'm involved with.
Cc: @asomoza
Just in case, this is not an easy task, everything is in the sgm format so there's a lot of conversion involved. It requires a deep understanding of the original code and diffusers.
Probably the best choice here is to start as a research project and convert all the sgm code to diffusers, and then when stuck, get help from the maintainers and the community.
According to the paper and the ComfyUI implementation, the following will be needed (a rough sketch of the wiring follows below):
1. SUPIR MODEL_LOADER -> SUPIR_MODEL, SUPIR_vae
2. SUPIR_FIRSTSTAGE denoiser: takes a low-quality image in and puts a blurred or smoothed image out, along with its latent
3. SUPIR ControlNet: takes latents and timesteps in, and generates ControlNet residual downsamples and a midsample out
4. A hacked UNet that modifies the connector of each down and up block to use ZeroSFT
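A rough sketch of how those four pieces might connect (every name below is hypothetical, loosely following the ComfyUI node graph):

```python
# 1. Loader: returns the SUPIR UNet/ControlNet weights and the SUPIR VAE.
supir_model, supir_vae = load_supir_checkpoint("SUPIR-v0Q.ckpt")

# 2. First-stage denoiser: low-quality image in, smoothed image + latent out.
smoothed, latent = supir_first_stage(supir_vae, lq_image)

# 3. SUPIR ControlNet: latents + timestep in, residuals out.
down_residuals, mid_residual = supir_controlnet(latent, timestep, context)

# 4. Hacked UNet: fuses the residuals through ZeroSFT connectors
#    at every down/up block instead of plain concatenation.
noise_pred = hacked_unet(latent, timestep, context,
                         down_residuals=down_residuals,
                         mid_residual=mid_residual)
```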
Model/Pipeline/Scheduler description
SUPIR is a super-resolution model that looks like it produces excellent results.
Github Repo: https://github.com/Fanghua-Yu/SUPIR
The model is quite memory intensive, so the optimisation features available in diffusers might be quite helpful in making this accessible to lower resource GPUs.
Open source status
Provide useful links for the implementation: No response