huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and Flax.
https://huggingface.co/docs/diffusers
Apache License 2.0
25.37k stars 5.25k forks

Super Resolution Diffusion Model #463

Open dvIdol opened 2 years ago

dvIdol commented 2 years ago

Hello.

I am very interested in the unconditional image generation pipelines. Like in this example here: https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation

I have trained a network at 128x128 and it gives very good results for what I need. However, the resolution is very low.

The main diffusers README mentions a super-resolution diffusion model that runs after the low-resolution model. How do I build this model? There are no examples, and everything seems to be shifting toward Stable Diffusion. Is there a guide for training a low-to-high-resolution diffusion model?

Thank you for making such a library; it is very good.

patrickvonplaten commented 2 years ago

Related to https://github.com/huggingface/diffusers/issues/146

anton-l commented 1 year ago

@patil-suraj IIRC you had plans for an SR example too? I might not have bandwidth in the next few weeks, but can work on SR after, if it's not high on your list.

patil-suraj commented 1 year ago

Yes, that's on my todo list. But if anyone is interested feel free to open a PR, happy to help :)

pcuenca commented 1 year ago

I'm interested in this too, and it's becoming relevant for the ongoing fast.ai course. I might have some time to start working on this in a few days, and/or help @anton-l and @patil-suraj when they do :)

patil-suraj commented 1 year ago

That's awesome Pedro! I'm looking at implementing SR3 https://iterative-refinement.github.io/ for this task.
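For context, the core idea in SR3 is simple: upsample the low-resolution image to the target size and concatenate it channel-wise with the noisy high-resolution image before every UNet call, so the denoiser is conditioned on the LR input. A minimal numpy sketch of that input construction (the function name is illustrative, and nearest-neighbor upsampling stands in for the bicubic interpolation used in the paper):

```python
import numpy as np

def sr3_unet_input(noisy_hr: np.ndarray, lr: np.ndarray, scale: int) -> np.ndarray:
    """Build the SR3 UNet input: the noisy high-res image concatenated
    channel-wise with the low-res image upsampled to the target size.
    `noisy_hr` is (C, H, W); `lr` is (C, H/scale, W/scale)."""
    # Nearest-neighbor upsampling; SR3 uses bicubic interpolation here.
    upsampled = lr.repeat(scale, axis=1).repeat(scale, axis=2)
    assert upsampled.shape == noisy_hr.shape
    # The UNet therefore takes 2*C input channels and predicts C noise channels.
    return np.concatenate([noisy_hr, upsampled], axis=0)
```

Everything else is a standard DDPM; only the UNet's input channel count changes.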

pcuenca commented 1 year ago

My thought exactly :)

duongna21 commented 1 year ago

@patil-suraj @pcuenca I can spend time implementing an SR example this weekend (PyTorch & Flax).

patil-suraj commented 1 year ago

Reopening this issue as it's related to training a super-res model.

patil-suraj commented 1 year ago

Also, thanks to @duongna21, a super-resolution model is now available in diffusers:

```python
from diffusers import LDMSuperResolutionPipeline
from PIL import Image

pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")
pipe.to("cuda")

img = Image.open("low_resolution.jpg")
super_img = pipe(img, num_inference_steps=100, eta=1)
super_img["images"][0]
```

ElliotQi commented 1 year ago

@patil-suraj Hi, How is it going? There's an unofficial repo with much attention: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement

> That's awesome Pedro! I'm looking at implementing SR3 https://iterative-refinement.github.io/ for this task.

patil-suraj commented 1 year ago

Haven't really started anything yet, thanks for sharing the repo.

basab-gupta commented 1 year ago

Hi! I am interested in using SR3 for the work on my master's thesis, and would also love to contribute to the implementation!

basab-gupta commented 1 year ago

I also wanted to share the repo to OpenAI's guided diffusion: Guided Diffusion. SR3 uses the improved version of DDPM proposed by OpenAI in the linked repo. I think you might also find it useful for implementing SR3, or even its follow-up model, Palette. Here is a link to the paper that introduced Palette: Image-to-Image Diffusion Models, and the authors' website.

patil-suraj commented 1 year ago

@basab-gupta, if you are interested, feel free to start working on it; happy to help with the PR :)

We can add this example under examples/research_projects directory.

basab-gupta commented 1 year ago

@patil-suraj Thank you! Do you mean add a link to the guided diffusion repo to examples/research_projects?

I'll try to get started with the implementation. Also, feel free to HMU in case anyone else is interested on working on this together :)

patil-suraj commented 1 year ago

I meant to add a training script leveraging diffusers.
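Concretely, the training step only needs the forward-noising rule that diffusers' `DDPMScheduler.add_noise` implements, with the UNet trained by MSE against the sampled noise. A library-free numpy sketch of that target construction (the linear schedule values below are the common defaults, assumed here):

```python
import numpy as np

# Linear beta schedule (assumed defaults: 1e-4 to 0.02 over 1000 steps).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

def add_noise(x0: np.ndarray, noise: np.ndarray, t: int) -> np.ndarray:
    """Forward diffusion: x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*noise,
    mirroring DDPMScheduler.add_noise."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 128, 128))  # high-res target image
noise = rng.standard_normal(x0.shape)    # the UNet's MSE target
t = int(rng.integers(0, T))              # random training timestep
x_t = add_noise(x0, noise, t)            # UNet input (plus the LR condition)
```

The super-resolution twist is only that the UNet also receives the upsampled low-res image as extra input channels.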

marc-gav commented 1 year ago

I will join you on this script @basab-gupta !

basab-gupta commented 1 year ago

Hi! @patil-suraj, Marc (@marc-gav) and I have a small update for you. We managed to set up a training script. However, the loss plateaus after a point when we run the training. We were thinking of adding a few modifications from Improved Denoising Diffusion Probabilistic Models and were wondering what you thought of them:

  1. Add a cosine schedule to the DDPM scheduler.
  2. Let the model learn the variance, potentially using the hybrid loss.
  3. Adopt the architecture improvements from OpenAI's guided diffusion repo.

Also open to any other suggestions that could help us potentially fix this issue.
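On point 1, the cosine schedule from Improved DDPM is small enough to write out, and diffusers already ships it as `beta_schedule="squaredcos_cap_v2"` on `DDPMScheduler`. A stdlib-only sketch for reference (s = 0.008 and the 0.999 cap follow the paper):

```python
import math

def cosine_betas(T: int, s: float = 0.008, max_beta: float = 0.999) -> list:
    """Cosine noise schedule from Improved DDPM (Nichol & Dhariwal).
    diffusers exposes the same schedule as beta_schedule="squaredcos_cap_v2"."""
    def abar(t: float) -> float:
        # Cumulative alpha-bar as a squared cosine in t/T.
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - abar(t+1)/abar(t), capped to keep late steps stable.
    return [min(1 - abar(t + 1) / abar(t), max_beta) for t in range(T)]
```

In practice `DDPMScheduler(beta_schedule="squaredcos_cap_v2")` should be all that's needed to try this in the training script.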

basab-gupta commented 1 year ago

Hi @patil-suraj. We have an update for you. We managed to fix our problem with the loss from our previous post. We now have a working implementation of the SR3 model that uses the HF diffusers. Here are some preliminary results from our experiments. Preliminary Results of 8x super resolution

The results, however, still do not look quite as good. We are currently tuning the hyperparameters to optimize the results and will hopefully get back to you soon with more positive updates :)

patrickvonplaten commented 1 year ago

Very cool!

basab-gupta commented 1 year ago

@patrickvonplaten Danke :)

ElliotQi commented 1 year ago

Hi @basab-gupta @marc-gav! Thanks for your contribution. I have a question about SR3: as shown in Fig. 12 of the paper, SR3 (like other Google diffusion models) uses noise-level sampling during training, which enables the use of different noise schedules at inference time. But I always get noisy output when using fewer inference timesteps than training timesteps. Did you run any experiments with different numbers of inference timesteps?

basab-gupta commented 1 year ago

Hi @ElliotQi! We are still working on the inference script to make sure that it allows us to vary the noise schedule and number of steps separately from the ones used in training. Unfortunately, because these models take a while to train, our progress has been a bit slow.

Regarding your question, have you tried adjusting the values of $\beta_0$, $\beta_N$, and $N$ during inference? To my understanding, the authors of SR3 fix $N$ (the number of reverse steps) at 100 and then do a hyperparameter sweep to find the best combination of beta values, using FID scores on their validation dataset to pick the hyperparameters. We will let you know once we've made some progress on our inference script.

The authors of SR3 use the noise conditioning described in the WaveGrad paper, another diffusion model from the Google Brain team, used for speech synthesis. I came across this useful repository that has a script for tuning the WaveGrad model to find the best inference schedule; maybe you could take a look at that? Alternatively, I believe you could also use something like Optuna or Weights & Biases sweeps to do the hyperparameter tuning for you.
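The sweep described above can stay very simple: grid over (beta_start, beta_end) pairs at a fixed N = 100 reverse steps and keep the schedule with the best validation score. A sketch under those assumptions (the `evaluate` callback, e.g. computing FID on a validation set, is hypothetical; with diffusers the winning betas could then be handed to `DDPMScheduler` via its `trained_betas` argument):

```python
import itertools
import numpy as np

def sweep(beta_starts, beta_ends, evaluate):
    """Grid-search (beta_start, beta_end) for a linear inference schedule.
    `evaluate(betas)` is a user-supplied metric where lower is better."""
    best = None
    for b0, bN in itertools.product(beta_starts, beta_ends):
        if b0 >= bN:
            continue  # schedules must be increasing
        betas = np.linspace(b0, bN, 100)  # N = 100 reverse steps, as in SR3
        score = evaluate(betas)           # e.g. FID on a validation set
        if best is None or score < best[0]:
            best = (score, b0, bN)
    return best  # (best score, beta_start, beta_end)
```

This mirrors what the WaveGrad tuning script does, just without the audio-specific machinery.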

ElliotQi commented 1 year ago

@basab-gupta Danke! :) I tested several values of beta, but none of them gave good results. I'm still trying different hyperparameters for better performance; thanks for sharing the WaveGrad repo. In fact, I noticed that the deblurring paper used this continuous noise schedule to achieve a perception-distortion trade-off, which is so impressive that I'm tuning my model to reproduce it. Thanks for your advice~