dvIdol opened this issue 2 years ago
@patil-suraj IIRC you had plans for an SR example too? I might not have bandwidth in the next few weeks, but can work on SR after, if it's not high on your list.
Yes, that's on my todo list. But if anyone is interested feel free to open a PR, happy to help :)
I'm interested in this too, and it's becoming relevant for the on-going fast.ai course. I might have some time to start working on this in a few days and/or help @anton-l and @patil-suraj when they do :)
That's awesome Pedro! I'm looking at implementing SR3 https://iterative-refinement.github.io/ for this task.
My thought exactly :)
@patil-suraj @pcuenca I can spend time implementing SR example this weekend (PyTorch & Flax).
Reopening this issue as it's related to training a super-resolution model.
Also, thanks to @duongna21, a super-resolution model is now available in diffusers:
from diffusers import LDMSuperResolutionPipeline
from PIL import Image

pipe = LDMSuperResolutionPipeline.from_pretrained('CompVis/ldm-super-resolution-4x-openimages')
pipe.to('cuda')

# 4x upscaling; eta=1 makes the sampling fully stochastic
low_res = Image.open('low_resolution.jpg').convert('RGB')
result = pipe(low_res, num_inference_steps=100, eta=1)
result.images[0].save('high_resolution.png')
@patil-suraj Hi, how is it going? There's an unofficial repo with a lot of attention: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement
Haven't really started anything yet, thanks for sharing the repo.
Hi! I am interested in using SR3 for the work on my master's thesis, and would also love to contribute to the implementation!
I also wanted to share the repo for OpenAI's guided diffusion: Guided Diffusion. SR3 uses the improved version of DDPM proposed by OpenAI in the linked repo. I think you might also find this useful for implementing SR3, or even its follow-up model, Palette. Here is a link to the paper that introduced Palette: Image-to-Image Diffusion Models, and the authors' website.
@basab-gupta, if you are interested, feel free to start working on it; happy to help with the PR :)
We can add this example under the examples/research_projects directory.
@patil-suraj Thank you! Do you mean adding a link to the guided diffusion repo under examples/research_projects?
I'll try to get started with the implementation. Also, feel free to HMU in case anyone else is interested in working on this together :)
I meant adding a training script leveraging diffusers.
I will join you on this script @basab-gupta !
Hi @patil-suraj! Marc (@marc-gav) and I have a small update for you. We managed to set up a training script. However, the loss plateaus after a point when we run the training. We were thinking of adding a few modifications from Improved Denoising Diffusion Probabilistic Models and were wondering what you thought of them?
Also open to any other suggestions that could help us potentially fix this issue.
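For anyone following along, one of the modifications from Improved DDPM that is often tried when the loss stalls is the cosine noise schedule. A minimal sketch of it in plain PyTorch (diffusers ships essentially the same thing as `DDPMScheduler(beta_schedule="squaredcos_cap_v2")`):

```python
import math
import torch

def cosine_beta_schedule(num_steps: int, s: float = 0.008) -> torch.Tensor:
    """Cosine noise schedule from Improved DDPM (Nichol & Dhariwal, 2021).

    Defines alpha_bar(t) via a squared cosine, then derives per-step betas
    from the ratio of consecutive alpha_bar values.
    """
    steps = torch.arange(num_steps + 1, dtype=torch.float64)
    f = torch.cos(((steps / num_steps) + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bar = f / f[0]                      # normalized so alpha_bar(0) = 1
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return betas.clamp(max=0.999)             # cap betas as in the paper
```

Compared to the linear schedule, this destroys information more gradually at the start and end of the forward process, which the Improved DDPM authors found helps sample quality at lower resolutions.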
Hi @patil-suraj. We have an update for you. We managed to fix the loss problem from our previous post and now have a working implementation of the SR3 model using HF diffusers. Here are some preliminary results from our experiments: Preliminary Results of 8x super resolution
The results, however, still do not look quite as good as we would like. We are currently tuning the hyperparameters to improve them and will hopefully get back to you soon with more positive updates :)
Very cool!
@patrickvonplaten Thanks :)
Hi @basab-gupta @marc-gav! Thanks for your contribution. I have a question about SR3: as shown in Fig. 12 of the paper, SR3 (like other Google diffusion models) uses noise-level sampling during training, which enables the use of different noise schedules at inference time. But I always get noisy output when I use fewer inference timesteps than training timesteps. Did you run any experiments with different inference timestep counts?
Hi @ElliotQi! We are still working on the inference script to make sure that it allows us to vary the noise schedule and number of steps separately from the ones used in training. Unfortunately, because these models take a while to train, our progress has been a bit slow.
Regarding your question, have you tried adjusting the values of $\beta_0$, $\beta_N$, and $N$ during inference? To my understanding, the authors of SR3 fix $N$ (the number of reverse steps) at 100 and then run a hyperparameter sweep to find the best combination of beta values, using FID scores on their validation dataset to pick the winning configuration. We will let you know once we've made some progress on our inference script.
The authors of SR3 use the noise conditioning described in the WaveGrad paper, another diffusion model published by the Google Brain team, used there for speech synthesis. I came across this useful repository that has a script to tune the WaveGrad model to find the best inference schedule; maybe you could take a look at that? Alternatively, I believe you could also use something like Optuna or Weights & Biases sweeps to do the hyperparameter tuning for you.
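For reference, WaveGrad-style continuous noise-level sampling can be sketched in plain PyTorch (this is a sketch of the idea, not the authors' code): during training, instead of an integer timestep, the model is conditioned on a continuous sqrt(alpha_bar) value drawn uniformly between two adjacent discrete levels.

```python
import torch

def sample_noise_level(betas: torch.Tensor) -> torch.Tensor:
    """WaveGrad-style continuous noise-level sampling (sketch).

    Picks a random segment between adjacent discrete sqrt(alpha_bar)
    values and samples a continuous level uniformly inside it.
    """
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    sqrt_ab = alpha_bar.sqrt()                    # decreasing in t
    bounds = torch.cat([torch.ones(1), sqrt_ab])  # bounds[0] = 1.0
    t = torch.randint(0, len(betas), (1,))        # choose a segment
    lo, hi = bounds[t + 1], bounds[t]             # lo < hi
    return lo + torch.rand(1) * (hi - lo)         # continuous level
```

Because the model never sees integer timesteps, the inference schedule (including its length) can be chosen independently of the training one, which is exactly what the discussion above is about.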
@basab-gupta Thanks! :) I tested several beta values, but none of them gave good results. I'm still trying different hyperparameters for better performance; thanks for sharing the WaveGrad repo. In fact, I noticed that a deblurring paper used this continuous noise schedule to achieve a perception-distortion trade-off. It's impressive, and I'm tuning my model to reproduce it. Thanks for your advice~
Hello.
I am very interested in the unconditional image generation pipelines, like the example here: https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation
I have trained a 128x128 network and it gives very good results for what I need. However, the resolution is very low.
The main diffusers README mentions a super-resolution diffusion model that runs after the low-resolution model. How do I build that model? There are no examples, and everything seems to be shifting toward Stable Diffusion. Is there a guide for training a low-to-high-resolution diffusion model?
Thank you for making such a library, it is very good.