invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

Support latest RealESRGAN, with faster and higher quality realesr-general-wdn-x4v3 model #802

Closed — n00mkrad closed this 1 year ago

n00mkrad commented 1 year ago

**Is your feature request related to a problem? Please describe.**
Currently, RealESRGAN uses the "heavy" 64 MB models. They do not retain much detail and tend to smooth images out too much, making them unsuitable for anything realistic.

**Describe the solution you'd like**
Implement the latest RealESRGAN with support for the new ~5 MB models, which should also be faster: https://github.com/xinntao/Real-ESRGAN/releases/tag/v0.2.5.0
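For context, a minimal sketch of what loading the lightweight model with the `realesrgan` Python package looks like, following the upstream Real-ESRGAN README. The weights path and filenames here are illustrative, and exact constructor arguments can vary between package versions:

```python
# Sketch: upscaling with the ~5 MB realesr-general-x4v3 model.
import cv2
from realesrgan import RealESRGANer
from realesrgan.archs.srvgg_arch import SRVGGNetCompact

# The lightweight model uses the compact SRVGG architecture.
model = SRVGGNetCompact(num_in_ch=3, num_out_ch=3, num_feat=64,
                        num_conv=32, upscale=4, act_type='prelu')

upsampler = RealESRGANer(
    scale=4,
    model_path='weights/realesr-general-x4v3.pth',  # illustrative path
    model=model,
    tile=0,      # no tiling; set >0 to bound memory use on large images
    half=False,  # fp16 is not supported on every backend (e.g. CPU/MPS)
)

img = cv2.imread('input.png', cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)
cv2.imwrite('output.png', output)
```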

Any-Winter-4079 commented 1 year ago

Honestly, I'd look into using ncnn. At least on M1, the speed difference between running from source and the executable is significant (I remember it was about 49s vs. 2s for 4x), and the executables work for Windows and Linux as well. The only catch is that ncnn hasn't been updated since April 24th, and its models are .bin files. Not sure if we can just use the new .pth files, but I think/guess/hope so? I read somewhere they are interchangeable for PyTorch models. The newest release announcement, from 5 days ago, is: https://github.com/xinntao/Real-ESRGAN/releases/tag/v0.3.0
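For reference, the ncnn route means shelling out to the prebuilt realesrgan-ncnn-vulkan executable rather than calling into Python. A rough sketch, with flags per the upstream README; the binary path and input/output filenames here are assumptions:

```python
# Sketch: invoking the prebuilt realesrgan-ncnn-vulkan binary.
import subprocess

subprocess.run(
    [
        './realesrgan-ncnn-vulkan',  # platform-specific prebuilt binary
        '-i', 'input.png',
        '-o', 'output.png',
        '-n', 'realesrgan-x4plus',   # ncnn models ship as .param/.bin pairs
        '-s', '4',                   # upscale factor
    ],
    check=True,
)
```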

ioma8 commented 1 year ago

If I might add to this: I would like to have an option for which RealESRGAN model to use. There are several, and some are specialized for anime and some for photography. Which one is better depends entirely on what kind of image you are generating from SD.

timdesrochers commented 1 year ago

I can vouch for ncnn being ridiculously faster on my old hardware. I have not used it in a while (the smoothing was just impractical), but I can try plugging in the new models.

blessedcoolant commented 1 year ago

It's extremely easy to implement. Takes 2 mins literally.

I actually implemented this on my local repo the other day when I was updating GFPGAN. The new models are significantly faster, but I didn't update it here because I didn't have the time to compare the results in detail. From the few tests I did, the new lightweight model seems to be holding up, but I felt it needed further testing.

Can you verify that the new model produces results of the same quality as the heavy model? The repo itself says there may be some quality degradation.


Edit: I've done some testing. The heavier model seems to produce sharper results in most cases, but the difference is not a great deal. I'm torn on the decision.

Additionally, if we go with the new model, should I scrap the 2x model completely, since it has no lightweight version, and instead use the 4x model and manually downscale the output? That way the user only has to load one model.

Any-Winter-4079 commented 1 year ago

Couldn't we let the user decide which model to use? Use one by default, but if another is specified, use that instead.

psychedelicious commented 1 year ago

Strongly prefer the code remain unopinionated with regard to choice of models. The user should decide.

blessedcoolant commented 1 year ago

@Any-Winter-4079 @psychedelicious Not possible with the current implementation unless we implement all models. Each model uses a different upsampler architecture, so you can't load them just by changing the checkpoint file location.
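To illustrate why a checkpoint-path swap alone fails, a sketch of the two architectures involved (class names per the upstream Real-ESRGAN and BasicSR packages): the loader must construct the matching torch module before it can load the weights.

```python
# Sketch: the heavy and lightweight models are different networks.
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan.archs.srvgg_arch import SRVGGNetCompact

# RealESRGAN_x4plus (~64 MB): a 23-block RRDBNet.
heavy = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)

# realesr-general-x4v3 (~5 MB): the compact SRVGG network.
light = SRVGGNetCompact(num_in_ch=3, num_out_ch=3, num_feat=64,
                        num_conv=32, upscale=4, act_type='prelu')
```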

psychedelicious commented 1 year ago

Ah, dang. Didn't realize that. Sounds like it's easy to implement any given model, but we would need to change e.g. the restoration class to do special handling. Is that right?

blessedcoolant commented 1 year ago

Here are some changes I've been testing.

  1. Update ESRGAN to use the lightweight model realesr-general-x4v3.pth. This is a 4x upsampler. Inference time is significantly faster than the older 4x model, and results are very similar but not 100% the same. It's possible the lightweight model might infer worse than the older model in some very rare cases; I haven't encountered any so far.
  2. Remove the 2x model completely. Instead, do 2x upscaling with the above 4x model, which is lightning fast, and just manually downscale the image.
  3. Using a single model will now permit us to upscale by any arbitrary factor: 1, 2, 3, 4, 5, 6, 7, etc. Results are good up to 4x; anything above that is simple upscaling and will inevitably deteriorate in quality as you go higher (see the sketch after this comment).

Thoughts on this?
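A minimal sketch of points 2 and 3, assuming the `realesrgan` package and the 4x `upsampler` from the earlier sketch: `RealESRGANer.enhance` already accepts an `outscale` that resizes the 4x result, which is effectively the manual downscale described above.

```python
# Sketch: arbitrary scale factors from a single 4x model.
import cv2

img = cv2.imread('input.png', cv2.IMREAD_COLOR)

# 2x output from the 4x model: upsample 4x, then resize down to 2x.
out_2x, _ = upsampler.enhance(img, outscale=2)

# Factors above 4 are plain interpolation past the model's native
# scale, so quality degrades the higher you go.
out_6x, _ = upsampler.enhance(img, outscale=6)
```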

blessedcoolant commented 1 year ago

> Ah, dang. Didn't realize that. Sounds like it's easy to implement any given model, but we would need to change e.g. the restoration class to do special handling. Is that right?

Yeah. Adding a new model is just a couple of lines, but we'd need to add a check for every single model and load the right one accordingly. I don't know if that falls within the scope of this repo. We offer upscaling as a post-processing functionality; the default model does that really well, and all the other variation models do more or less the same thing.

The only other significant model is the anime model. Maybe we can just implement that.
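One way the per-model check could look: a registry mapping model names to their matching architecture constructors. The model names mirror the upstream release; the `build_model` helper itself is hypothetical, not part of the repo.

```python
# Sketch: a registry dispatching model names to the right architecture.
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan.archs.srvgg_arch import SRVGGNetCompact

ARCHS = {
    'RealESRGAN_x4plus': lambda: RRDBNet(
        num_in_ch=3, num_out_ch=3, num_feat=64,
        num_block=23, num_grow_ch=32, scale=4),
    'RealESRGAN_x4plus_anime_6B': lambda: RRDBNet(
        num_in_ch=3, num_out_ch=3, num_feat=64,
        num_block=6, num_grow_ch=32, scale=4),
    'realesr-general-x4v3': lambda: SRVGGNetCompact(
        num_in_ch=3, num_out_ch=3, num_feat=64,
        num_conv=32, upscale=4, act_type='prelu'),
}

def build_model(name: str):
    """Construct the torch module matching a named checkpoint."""
    return ARCHS[name]()
```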

psychedelicious commented 1 year ago

As @Any-Winter-4079 said, on M1, the current upscaling takes AGES. We had tried a different ESRGAN package which was at least an order of magnitude faster, but the implementation was less desirable (IIRC it used subprocess). I have been meaning to raise this as an issue.

Hopefully the new model is faster for us M1 folks. I'd like to test it to see whether it makes upscaling more workable for us.

blessedcoolant commented 1 year ago

Yeah. I want to avoid the subprocess call. Right now the upscaling is very well integrated into the pipeline and can be called and modified at will. Adding a subprocess call for the upscaling will mess with that.

psychedelicious commented 1 year ago

> The only other significant model is the anime model. Maybe we can just implement that.

Supporting the most popular options is a good happy medium.

> Yeah. I want to avoid the subprocess call. Right now the upscaling is very well integrated into the pipeline and can be called and modified at will. Adding a subprocess call for the upscaling will mess with that.

For M1 users, upscaling 4x takes something like 45 seconds for a 512x512 image (512x768 is about 1 minute), while it took only a few seconds on the other implementation. We are all hoping for something faster.

n00mkrad commented 1 year ago

> Here are some changes I've been testing.
>
> 1. Update ESRGAN to use the lightweight model realesr-general-x4v3.pth. This is a 4x upsampler. Inference time is significantly faster than the older 4x model, and results are very similar but not 100% the same. It's possible the lightweight model might infer worse than the older model in some very rare cases; I haven't encountered any so far.
> 2. Remove the 2x model completely. Instead, do 2x upscaling with the above 4x model, which is lightning fast, and just manually downscale the image.
> 3. Using a single model will now permit us to upscale by any arbitrary factor: 1, 2, 3, 4, 5, 6, 7, etc. Results are good up to 4x; anything above that is simple upscaling and will inevitably deteriorate in quality as you go higher.
>
> Thoughts on this?

Sounds great to me.

blessedcoolant commented 1 year ago

#806

Here you go everyone. Give it a shot and tell me how it works for you.

@psychedelicious Let me know the performance on MPS. I've been looking at some solutions to this and can try some things if it is still really slow for you guys.

psychedelicious commented 1 year ago

The performance is up to par with the other realesrgan (the one that needed subprocess). Upscaling 4x takes a few seconds. Most excellent!

For reference, I neglected to upgrade realesrgan and got this error:

```
>> Error loading Real-ESRGAN:
Traceback (most recent call last):
  File "/Users/spencer/Documents/Code/stable-diffusion/./ldm/dream/restoration/realesrgan.py", line 49, in process
    upsampler = self.load_esrgan_bg_upsampler()
  File "/Users/spencer/Documents/Code/stable-diffusion/./ldm/dream/restoration/realesrgan.py", line 30, in load_esrgan_bg_upsampler
    bg_upsampler = RealESRGANer(
TypeError: RealESRGANer.__init__() got an unexpected keyword argument 'dni_weight'

>> Real-ESRGAN Upscaling seed:4088418986 : scale:4x
>> Error running RealESRGAN or GFPGAN. Your image was not upscaled.
local variable 'upsampler' referenced before assignment
```

I upgraded with `pip install realesrgan --upgrade` and it worked after that.

blessedcoolant commented 1 year ago

@psychedelicious That's good to hear. The `dni_weight` does very little; I could actually remove it and it would work with an older version too. But I think it's better to have the latest version of realesrgan anyway.
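If backward compatibility were wanted instead, one option is to pass `dni_weight` only when the installed `RealESRGANer` accepts it, which would avoid the `TypeError` above on older releases. A sketch under that assumption; the model construction mirrors the earlier examples:

```python
# Sketch: tolerating realesrgan versions that predate `dni_weight`.
import inspect
from realesrgan import RealESRGANer
from realesrgan.archs.srvgg_arch import SRVGGNetCompact

model = SRVGGNetCompact(num_in_ch=3, num_out_ch=3, num_feat=64,
                        num_conv=32, upscale=4, act_type='prelu')
kwargs = dict(scale=4, model_path='weights/realesr-general-x4v3.pth',
              model=model, tile=0, half=False)

# Only pass dni_weight when the installed version supports it.
if 'dni_weight' in inspect.signature(RealESRGANer.__init__).parameters:
    kwargs['dni_weight'] = None  # None = no denoise-strength interpolation
upsampler = RealESRGANer(**kwargs)
```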

blessedcoolant commented 1 year ago

This is now live. I'll close this issue. If there's anything else that needs to go with this, raise a new thread. Thank you all for the feedback.