invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[Suggestion] better option for upscaling images #577

Closed: BookWyrm114 closed this issue 1 year ago

BookWyrm114 commented 1 year ago

Is your feature request related to a problem? Please describe.
This fork is amazing, and it's what I use most of the time, but for generating high-resolution images https://github.com/jquesnelle/txt2imghd is much better, although it doesn't have all the options of this fork.

Describe the solution you'd like
An implementation of the approach https://github.com/jquesnelle/txt2imghd uses to make higher-resolution images (generate an image, upscale it to the desired size, run img2img on smaller pieces of the image, then blend it all back together). This would let us generate much better-looking high-resolution images at the cost of slower generation.
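To make the tiling step concrete, here is a minimal Python sketch of how an upscaled image could be divided into overlapping tiles for img2img. It is not txt2imghd's or embiggen's actual code: the helper names `tile_starts` and `tile_boxes` and the default 512 px tile with 64 px overlap are illustrative assumptions. The last tile on each axis is shifted to sit flush with the image edge so the grid always covers the whole image.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length).

    Hypothetical helper: consecutive tiles overlap by at least `overlap` pixels.
    """
    if tile >= length:
        return [0]  # a single tile already covers this axis
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512,
               overlap: int = 64) -> list[tuple[int, int, int, int]]:
    """(left, top, right, bottom) boxes in row-major order.

    Each box would be cropped from the upscaled image, run through img2img,
    and blended back in (assumed pipeline, per the description above).
    """
    xs = tile_starts(width, tile, overlap)
    ys = tile_starts(height, tile, overlap)
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in ys for x in xs]


if __name__ == "__main__":
    for box in tile_boxes(1024, 1024):
        print(box)
```

For a 1024x1024 upscale this yields a 3x3 grid of nine 512 px boxes; shifting the last tile inward (rather than padding past the edge) keeps every tile at the size the diffusion model expects.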

Describe alternatives you've considered
Just using txt2imghd, or using the built-in GFPGAN and Real-ESRGAN support, which works well for things like seamless textures. For other prompts, like landscapes, the result isn't nearly as good as txt2imghd's.

Additional context
Upscaled landscape image with this fork: [image: 000017 2927298014]

Example of a txt2imghd image:
Initial image: [image: 00000]
Upscaled: [image: 00000u]
After img2img: [image: 00000ud]

Example 2: [images: 00031, 00031u, 00031ud]

Any-Winter-4079 commented 1 year ago

I tried that option with the https://github.com/lowfuel/progrock-stable repo and it caused me issues with some inputs (e.g. if the prompt says "person", the model is incentivized to create a person in every tile). I haven't looked at the repo you mention or how it implements this, though, so I can't judge. As a feature, I think it's worth exploring!

BookWyrm114 commented 1 year ago

Yeah, I've run into that issue too sometimes, but txt2imghd has a "--strength" argument that sets the strength of the noising/denoising. At the default value you thankfully won't run into that issue, but values of 0.5-0.6 and higher sometimes cause it.
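The effect of `--strength` can be sketched numerically. In Stable Diffusion img2img scripts such as the original CompVis `img2img.py`, the init image is noised only part of the way and roughly `int(strength * steps)` denoising steps are run, so a higher strength gives the model more room to repaint each tile from the prompt alone (which is when a "person" can show up in every tile). The function below is a hypothetical helper illustrating that convention, not txt2imghd's code.

```python
def img2img_denoise_steps(strength: float, steps: int) -> int:
    """Approximate number of denoising steps img2img runs for a given strength.

    Hypothetical helper following the int(strength * steps) convention of
    common img2img scripts: 0.0 leaves the init image essentially unchanged,
    1.0 repaints it from (almost) pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(strength * steps), steps)


if __name__ == "__main__":
    for s in (0.25, 0.5, 0.75, 1.0):
        print(f"strength={s}: {img2img_denoise_steps(s, 40)} of 40 steps")
```

At 40 steps, a strength of 0.5 already hands half of the sampling schedule to the prompt, which fits the observation that values around 0.5-0.6 start inventing new content per tile.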

n00mkrad commented 1 year ago

Embiggen (=txt2imghd) is being implemented on the dev branches right now.

blessedcoolant commented 1 year ago

General upscaling and img2imghd are two very different features.

Upscaling basically just enlarges the existing image with the help of AI models. In that regard, very few models currently give better general results than ESRGAN, and we already have that implemented.

As for img2imghd, this is a feature where the canvas is enlarged, the image is split into tiles, and each tile is then resampled with the diffusion model, producing a larger image with seemingly more detail.
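Putting resampled tiles back together is where visible seams can appear. One common technique (assumed here, not taken from embiggen's source) is feathering: each tile gets weights that ramp down linearly toward its edges, and every output pixel is the weighted average of all tiles covering it. The sketch uses plain Python lists and grayscale values to stay self-contained; `ramp` and `blend_tiles` are hypothetical names, and a real implementation would operate on RGB arrays with NumPy or PIL.

```python
def ramp(n: int, overlap: int) -> list[float]:
    """1-D feather weights: rise linearly over `overlap` pixels at each end, capped at 1.0."""
    if overlap <= 0:
        return [1.0] * n
    weights = []
    for i in range(n):
        edge = min(i + 1, n - i)  # 1-based distance to the nearest tile edge
        weights.append(min(1.0, edge / (overlap + 1)))
    return weights


def blend_tiles(width: int, height: int,
                tiles: list[tuple[tuple[int, int], list[list[float]]]],
                overlap: int) -> list[list[float]]:
    """Weighted-average overlapping tiles into one width x height image.

    `tiles` holds ((left, top), pixels) pairs, where pixels is a 2-D list of
    grayscale values. Weights are always > 0, so pixels covered by a single
    tile (e.g. at the image border) keep that tile's value exactly.
    """
    acc = [[0.0] * width for _ in range(height)]
    wsum = [[0.0] * width for _ in range(height)]
    for (left, top), pixels in tiles:
        wy = ramp(len(pixels), overlap)
        wx = ramp(len(pixels[0]), overlap)
        for y, row in enumerate(pixels):
            for x, value in enumerate(row):
                w = wy[y] * wx[x]
                acc[top + y][left + x] += w * value
                wsum[top + y][left + x] += w
    # Normalise by total weight; pixels covered by no tile stay 0.0.
    return [[acc[y][x] / wsum[y][x] if wsum[y][x] > 0 else 0.0
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    row = blend_tiles(6, 1, [((0, 0), [[10.0] * 4]), ((2, 0), [[20.0] * 4])], overlap=2)[0]
    print([round(v, 2) for v in row])
```

With a dark tile (10) and a bright tile (20) overlapping by two pixels, the seam pixels come out as roughly 13.33 and 16.67, a gradual transition rather than a hard edge.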

We have this feature too. It's currently in the development branch and it's called embiggen. Feel free to check it out; it works great. It would have made the latest release, but things got delayed until the last day, so we pushed it to the next release.

If you don't want to try the development branch, you'll be able to use this feature via embiggen in the next release.

BookWyrm114 commented 1 year ago

Closing the issue since embiggen is now in the latest release. (Also yay 2.0.0 release!)