patrickjonesdotca closed this issue 1 year ago
Some people are using this model for upscaling: https://replicate.com/jingyunliang/swinir
It's possible to do a sliding window on image tokens, I have an implementation for ruDALL-E. We can go up to 1024x512
How does it look?
I've looked at this, but it seems to lose a significant amount of detail. I use a lot of photographic prompt modifiers and they end up looking smeared.
@neverix, ruDALL-E doesn't use pass-through recurrent attention, so the result depends only on the input token sequence. But DALL-E mini takes the attention state as a parameter, and this attention context changes recurrently during generation. So I doubt it can process sliding windows effectively.
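To make the concern concrete, here is a toy single-head attention step with a key/value cache (random numbers, not a trained model): each new token attends to *all* cached keys and values, so dropping the oldest tokens the way a sliding window does changes the output for the very same query.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))  # stand-in projection, not real weights

def attend(q, keys, values):
    # Softmax-weighted sum over everything currently in the cache.
    scores = keys @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values

tokens = rng.standard_normal((6, d))
keys, values = tokens @ W, tokens @ W

q = rng.standard_normal(d)
full = attend(q, keys, values)              # full cached context
windowed = attend(q, keys[2:], values[2:])  # sliding window drops 2 tokens

print(np.allclose(full, windowed))  # False: the dropped context changes the result
```

This is only a sketch of the caching behaviour being discussed, not DALL-E mini's actual attention code.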
There are many ways to increase image size.

Use ImageMagick with the jinc filter:

```
convert input.png -filter jinc -resize 512 output.png
```
Source image:
Result:
Use ffmpeg with the xbr filter:

```
ffmpeg -i input.png -vf "xbr=2" output.png
```
Result:
Use any VQGAN model that supports decoding tokens into images of different sizes (just encode, then decode at double size). Result:
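The reason this works is that the VQGAN decoder is (mostly) fully convolutional, so the output resolution simply scales with the token grid. A toy stand-in decoder — nearest-neighbour upsampling by an assumed downsampling factor of 16, not the real architecture — shows the shape relationship:

```python
import numpy as np

f = 16  # assumed VQGAN downsampling factor

def toy_decode(token_grid):
    # (H, W) token grid -> (H*f, W*f) "image"; a fully convolutional
    # decoder scales the same way regardless of the grid size it sees.
    return np.kron(token_grid, np.ones((f, f)))

print(toy_decode(np.zeros((16, 16))).shape)  # (256, 256)
print(toy_decode(np.zeros((32, 32))).shape)  # (512, 512)
```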
Use RealESRGAN. They published several models and even compiled NCNN binaries for Windows and Linux, so you can run the upscaler from the command line without any Python or CUDA environment. Result:
@iScriptLex Well, I implemented the functional caching so I would know :sweat_smile:. It's passed around in a similar way, and nothing needs to be changed in the current DALL-E mini codebase to incorporate it (so it can be like a colab notebook).
It is true that it's not as good as the version without caching, but that can only happen with #74.
@kuprel Here's a sample 384px generation without upscaling
Would love to know where to change the code to allow for larger image sizes than 256x256. Better yet would be the ability to change them from the Colab.