kuprel / min-dalle

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch
MIT License

Changing image size? #72

Closed: patrickjonesdotca closed this issue 1 year ago

patrickjonesdotca commented 1 year ago

Would love to know where to change the code to allow for larger image sizes than 256x256. Better yet would be the ability to change them from the Colab.

kuprel commented 1 year ago

Some people are using this model for upscaling: https://replicate.com/jingyunliang/swinir

neverix commented 1 year ago

It's possible to do a sliding window on image tokens; I have an implementation for ruDALL-E. We can go up to 1024x512.
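To make the sliding-window idea concrete, here is a minimal sketch of how a token grid larger than the model's native one could be filled in raster order, with each new token conditioned only on already-generated tokens inside a fixed-size window. This is an illustration under stated assumptions, not the ruDALL-E implementation mentioned above: `sample_next` is a hypothetical stand-in for a model call that maps a context token list to the next token, and the window placement (window ends at the current row, roughly centred on the current column) is just one plausible choice.

```python
from typing import Callable, List


def sliding_window_generate(
    height: int,
    width: int,
    window: int,
    sample_next: Callable[[List[int]], int],
) -> List[List[int]]:
    """Fill a height x width token grid in raster order.

    Each token sees at most window * window - 1 earlier tokens, taken
    from a window x window patch that contains the current position.
    `sample_next` is a hypothetical callback standing in for the model.
    """
    if window > height or window > width:
        raise ValueError("window must fit inside the target grid")

    grid: List[List[int]] = [[0] * width for _ in range(height)]
    for r in range(height):
        # The window's last row is the current row, so the context
        # reaches as far up as the window allows.
        top = max(0, r - window + 1)
        for c in range(width):
            # Roughly centre the window on the current column, clamped
            # so that it stays inside the grid.
            left = max(0, min(c - window // 2, width - window))
            context: List[int] = []
            for rr in range(top, r + 1):
                for cc in range(left, left + window):
                    if rr == r and cc >= c:
                        break  # positions not generated yet
                    context.append(grid[rr][cc])
            grid[r][c] = sample_next(context)
    return grid


# Toy usage: the "model" just returns the context length, which makes
# the amount of context available at each position visible.
toy = sliding_window_generate(2, 3, 2, lambda ctx: len(ctx))
```

With a real model, `window` would be the side length of the token grid the model was trained on, and the finished grid would be passed to a decoder that accepts non-square or larger token grids.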

kuprel commented 1 year ago

How does it look?

patrickjonesdotca commented 1 year ago

https://replicate.com/jingyunliang/swinir

I've looked at this, but it seems to lose a significant amount of detail. I use a lot of photographic prompt modifiers, and the results end up looking smeared.

iScriptLex commented 1 year ago

@neverix, ruDALL-E doesn't use pass-through recurrent attention, so the result depends only on the input token sequence. DALL-E mini, however, takes the attention state as a parameter, and this attention context changes recurrently during generation. So I doubt it can process sliding windows effectively.

iScriptLex commented 1 year ago

There are many ways to increase image size.

  1. You can just use ImageMagick with the Jinc filter: convert input.png -filter jinc -resize 512 output.png
     Source image: img
     Result: img_j

  2. Use ffmpeg with the xbr filter: ffmpeg -i input.png -vf "xbr=2" output.png
     Result: img_x

  3. Use any VQGAN model that supports decoding tokens into images of different sizes (just encode->decode to double size).
     Result: img_out

  4. Use RealESRGAN. They published several models and even compiled NCNN binaries for Windows and Linux, so you can run the upscaler from the command line even without any Python or CUDA environment.
     Result: img_r2
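For batch use from Python or a Colab cell, the two command-line options above could be wrapped roughly as follows. This is only a sketch: the function names are made up for illustration, the commands are the ones quoted in the list, and it assumes the `convert` and `ffmpeg` executables are on the PATH (the helper simply returns False when they are not).

```python
import shutil
import subprocess
from typing import List


def imagemagick_jinc_cmd(src: str, dst: str, width: int = 512) -> List[str]:
    """Build the ImageMagick command from the list above (Jinc filter)."""
    return ["convert", src, "-filter", "jinc", "-resize", str(width), dst]


def ffmpeg_xbr_cmd(src: str, dst: str, factor: int = 2) -> List[str]:
    """Build the ffmpeg command from the list above (xbr filter)."""
    if factor not in (2, 3, 4):
        # ffmpeg's xbr filter only offers 2x, 3x and 4x scaling.
        raise ValueError("xbr factor must be 2, 3 or 4")
    return ["ffmpeg", "-i", src, "-vf", f"xbr={factor}", dst]


def run_if_available(cmd: List[str]) -> bool:
    """Run cmd if its executable is installed; return False otherwise."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


jinc = imagemagick_jinc_cmd("input.png", "output.png")
xbr = ffmpeg_xbr_cmd("input.png", "output.png")
```

Calling `run_if_available(jinc)` or `run_if_available(xbr)` then performs the actual upscale when the tool is present. Note that ffmpeg will prompt before overwriting an existing output file.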

neverix commented 1 year ago

@iScriptLex Well, I implemented the functional caching so I would know :sweat_smile:. It's passed around in a similar way, and nothing needs to be changed in the current DALL-E mini codebase to incorporate it (so it can be like a colab notebook).

It is true that it's not as good as the version without caching, but that can only happen with #74.

neverix commented 1 year ago

@kuprel Here's a sample 384px generation without upscaling: image