Replication of the upscalers

lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

MIT License

Replication of the upscalers #152

Open rom1504 opened 2 years ago

rom1504 commented 2 years ago

Hey, so we got decent versions of the prior and the basic decoder now.

I think the current code is already able to train upscalers but we need more doc for it.

Let's have an upscaler.md explaining:

What it is
How to prepare the dataset
Which hyperparameters to use
The command to run the training
Expected cost in GPU hours

And then train it!

We can also discuss what the right dataset is, but I figure the LAION-5B subset we call "laion high resolution" could do the trick (it's 170M images at 1024x1024 or bigger).

As I understand it, only the image (and the CLIP image embedding) is needed, and no text?
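
To make the dataset question concrete, here is a minimal sketch of the filtering step, assuming each candidate image comes with a metadata row holding its width and height. `ImageRecord`, `UpscalerSample`, and `embed_image` are hypothetical names for illustration, not a LAION schema or anything from the repository. It also assumes, as asked above, that the caption can be dropped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

MIN_SIDE = 1024  # from the "1024x1024 or bigger" description of the subset


@dataclass
class ImageRecord:
    """Hypothetical metadata row; field names are illustrative only."""
    url: str
    width: int
    height: int
    caption: str = ""


@dataclass
class UpscalerSample:
    """Assumed upscaler input: the image plus its CLIP image embedding, no text."""
    url: str
    clip_image_embedding: List[float]


def is_high_resolution(record: ImageRecord, min_side: int = MIN_SIDE) -> bool:
    # Both sides must reach the threshold, so test the shorter one.
    return min(record.width, record.height) >= min_side


def build_samples(
    records: Iterable[ImageRecord],
    embed_image: Callable[[str], List[float]],  # hypothetical embedding lookup
) -> Iterator[UpscalerSample]:
    # Keep only high-resolution images and deliberately drop the caption.
    for record in records:
        if is_high_resolution(record):
            yield UpscalerSample(record.url, embed_image(record.url))
```

In practice `embed_image` would read a precomputed embedding rather than run CLIP per image, but that choice is outside what this sketch assumes.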

nousr commented 2 years ago

Here are some relevant sections of the paper for reference in this thread.

lucidrains commented 2 years ago

they are also using the BSR degradation used by Rombach et al. (https://github.com/CompVis/latent-diffusion/tree/e66308c7f2e64cb581c6d27ab6fbeb846828253b/ldm/modules/image_degradation, https://github.com/cszn/BSRGAN/blob/main/utils/utils_blindsr.py), which I don't have in the repository yet

tempted to just go with Imagen's noising procedure (on top of the blur) and call it a day (it would be a lot simpler)
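
For readers unfamiliar with the simpler option, here is a rough sketch of noise conditioning augmentation in the Imagen style. It assumes a variance-preserving mix of the (already blurred) low-resolution image with Gaussian noise at a randomly drawn level, and that the level is returned so the upsampler can be conditioned on it. The function names and the flat-list pixel representation are illustrative only; this is not the repository's code.

```python
import math
import random
from typing import List, Tuple


def noise_augment(pixels: List[float], aug_level: float, rng: random.Random) -> List[float]:
    """Mix pixels with unit Gaussian noise; aug_level=0 leaves them unchanged."""
    if not 0.0 <= aug_level <= 1.0:
        raise ValueError("aug_level must lie in [0, 1]")
    signal_scale = math.sqrt(1.0 - aug_level)
    noise_scale = math.sqrt(aug_level)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


def sample_conditioning(blurred_low_res: List[float], rng: random.Random) -> Tuple[List[float], float]:
    """Draw a random augmentation level and return (noised image, level)."""
    aug_level = rng.random()  # uniform in [0, 1)
    return noise_augment(blurred_low_res, aug_level, rng), aug_level
```

Compared with a BSR-style pipeline, this has a single knob (the noise level), which is the simplicity being weighed here.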

lucidrains commented 2 years ago

ok, 0.11.0 should allow for the different noise schedules across different unets, as in the paper
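
As an illustration of what "different noise schedules across different unets" means, here is a small sketch of the two commonly used schedules, linear and cosine, with one schedule picked per unet. The defaults (`1e-4` to `0.02` for linear, `s = 0.008` for cosine) are the usual published values, not settings read from this repository, and `schedules_for_unets` is a hypothetical helper rather than the library's API.

```python
import math
from typing import Callable, Dict, List, Sequence


def linear_beta_schedule(timesteps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Betas spaced evenly from beta_start to beta_end."""
    if timesteps < 1:
        raise ValueError("timesteps must be positive")
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def cosine_beta_schedule(timesteps: int, s: float = 0.008, max_beta: float = 0.999) -> List[float]:
    """Betas derived from a squared-cosine cumulative alpha curve, clipped at max_beta."""
    if timesteps < 1:
        raise ValueError("timesteps must be positive")

    def alpha_bar(t: int) -> float:
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    return [min(1.0 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(timesteps)]


SCHEDULES: Dict[str, Callable[[int], List[float]]] = {
    "linear": linear_beta_schedule,
    "cosine": cosine_beta_schedule,
}


def schedules_for_unets(names: Sequence[str], timesteps: int) -> List[List[float]]:
    """Return one beta list per unet, in cascade order."""
    return [SCHEDULES[name](timesteps) for name in names]
```

For example, `schedules_for_unets(["cosine", "linear"], 1000)` would give the base unet a cosine schedule and the upsampler a linear one; which assignment matches the paper should be checked against its hyperparameter table.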

after adding the BSR image degradation (or some alternative), i think i'm comfortable giving the repository a 1.0

lucidrains commented 2 years ago

As I understand it, only the image (and the CLIP image embedding) is needed, and no text?

@rom1504 yup, no text conditioning needed, i think it should all be in the image embedding!

YUHANG-Ma commented 2 years ago

Hi all, I am aiming to train the decoder and the upsampler. Because the decoder and upsampler have too many parameters, I have decided to train them separately. The readme says that the upsampler and the decoder net can be trained separately. From reading the code, my understanding is that although I can train them separately, I still need to load the parameters of both unet 0 and unet 1 and set the unet number to 1 in order to train only unet 1. I don't know if I am right. If so, I couldn't train unet 0 and unet 1 on two separate machines. How could I train the decoder net and the upsamplers separately? Best,