lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License
11.17k stars 1.09k forks

Replication of the upscalers #152

Open rom1504 opened 2 years ago

rom1504 commented 2 years ago

Hey, so we got decent versions of the prior and the basic decoder now.

I think the current code is already able to train upscalers, but we need more docs for it.

Let's add an upscaler.md explaining how.

And then train it!

We can also discuss what the right dataset is, but I figure the laion5B subset we call "laion high resolution" could do the trick (it's 170M images at 1024x1024 or larger).

I understand only the image (and CLIP image embedding) is needed, and no text?

nousr commented 2 years ago

Here are some relevant sections of the paper for reference in this thread


[four images: excerpts from the paper]

lucidrains commented 2 years ago

they are also using the BSR degradation used by Rombach et al., which I don't have in the repository yet. for reference: https://github.com/CompVis/latent-diffusion/tree/e66308c7f2e64cb581c6d27ab6fbeb846828253b/ldm/modules/image_degradation and https://github.com/cszn/BSRGAN/blob/main/utils/utils_blindsr.py

tempted to just go with Imagen's noising procedure (on top of the blur) and call it a day (it would be a lot simpler)
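
for anyone following along, here is a rough sketch of what noising the low-res conditioning image could look like. it is plain Python rather than repository code, `noise_augment` and `aug_level` are hypothetical names, and the variance-preserving mix is an assumption (Imagen's exact schedule may differ). the blur is assumed to have been applied already.

```python
import math
import random

def noise_augment(pixels, aug_level, rng):
    """Mix an (already blurred) low-res image with Gaussian noise.

    pixels:    flat list of floats scaled to [-1, 1]
    aug_level: 0.0 leaves the image untouched, 1.0 replaces it with pure noise
    rng:       a random.Random instance, passed in for reproducibility
    """
    keep = math.sqrt(1.0 - aug_level)   # weight on the image (assumed variance-preserving)
    noise_scale = math.sqrt(aug_level)  # weight on the noise
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
low_res = [0.5, -0.25, 0.0, 1.0]       # toy stand-in for a low-res image
aug_level = rng.uniform(0.0, 1.0)      # sampled per example during training
noisy = noise_augment(low_res, aug_level, rng)
```

in this kind of setup the upsampler would typically also be conditioned on `aug_level`, so that a fixed level can be chosen at sampling time.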

lucidrains commented 2 years ago

ok, 0.11.0 should allow for the different noise schedules across different unets, as in the paper
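
to make the idea concrete, a minimal standalone sketch of per-unet schedules follows. it is not the repository's API: `linear_betas`, `cosine_betas`, `SCHEDULES` and `schedule_per_unet` are hypothetical names, the default constants are the commonly used ones, and the cosine/linear pairing is only an illustration.

```python
import math

def linear_betas(timesteps, beta_start=1e-4, beta_end=0.02):
    """Betas spaced evenly between beta_start and beta_end."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]

def cosine_betas(timesteps, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine cumulative alpha curve."""
    def alpha_bar(t):
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2
    return [
        min(1 - alpha_bar(i + 1) / alpha_bar(i), max_beta)  # clip to avoid beta = 1
        for i in range(timesteps)
    ]

SCHEDULES = {"cosine": cosine_betas, "linear": linear_betas}

# hypothetical assignment: one schedule name per unet (base unet, then upsampler)
schedule_per_unet = ("cosine", "linear")
betas_per_unet = [SCHEDULES[name](1000) for name in schedule_per_unet]
```

each unet then builds its own forward-noising and sampling quantities from its own list of betas, so the base model and the upsamplers do not have to share a schedule.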

after adding the BSR image degradation (or some alternative), i think i'm comfortable giving the repository a 1.0

lucidrains commented 2 years ago

I understand only the image (and CLIP image embedding) is needed, and no text?

@rom1504 yup, no text conditioning needed, i think it should all be in the image embedding!

YUHANG-Ma commented 2 years ago

Hi all, I am aiming to train the decoder and the upsampler. Because together they have too many parameters, I decided to train them separately. The readme says the upsampler and the decoder net can be trained separately. From reading the code, my understanding is that although I can train them separately, I still need to load the parameters of both unet 0 and unet 1 and set the unet number to 1 to train only unet 1. I don't know if I am right. If so, I couldn't train unet 0 and unet 1 on two separate machines. How could I train the decoder net and the upsamplers separately? Best,