lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

Provide generation CLIP-guiding script using the prior #59

Open rom1504 opened 2 years ago

rom1504 commented 2 years ago

alstro is reporting increased diversity when doing that (guiding CLIP-based generation with the prior's predicted image embedding)

example script that needs adapting to use the prior: https://gist.github.com/crowsonkb/a6aef1031a2712241d0c21426f9c2897

this can be an interesting way to evaluate the prior

example of diversity thanks to the diffusion sampling process https://twitter.com/jd_pressman/status/1508868273474920452
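
For reference, the core of the idea in a minimal sketch (helper names are illustrative, and the `prior.sample` call is an assumption about the dalle2-pytorch API at your version; the real script is in the gist above): instead of guiding generation toward the CLIP *text* embedding, you guide it toward the image embedding the prior predicts from that text.

```python
import torch
import torch.nn.functional as F

def spherical_dist_loss(x, y):
    # spherical distance between L2-normalized embeddings, as used in
    # Katherine Crowson's CLIP-guidance scripts
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)

@torch.no_grad()
def predict_image_embed(prior, tokenized_text):
    # the prior maps a tokenized caption to a predicted CLIP *image*
    # embedding; guiding generation toward this, rather than toward the
    # text embedding, is what reportedly increases diversity
    return prior.sample(tokenized_text)  # assumed API; check your version

def guidance_loss(clip_model, candidate_image, target_embed):
    # pull the CLIP embedding of the current candidate toward the
    # prior's predicted image embedding
    image_embed = clip_model.encode_image(candidate_image)
    return spherical_dist_loss(image_embed, target_embed).mean()
```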

lucidrains commented 2 years ago

oh nice! so the prior network is working then for Katherine?

nousr commented 2 years ago

I've got our latest model thrown into the script...and these are the results

can you tell which one is supposed to be a Siberian husky? 😆

[images: "the first day of the waters", "a photo of a Siberian husky dog"]

rom1504 commented 2 years ago

@lucidrains no, not quite; she independently implemented and trained a small one, and for her it worked. Apparently we're doing something wrong

lucidrains commented 2 years ago

@rom1504 ahh i see, well it's still good news to hear that the prior works!

lucidrains commented 2 years ago

> I've got our latest model thrown into the script...and these are the results
>
> can you tell which one is supposed to be a Siberian husky? 😆

those look like quilts haha

lucidrains commented 2 years ago

do you have a decoder that's conditioned on the CLIP image embeddings trained?
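
(For context, a rough sketch of what such an image-embedding-conditioned decoder looks like with this repo, adapted from the README; exact argument names may differ between versions:)

```python
import torch
from dalle2_pytorch import Unet, Decoder, OpenAIClipAdapter

clip = OpenAIClipAdapter()  # wraps a pretrained OpenAI CLIP

unet = Unet(
    dim = 128,
    image_embed_dim = 512,   # must match CLIP's embedding dimension
    cond_dim = 128,
    channels = 3,
    dim_mults = (1, 2, 4, 8)
)

decoder = Decoder(
    unet = unet,
    clip = clip,
    timesteps = 100,
    image_cond_drop_prob = 0.1,  # classifier-free guidance dropout
    text_cond_drop_prob = 0.5
)

images = torch.randn(2, 3, 256, 256)  # mock training batch
loss = decoder(images)  # CLIP image embeddings are computed internally
loss.backward()
# sampling then conditions on an image embedding, e.g.
# decoder.sample(image_embed = ...) -- check your installed version
```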

nousr commented 2 years ago

> do you have a decoder that's conditioned on the CLIP image embeddings trained?

@lucidrains yeah, actually, i was just about to post these results from an incredibly under-trained wikiart prior with CLIP conditioning...

they look better, but we should definitely do a more comprehensive training run (probably on a laionX subset) for a better comparison, as I only trained this model for like an hour last night while testing something else...

[images: "the first day of the waters" (zion wikiart), "the birth of venus" (zion wikiart mini)]

lucidrains commented 2 years ago

ok cool, i'll keep chipping away at the training code - hopefully by the end of the month people with multiple GPUs can at least train something small scale using only CLI commands (like how my GAN repos are done)

nousr commented 2 years ago

[image: "the birth of venus", oils on canvas]

progress looks good! this is roughly 20k steps into the latest run mentioned in #29

nousr commented 2 years ago

I thought I'd make a quick gist of my modifications to alstro's deep-image-prior code for use with our priors... it's still just one big script, but it would be nice to have a slimmed-down version that's callable during training for stuff like wandb

https://gist.github.com/nousr/bafb0a417efceb4a9ced4e07f3acadef


For now you'll still need...

  1. to clone https://github.com/crowsonkb/deep-image-prior
  2. to pip install madgrad as that's the default optimizer used (technically not required, but probably recommended)
  3. to pip install resize-right
  4. ofc dalle2-pytorch
  5. to (most likely) tinker with the code a bit to get things right for your specific prior

when I get some time I'll try to coordinate with Katherine to get the deep-image-prior fork pip-installable so that it can be a bit more plug-n-play.
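
As a rough illustration of what that setup boils down to (the DIP generator `net`, the CLIP wrapper, and the target embedding are stand-ins here; the real code lives in the gist above):

```python
import torch.nn.functional as F
from madgrad import MADGRAD      # the gist's default optimizer
from resize_right import resize  # differentiable resize for CLIP input

def spherical_dist_loss(x, y):
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)

def run_dip(net, net_input, clip_model, target_embed, steps=500, lr=1e-3):
    # `net` is a generator from crowsonkb/deep-image-prior; `target_embed`
    # is the image embedding predicted by the trained prior
    opt = MADGRAD(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = net(net_input)                       # current candidate image
        crops = resize(out, out_shape=(224, 224))  # CLIP input resolution
        image_embed = clip_model.encode_image(crops)
        loss = spherical_dist_loss(image_embed, target_embed).mean()
        loss.backward()
        opt.step()
    return out
```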

nousr commented 2 years ago

> it would be nice to have a slimmed-down version that's callable during training for stuff like wandb

started working on this; i have the basic layout blocked in, just need to debug some stuff and make sure it works as expected
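
the wandb end of such a hook could look something like this sketch (function name and arguments are hypothetical):

```python
import wandb

def log_prior_samples(images, captions, step):
    # `images` come from the slimmed-down guided-generation routine;
    # wandb.Image accepts tensors, numpy arrays, or PIL images
    wandb.log(
        {"prior/samples": [wandb.Image(img, caption=cap)
                           for img, cap in zip(images, captions)]},
        step = step,
    )
```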

rom1504 commented 2 years ago

this seems almost done, just a little bit more packaging needed

nousr commented 2 years ago

> this seems almost done, just a little bit more packaging needed

@rom1504 is the deep_image_prior method still needed/interesting enough to include? How about just uploading a script that uses a small decoder?

if so, here's the update:

that being said...