lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch
MIT License
8.34k stars 1.03k forks

Support to Analytic-DPM and DPM-Solver #141

Closed lhaippp closed 1 year ago

lhaippp commented 1 year ago

Hi Phil, thanks again for the great job!

Recently, Analytic-DPM and DPM-Solver proposed an optimal analytic solution and a better sampling method, respectively. Would you mind considering adding them to this repo?

Best regards

lucidrains commented 1 year ago

@lhaippp hey thanks Li! hmm, are you sure dpm-solver works? if it does work, wouldn't stable diffusion be using it?

lhaippp commented 1 year ago

@lucidrains I think it should work, I will try to verify.

lucidrains commented 1 year ago

@lhaippp verifying is a good idea :) most papers do not end up working as advertised, or have some catch

lucidrains commented 1 year ago

@lhaippp hey Li, i actually heard from someone i trust that dpm solver apparently works well

will look into it this week

lhaippp commented 1 year ago

@lucidrains hi Phil, that's awesome! Looking forward to it

Mut1nyJD commented 1 year ago

I've got an implementation of DPM-Solver++ in combo with Elucidated Diffusion working with this repo. If interested I can make a pull-request

lucidrains commented 1 year ago

@Mut1nyJD omg, that would be amazing! so dpm-solver++ is working for you then?

lucidrains commented 1 year ago

@Mut1nyJD ~~i'm confused, how could dpm-solver++ be compatible with elucidated sampling?~~ i should just read the paper

Mut1nyJD commented 1 year ago

> @Mut1nyJD omg, that would be amazing! so dpm-solver++ is working for you then?

Yeah, it works, at least on the model I've trained. Katherine Crowson made it compatible with Elucidated sampling in what she calls k-diffusion (k for Karras).

I will clean it up and submit a pull request.
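For reference while the pull request is being prepared, here is a minimal sketch of what a DPM-Solver++(2M) multistep sampler over a Karras (elucidated) noise schedule looks like, following the k-diffusion formulation. All names here are illustrative, not the actual PR code; `denoise(x, sigma)` is assumed to return the model's denoised (x0) prediction:

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # noise schedule from Karras et al. (the "elucidated" paper)
    ramp = np.linspace(0, 1, n)
    sigmas = (sigma_max ** (1 / rho)
              + ramp * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    return np.append(sigmas, 0.0)  # end exactly at zero noise

def sample_dpmpp_2m(denoise, x, sigmas):
    # DPM-Solver++(2M): second-order multistep solver in the
    # data-prediction parameterization (as in k-diffusion)
    old_denoised = None
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        denoised = denoise(x, sigma)
        if sigma_next == 0:
            x = denoised  # final step: jump straight to the prediction
        else:
            # half-log-SNR time, lambda = -log(sigma) when alpha = 1
            t, t_next = -np.log(sigma), -np.log(sigma_next)
            h = t_next - t
            if old_denoised is None:
                d = denoised  # first step falls back to first order
            else:
                # reuse the previous prediction for a 2nd-order correction
                h_last = t - (-np.log(sigmas[i - 1]))
                r = h_last / h
                d = (1 + 1 / (2 * r)) * denoised - 1 / (2 * r) * old_denoised
            x = (sigma_next / sigma) * x - np.expm1(-h) * d
        old_denoised = denoised
    return x
```

With a perfect denoiser (one that always returns the true clean sample), the sampler recovers that sample exactly, which makes a handy sanity check before plugging in a trained network.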

lucidrains commented 1 year ago

@Mut1nyJD thanks for relaying this! ugh, i'm missing out on the latest by not being on discord

Mut1nyJD commented 1 year ago

@lucidrains

No problem you're welcome! It is hard to keep up with all the developments.

What I am personally curious about is the addition of flash attention. Do you think that would work? And would it have any benefit over the linear attention that is mostly used right now?

And the other one is Self-Attention-Guidance (https://github.com/PnDong/Self-Attention-Guidance)

lucidrains commented 1 year ago

@Mut1nyJD yup, flash attention will be better than linear attention, although you will still pay the quadratic compute price

without reading the paper, is self attention guidance for better binding to text?

Mut1nyJD commented 1 year ago

@lucidrains Yeah, but it should still be faster than full attention? Is your flash attention mature enough to try?

Well, I guess you could also use it for the cross attention, and I am sure they are working on that, but this paper/implementation is primarily about the general self-attention layers. So as far as I understood, it is for unsupervised and maybe class-supervised diffusion.

lucidrains commented 1 year ago

@Mut1nyJD yup, under certain conditions it is faster

my flash attention repository is using cosine similarity, and unfortunately Robin had some failed experiments using it, so i can't recommend it until more validation is done. good news is that flash attention will eventually be offered in pytorch!

oh! 🤦 yes i got confused

just scanned the diagrams of the paper and i like it! thank you for mentioning it, may try it out when i find time
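On the flash-attention-in-pytorch point: PyTorch 2.x exposes a fused attention kernel as `torch.nn.functional.scaled_dot_product_attention`. A hypothetical drop-in for a hand-written attention layer (not this repo's actual code) might look like:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # dispatches to a flash / memory-efficient kernel when available;
    # still O(n^2) compute, but never materializes the n x n score matrix
    return F.scaled_dot_product_attention(q, k, v)

q = k = v = torch.randn(1, 8, 64, 32)
out = attention(q, k, v)  # shape (1, 8, 64, 32)
```

The result matches naive `softmax(QK^T / sqrt(d)) V` attention numerically, so it can replace an existing full-attention block without retraining.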

Mut1nyJD commented 1 year ago

@lucidrains

> my flash attention repository is using cosine similarity, and unfortunately Robin had some failed experiments using it, so i can't recommend it until more validation is done. good news is that flash attention will eventually be offered in pytorch!

Ok, thank you for the feedback. I will have a look at the official pytorch implementation, although I do tend to be a version behind the current main release 😄

> oh! 🤦 yes i got confused
>
> just scanned the diagrams of the paper and i like it! thank you for mentioning it, may try it out when i find time

Yeah, it looks like an interesting and not-too-expensive improvement. That's great, I will try my best as well to decipher it and let you know if I get somewhere, if you haven't beaten me to it by then 😄

lucidrains commented 1 year ago

@Mut1nyJD yeah, the concept is super interesting (self guidance using emerged self-attention values). i'm trying to think if there is a way language models can benefit :thinking:
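A very rough sketch of the guidance rule, to make the idea concrete: SAG degrades the regions the model's own self-attention focuses on (e.g. by blurring them) and then guides away from the degraded prediction, analogous to classifier-free guidance. Everything here is illustrative (the attention-map plumbing and the avg-pool blur are stand-ins, not the paper's exact degradation):

```python
import torch
import torch.nn.functional as F

def sag_eps(model, x_t, t, scale=0.75):
    # `model` is assumed (hypothetically) to return the noise prediction
    # plus a per-pixel self-attention map resized to the input resolution
    eps, attn = model(x_t, t)                            # attn: (B, 1, H, W)
    # mask the most-attended regions ...
    thresh = attn.flatten(1).mean(dim=1).view(-1, 1, 1, 1)
    mask = (attn > thresh).float()
    # ... and degrade them with a blur (avg-pool stands in for a Gaussian)
    blurred = F.avg_pool2d(x_t, 3, stride=1, padding=1)
    x_deg = mask * blurred + (1 - mask) * x_t
    eps_deg, _ = model(x_deg, t)
    # guide away from the degraded prediction, CFG-style
    return eps + scale * (eps - eps_deg)
```

Note the appeal: unlike classifier-free guidance this needs no conditioning signal at all, which is why it applies to unconditional ddpms.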

lucidrains commented 1 year ago

closing because of https://github.com/lucidrains/denoising-diffusion-pytorch/pull/148

lucidrains commented 1 year ago

@Mut1nyJD if you want to open another issue (as a reminder for me) for self attention guidance, i can take another look at it once i return

i do think there is something important going on in that paper that will benefit all ddpms!