luosiallen / latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
MIT License

About Dreamshaper-V7 and inference time #4

Closed FriedRonaldo closed 10 months ago

FriedRonaldo commented 10 months ago

TL;DR:

Thanks for the great paper!

I have two questions about it. Q1) Did you use the LAION subset as the dataset for distilling Dreamshaper-V7? Q2) Regarding the inference-time graph at the end of the README, does LCM run at the same speed as DDIM, or is it faster than DDIM, as DPM++ is? (Perhaps because of the distillation of CFG.)


Hi, I read the LCM paper and found it well-grounded and highly reproducible.

After reading it, I have some questions about the implementation details.

The paper reports several results from the distilled Dreamshaper-V7, but I cannot find the implementation details for it. For example, for all the quantitative evaluations the teacher model is the base SD, and for distillation the LAION-5B-Aesthetics dataset is used. However, the training setup for Dreamshaper-V7 is not described well (and the training data for the Dreamshaper-* models does not seem to be publicly available). Did you use the same dataset (the LAION subset) for distilling Dreamshaper-V7?

Following the paper, the DDIM sampler is adopted for sampling from LCM. Is the wall-clock inference time of LCM then the same as that of the vanilla Stable Diffusion model with DDIM? (I understand that DDIM with fewer steps gives inferior quality; the question is only about inference time.) Or is LCM faster than DDIM because of the distillation of CFG?

Thanks!

jojkaart commented 10 months ago

Would you even need a dataset for distilling? Wouldn't you generate it with the teacher model? The only data you'd need is prompts to give to the teacher model, but those could be randomly generated too.

luosiallen commented 10 months ago

Thanks for asking! 1) Yes, we use the LAION-Aesthetics dataset to distill the Dreamshaper-V7 model, for only 4,000 iterations. The results show LCM's superiority in fast convergence and generation quality. 2) LCM is also faster than DDIM, since DDIM, just like DPM++, has to compute the unconditional and conditional scores simultaneously when using Classifier-Free Guidance (CFG) (doubling the batch size, or requiring two network forward passes), which is the major inference-speed bottleneck.
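
For readers wondering where the speedup comes from, below is a minimal, illustrative sketch of the per-step cost difference described above. It is not the repo's actual sampling code; `unet` here is just a stand-in for the denoising network (e.g. `pipe.unet` in diffusers), and the tensor shapes assume an SD-v1-style model at 512x512.

```python
import torch

def unet(latents, t, text_emb):
    # stand-in for the real UNet forward pass (the expensive part of each step)
    return torch.randn_like(latents)

latents = torch.randn(1, 4, 64, 64)      # SD latent for a 512x512 image
t = torch.tensor([999])                  # current timestep
cond_emb = torch.randn(1, 77, 768)       # prompt embedding
uncond_emb = torch.randn(1, 77, 768)     # empty-prompt embedding
guidance_scale = 7.5

# DDIM / DPM++ with CFG: two UNet evaluations per sampling step
# (in practice usually batched as a single doubled batch)
eps_uncond = unet(latents, t, uncond_emb)
eps_cond = unet(latents, t, cond_emb)
eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# LCM: the guidance scale w is distilled into the student model,
# so each of the few sampling steps needs only one forward pass
eps_lcm = unet(latents, t, cond_emb)
```

So even at an equal number of steps, a CFG-guided DDIM/DPM++ step costs roughly two UNet evaluations while an LCM step costs one, on top of LCM needing far fewer steps.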

luosiallen commented 10 months ago

Generating a dataset with the teacher model is highly inefficient, I think. It would take more GPU hours to generate such a dataset than to train directly on an existing one.

FriedRonaldo commented 10 months ago

@luosiallen Thanks for the reply! I understand now.

@jojkaart As you mentioned, data-free distillation might be interesting, but, as @luosiallen mentioned, it would demand a higher computational cost.

I agree that data-free distillation is useful and interesting, but I have several concerns, as follows.

In addition to the computational cost, I think the quality of the generated samples might be a problem in terms of 1) alignment and 2) diversity.

Regarding alignment, if the generated samples do not reflect the text prompts well, training on them can harm the controllability of the resulting model.

In terms of diversity, the model suffers from mode dropping even if it is trained with an NLL loss, which we can observe by measuring diversity metrics such as coverage or recall (see the sketch below). Because of the model's shrunken coverage, the generated samples used for distillation might have smaller support than the original dataset, so the student model might suffer from the same problem or exhibit even worse performance.
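
To make the coverage point concrete, here is a rough, self-contained sketch of the coverage metric from Naeem et al. (2020), "Reliable Fidelity and Diversity Metrics for Generative Models". It is not code from this repo, and it assumes you already have feature embeddings (e.g. Inception features) for real and generated images.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

def coverage(real_feats, fake_feats, k=5):
    """Fraction of real samples whose k-NN ball (built among real samples)
    contains at least one generated sample."""
    # k + 1 neighbours because each real point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(real_feats)
    dists, _ = nn.kneighbors(real_feats)
    radii = dists[:, -1]                          # distance to the k-th real neighbour
    d = pairwise_distances(real_feats, fake_feats)
    covered = (d <= radii[:, None]).any(axis=1)   # any fake inside each real ball?
    return float(covered.mean())

# toy usage: random "features" standing in for real embeddings
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 64))
fake = 0.5 * rng.normal(size=(1000, 64))          # shrunken support (mode dropping)
print(coverage(real, fake))                       # lower value => less diversity
```

A student trained only on teacher samples inherits whatever coverage the teacher has, so low teacher coverage effectively caps the diversity the student can learn.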

Thank you all for the comments!

Since the issue has been resolved, I am closing it.