
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Is there a training loss curve that could be provided for reference? #2

Open haowen-han opened 3 days ago

haowen-han commented 3 days ago

During training I noticed that, for each sample, the gap between the output with caching and the output without caching (the attn loss in the plot) grows as the denoising step increases, as shown in the figure below (56 denoising steps). Is this kind of output loss expected? My trained results are also quite poor and nowhere near as clean as the GIF in README.md. Could you share your training loss curve for reference?

[figure: per-sample output gap (attn loss) over 56 denoising steps]
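For anyone debugging the same curve: the growth can come either from the per-step caching error itself or from error accumulating along the sampling trajectory. Below is a minimal sketch that isolates the per-step component; `forward_plain` / `forward_cached` are hypothetical stand-ins for a DiT forward pass with the router disabled / enabled, not functions from this repo.

```python
import torch

@torch.no_grad()
def per_step_gap(forward_plain, forward_cached, xs, ts, cond=None):
    # Compare cached vs. non-cached model output on the *same* input at
    # every denoising step, so the curve reflects per-step caching error
    # rather than error accumulated along the trajectory.
    # `forward_plain` / `forward_cached` are hypothetical callables with
    # a DiT-style (x, t, cond) signature.
    gaps = []
    for x, t in zip(xs, ts):
        diff = forward_plain(x, t, cond) - forward_cached(x, t, cond)
        gaps.append(diff.abs().mean().item())
    return gaps
```

If this per-step gap stays flat while the end-to-end gap still grows, that would point to error accumulating along the trajectory rather than a per-step training problem.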

haowen-han commented 3 days ago

Also, may I ask: in your experiments, were the training results very sensitive to args.l1? https://github.com/horseee/learning-to-cache/blob/d6cc7817842e1353b8d902af263bab030b8478da/DiT/train_router.py#L254
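For context on the question: judging by the linked line, `args.l1` presumably weights an L1-style penalty on the router's caching scores. A minimal sketch of how such a term typically enters the objective follows; the exact form and direction of the penalty in this repo are assumptions, and `router_objective` is a hypothetical name.

```python
import torch

def router_objective(fidelity_loss, router_logits, l1_weight):
    # Hypothetical combined loss: a fidelity term (cached vs. non-cached
    # output error) plus an L1 penalty on the sigmoid-activated router
    # scores. `l1_weight` plays the role of args.l1: it trades output
    # quality against how aggressively layers are cached, which is why
    # results can be sensitive to its value.
    sparsity = torch.sigmoid(router_logits).mean()  # scores are in (0, 1)
    return fidelity_loss + l1_weight * sparsity
```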