
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

Is there a training loss curve that could be provided for reference? #2

Open haowen-han opened 3 days ago

haowen-han commented 3 days ago

During training I noticed that, for each sample, the gap between the output with caching and the output without caching (the attn loss in the plot) grows as the denoising step increases, as shown in the figure below (56 denoising steps). Is this kind of output loss expected? My trained results are also quite poor and nowhere near as clean as the GIF in README.md. Could you share your training loss curve for reference?

[figure: per-sample output gap (attn loss) over 56 denoising steps]
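For anyone debugging the same curve: the growth can come either from the per-step caching error itself or from error accumulating along the sampling trajectory. Below is a minimal sketch that isolates the per-step component; `forward_plain` / `forward_cached` are hypothetical stand-ins for a DiT forward pass with the router disabled / enabled, not functions from this repo.

```python
import torch

@torch.no_grad()
def per_step_gap(forward_plain, forward_cached, xs, ts, cond=None):
    # Compare cached vs. non-cached model output on the *same* input at
    # every denoising step, so the curve reflects per-step caching error
    # rather than error accumulated along the trajectory.
    # `forward_plain` / `forward_cached` are hypothetical callables with
    # a DiT-style (x, t, cond) signature.
    gaps = []
    for x, t in zip(xs, ts):
        diff = forward_plain(x, t, cond) - forward_cached(x, t, cond)
        gaps.append(diff.abs().mean().item())
    return gaps
```

If this per-step gap stays flat while the end-to-end gap still grows, that would point to error accumulating along the trajectory rather than a per-step training problem.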

haowen-han commented 3 days ago

Also, may I ask: in your experiments, were the training results very sensitive to args.l1? https://github.com/horseee/learning-to-cache/blob/d6cc7817842e1353b8d902af263bab030b8478da/DiT/train_router.py#L254
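For context on the question: judging by the linked line, `args.l1` presumably weights an L1-style penalty on the router's caching scores. A minimal sketch of how such a term typically enters the objective follows; the exact form and direction of the penalty in this repo are assumptions, and `router_objective` is a hypothetical name.

```python
import torch

def router_objective(fidelity_loss, router_logits, l1_weight):
    # Hypothetical combined loss: a fidelity term (cached vs. non-cached
    # output error) plus an L1 penalty on the sigmoid-activated router
    # scores. `l1_weight` plays the role of args.l1: it trades output
    # quality against how aggressively layers are cached, which is why
    # results can be sensitive to its value.
    sparsity = torch.sigmoid(router_logits).mean()  # scores are in (0, 1)
    return fidelity_loss + l1_weight * sparsity
```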