haowen-han opened this issue 2 months ago
Also, I'd like to ask: in your experiments (https://github.com/horseee/learning-to-cache/blob/d6cc7817842e1353b8d902af263bab030b8478da/DiT/train_router.py#L254), were the training results very sensitive to args.l1?
During training I found that, for each sample, the gap between the output with caching and the output without caching grows as the denoising step increases, as shown in the figure below (56 denoising steps). Is this output loss expected? My trained router also performs poorly, nowhere near as good as the GIF in readme.md. Could you share your training loss curve for reference?
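For context, here is a minimal sketch of how one might measure that per-step gap. The model handles and the `denoise_step(x, t)` method are hypothetical stand-ins, not this repo's actual API; the point is only that the gap is computed between the two trajectories at each step, so errors can compound as each step's output feeds the next.

```python
import torch

@torch.no_grad()
def log_output_gap(model, model_cached, x_T, num_steps=56):
    """Run the same denoising trajectory with and without caching
    and record the per-step RMS gap between the two outputs.

    `model`, `model_cached`, and `denoise_step` are hypothetical
    placeholders for the actual DiT sampler in this repo.
    """
    x_plain, x_cached = x_T.clone(), x_T.clone()
    gaps = []
    for t in reversed(range(num_steps)):
        out_plain = model.denoise_step(x_plain, t)
        out_cached = model_cached.denoise_step(x_cached, t)
        # Per-step gap; since each step consumes the previous step's
        # output, a growing gap over steps may simply reflect
        # accumulated error rather than a bug.
        gaps.append((out_plain - out_cached).pow(2).mean().sqrt().item())
        x_plain, x_cached = out_plain, out_cached
    return gaps
```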