horseee / learning-to-cache

[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

The issue of Remove Ratio. #3

Closed · CharvinMei closed this 3 days ago

CharvinMei commented 6 days ago

I think this is a good article, but I have a question. What does the Remove Ratio of 248/560 in Table 9 refer to: the number of blocks removed, or the number of attention or MLP layers removed within a single block?

horseee commented 5 days ago

Hi. Thanks for your question.

560 is the total number of sub-blocks (attention and MLP) across the cached steps. For example, if 10 steps are cached, we have 10 (steps) × 28 (layers) × 2 (one attention and one MLP per layer) = 560. And 248 is the number of those sub-blocks that are removed.
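For reference, here is a minimal sketch (not the repository's actual code) of how such a ratio could be counted, assuming a hypothetical boolean mask `removed` over cached steps, layers, and the two sub-blocks per layer:

```python
import numpy as np

# Illustrative counts only; the layer count matches DiT-XL/2 as described above.
num_cache_steps = 10      # diffusion steps in which caching is applied
num_layers = 28           # transformer layers
subblocks_per_layer = 2   # one attention + one MLP per layer

total = num_cache_steps * num_layers * subblocks_per_layer  # = 560

# Hypothetical learned mask: True means the sub-block is removed (its output reused).
removed = np.random.rand(num_cache_steps, num_layers, subblocks_per_layer) > 0.5

num_removed = int(removed.sum())          # e.g. 248 in Table 9
remove_ratio = num_removed / total
print(f"Remove Ratio: {num_removed}/{total} = {remove_ratio:.2%}")
```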

CharvinMei commented 3 days ago

Thank you for your answer.