thu-nics / DiTFastAttn


Is the compression plan shared across different samples? #2

Closed. A-BigBao closed this issue 2 months ago.

A-BigBao commented 2 months ago

DiTFastAttn is an excellent contribution to the community! Is the compression plan shared across different samples?

hahnyuan commented 2 months ago

Thank you for your interest in the DiTFastAttn work. To answer your question, the compression plan is not shared across different samples. The techniques proposed in the paper, such as Window Attention with Residual Caching, Temporal Similarity Reduction, and Conditional Redundancy Elimination, are applied to each individual sample during the inference process.

BTW, I think sharing a plan across different samples may be difficult because the attention similarity between samples is low.

A-BigBao commented 2 months ago

> Thank you for your interest in the DiTFastAttn work. To answer your question, the compression plan is not shared across different samples. The techniques proposed in the paper, such as Window Attention with Residual Caching, Temporal Similarity Reduction, and Conditional Redundancy Elimination, are applied to each individual sample during the inference process.
>
> BTW, I think sharing a plan across different samples may be difficult because the attention similarity between samples is low.

Does it mean that if we want to adopt DiTFastAttn to generate a sample, we should first run a greedy search for a compression plan specific to that sample?

walkerning commented 2 months ago

> Does it mean that if we want to adopt DiTFastAttn to generate a sample, we should first run a greedy search for a compression plan specific to that sample?

Hey, sorry for the late reply. Instead of searching for a compression plan for each specific sample, we use several calibration samples to decide on a compression plan and then evaluate that plan on the overall dataset, which is common practice in model compression.