Closed: A-BigBao closed this issue 4 months ago.

DiTFastAttn is excellent work for the community! Is the compression plan shared across different samples?
Thank you for your interest in the DiTFastAttn work. To answer your question, the compression plan is not shared across different samples. The techniques proposed in the paper, such as Window Attention with Residual Caching, Temporal Similarity Reduction, and Conditional Redundancy Elimination, are applied to each individual sample during the inference process.
BTW, I think sharing the plan across different samples may be difficult because the attention similarity between samples is low.
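For readers unfamiliar with what a "compression plan" looks like, here is a minimal sketch of one possible representation: a mapping from (timestep, layer) to the attention variant used there. The enum names and layout below are placeholders for illustration only, not identifiers from the DiTFastAttn repository.

```python
from enum import Enum

class AttnMode(Enum):
    # Placeholder names; the actual repository may organize these differently.
    FULL = "full"               # standard full attention
    WINDOW_RESIDUAL = "wa_rs"   # Window Attention with Residual Caching
    SHARE_PREV_STEP = "ast"     # reuse the attention output from the previous timestep
    SHARE_COND = "asc"          # reuse the conditional output for the CFG branch

# plan[(timestep, layer)] -> AttnMode; each entry is chosen independently,
# so cheaper modes are used only where the approximation error is small.
plan: dict[tuple[int, int], AttnMode] = {
    (0, 0): AttnMode.FULL,
    (0, 1): AttnMode.WINDOW_RESIDUAL,
    (1, 1): AttnMode.SHARE_PREV_STEP,
    (1, 2): AttnMode.SHARE_COND,
}
```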
Does it mean that if we want to use DiTFastAttn to generate a sample, we should first run a greedy search for the compression plan of that specific sample?
Hey, sorry for the late reply. Instead of searching for a compression plan for each specific sample, we use several calibration samples to decide a single compression plan and then test that plan on the overall dataset, which is common practice in model compression.
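As a rough sketch of that calibration flow (the `run_layer` hook, the candidate ordering, and the threshold rule below are assumptions for illustration, not the repository's actual implementation), a greedy plan search could pick, for each timestep and layer, the cheapest technique whose output error on the calibration batch stays under a tolerance:

```python
import torch

def calibrate_plan(run_layer, calib_batch, timesteps, num_layers,
                   candidates=("share_prev_step", "window_residual"),
                   delta=0.05):
    """Greedy plan search on a small calibration batch (illustrative sketch).

    run_layer(batch, t, layer, mode) -> torch.Tensor is an assumed hook that
    evaluates one attention layer at one timestep under a given mode.
    `candidates` is ordered cheapest-first; full attention is the fallback.
    """
    plan = {}
    for t in timesteps:
        for layer in range(num_layers):
            ref = run_layer(calib_batch, t, layer, mode="full")  # reference output
            plan[(t, layer)] = "full"                            # default: no compression
            for mode in candidates:
                out = run_layer(calib_batch, t, layer, mode=mode)
                rel_err = torch.norm(out - ref) / torch.norm(ref)
                if rel_err < delta:                              # accept the first mode within budget
                    plan[(t, layer)] = mode
                    break
    return plan
```

The resulting plan is then fixed and reused for every sample at inference time, matching the calibrate-then-deploy practice described above.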