Open wiluen opened 1 month ago
Do different attention heads / different layers matter?
Hi @wiluen, thanks for your question.
If I understand correctly, you're asking how to determine which parts of the attention weights are more important to preserve, especially in highly sparse scenarios.
In MInference, we don't perform per-head fine-grained adjustments: most heads share the same kernel sparsity rate. However, we replace block sparsity with a higher-budget VS pattern for certain heads, as we found that allocating more resources to these heads can significantly improve performance.
There are several related works exploring this direction, including:
You can measure each head's importance from an end-to-end perspective by evaluating how much the output changes when its small attention weight values are dropped.
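As a rough illustration of that end-to-end check, here is a numpy-only toy sketch (all shapes, the `keep` fraction, and the random data are illustrative assumptions, not MInference's actual implementation): for each head, keep only its largest attention weights, renormalize, and measure how much the head's output changes. Heads with large errors are sensitive to sparsification and deserve a higher budget.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy single-layer multi-head attention (illustrative shapes).
n_heads, seq_len, d_head = 4, 16, 8
attn = rng.dirichlet(np.ones(seq_len), size=(n_heads, seq_len))  # [H, q, k], rows sum to 1
values = rng.normal(size=(n_heads, seq_len, d_head))             # [H, k, d]

def output(attn):
    # Attention-weighted sum of values for every head.
    return np.einsum("hqk,hkd->hqd", attn, values)

base = output(attn)

def sparsify_head(attn, h, keep=0.25):
    """Keep only the top `keep` fraction of weights in head h, renormalize."""
    out = attn.copy()
    a = out[h]
    thresh = np.quantile(a, 1 - keep, axis=-1, keepdims=True)
    a = np.where(a >= thresh, a, 0.0)
    out[h] = a / a.sum(-1, keepdims=True)
    return out

# Heads whose output changes most under sparsification are the ones
# where the "small" attention weights still carry information.
errors = [np.abs(output(sparsify_head(attn, h)) - base)[h].mean()
          for h in range(n_heads)]
print("per-head sparsification error:", np.round(errors, 4))
```

With real models you would replace the toy `attn`/`values` with captured tensors and measure a task metric (e.g. perplexity) instead of raw output distance.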
I hope this helps!
Describe the issue
I want to ask a general question. When analyzing attention scores, I find that my attention scores are quite sparse and their values are very low, so I cannot extract any useful information, such as which kinds of tokens receive more attention. Given that a model has n layers and m attention heads, how can I gain some valuable insights? My task is to extract important information from the input I provide.
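One common way to get an overview across all n layers and m heads is to summarize each head's attention distribution with a scalar statistic, such as entropy or top-k mass, and then rank heads by it. A numpy-only sketch (the random tensor stands in for real attention maps, which you could collect e.g. via `output_attentions=True` in HuggingFace transformers; the statistics and k value are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy attention tensor: [n_layers, n_heads, seq_len, seq_len].
n_layers, n_heads, seq_len = 4, 8, 32
logits = rng.normal(size=(n_layers, n_heads, seq_len, seq_len))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

def head_entropy(attn):
    """Mean entropy of each head's attention distribution.
    Low entropy = the head concentrates on few tokens (easier to interpret);
    high entropy = near-uniform attention."""
    ent = -(attn * np.log(attn + 1e-12)).sum(-1)  # [L, H, q]
    return ent.mean(-1)                           # [L, H]

def topk_mass(attn, k=4):
    """Mean fraction of attention mass on each query's top-k keys."""
    topk = np.sort(attn, axis=-1)[..., -k:].sum(-1)  # [L, H, q]
    return topk.mean(-1)                             # [L, H]

ent = head_entropy(attn)
mass = topk_mass(attn)

# Rank heads: the lowest-entropy head is the most focused one.
layer, head = np.unravel_index(np.argmin(ent), ent.shape)
print(f"most concentrated head: layer {layer}, head {head}")
```

Inspecting only the few most concentrated heads (and which tokens they attend to) is usually far more informative than averaging over all heads, since most heads in a sparse regime look near-uniform and wash out the signal.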