Open pengyao96 opened 6 months ago
Hi, thanks for your interest in our work.
To get the mean value, we simply evaluate 100 sequences from RedPajama, record the value of the massive activations for each sequence, and average over the sequences.
In practice, we find no performance difference between using the mean value and the original value. However, the original value varies from sequence to sequence (see Table 2), so it would be hard to justify which original value to use.
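
For anyone who wants to reproduce the averaging step, here is a minimal sketch, not our actual script. It assumes the massive activation of a sequence is simply the largest-magnitude entry of its hidden state at the layer of interest; the real code may instead read fixed token and feature positions. The helper names and the toy numbers are hypothetical, and the hidden states would in practice come from running the model on the 100 RedPajama sequences.

```python
from statistics import mean


def massive_activation(hidden_state):
    """Return the signed entry with the largest magnitude.

    hidden_state: list of rows (tokens), each a list of floats (features).
    Assumption: "massive activation" = largest-magnitude entry in the sequence.
    """
    return max((value for row in hidden_state for value in row), key=abs)


def mean_massive_activation(hidden_states):
    """Record one massive activation per sequence, then average them."""
    per_sequence = [massive_activation(h) for h in hidden_states]
    return mean(per_sequence), per_sequence


# Toy stand-in for hidden states of three sequences (2 tokens x 3 features).
toy_hidden_states = [
    [[0.1, -0.3, 2000.0], [0.2, 0.0, 1.5]],
    [[0.4, 2400.0, -0.2], [0.1, 0.3, 0.0]],
    [[-0.5, 0.2, 2200.0], [0.0, 0.1, 0.3]],
]

avg, values = mean_massive_activation(toy_hidden_states)
print(values)  # per-sequence values, which differ from sequence to sequence
print(avg)     # the single mean value used in place of the original values
```

The per-sequence list shows why picking one "original" value is ambiguous, while the mean gives a single value that does not depend on the sequence.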