ofirpress / attention_with_linear_biases

Code for the ALiBi method for transformer language models (ICLR 2022)
MIT License

How to perform sliding window evaluation? #10

Closed chijames closed 2 years ago

chijames commented 2 years ago

Hi,

Apologies if this was already stated somewhere, but may I know how we could perform the sliding window evaluation as described in the paper Appendix B and Table 7? The current example in README seems to support only non-overlapping evaluation.

Thanks!

chijames commented 2 years ago

It seems like we can just change the context-window argument to S? For example, we can use the following command to reproduce the test split result in Table 7:

```
l=3072; fairseq-eval-lm data-bin/wikitext-103/ --path wt103/checkpoint_best.pt --sample-break-mode none --gen-subset test --max-sentences 1 --model-overrides "{'max_tokens':$l, 'tokens_per_sample':$l, 'max_target_positions':$l}" --tokens-per-sample $l --max-tokens $l --max-target-positions $l --context-window 512
```

ofirpress commented 2 years ago

Oh wow, I apologize — we really didn't mention in the README how to do sliding window eval.

Yes, you just need to set the context-window arg, but you should set it to L-1, so in your example it would have to be 3071. All the examples in the paper use a sliding window that slides by 1 token each step, and to get that, you have to set context-window to L-1.
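To make the window arithmetic concrete, here is a small standalone sketch (not from this repo, and independent of fairseq — the helper names are hypothetical) of which tokens each evaluation mode scores, assuming fairseq-style semantics where the stride is L minus context-window:

```python
# Sketch contrasting the two evaluation modes for a stream of n_tokens
# tokens and a model context of L tokens. Hypothetical helpers, not
# fairseq APIs.

def nonoverlapping_windows(n_tokens, L):
    """Yield (start, end, n_scored) for chunked (non-overlapping) eval.

    Every token in each chunk is scored, so tokens near a chunk start
    are predicted with very little context (the "early token curse").
    """
    for start in range(0, n_tokens, L):
        end = min(start + L, n_tokens)
        yield start, end, end - start


def sliding_windows(n_tokens, L, context_window):
    """Yield (start, end, n_scored) for sliding-window eval.

    The window advances by stride = L - context_window tokens; only the
    newly revealed tokens are scored. With context_window = L - 1 the
    stride is 1, so every token after the first window is predicted
    with a full L - 1 tokens of context.
    """
    end = min(L, n_tokens)
    yield 0, end, end  # the first window scores all of its tokens
    while end < n_tokens:
        start = end - context_window
        new_end = min(start + L, n_tokens)
        yield start, new_end, new_end - end
        end = new_end
```

For example, with a 10-token stream and L = 4, `sliding_windows(10, 4, 3)` (i.e. context-window = L-1 = 3) produces one full window followed by six stride-1 windows that each score a single token, while `nonoverlapping_windows(10, 4)` produces three chunks; both score every token exactly once, but the sliding mode is L times slower per scored token.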

Thanks, and please tell me if it doesn't work!