pmichel31415 / are-16-heads-really-better-than-1

Code for the paper "Are Sixteen Heads Really Better than One?"
MIT License

about the params: --raw-text and --transformer-mask-heads #8

Open LiangQiqi677 opened 3 years ago

LiangQiqi677 commented 3 years ago

Hi @pmichel31415,

  1. In are-16-heads-really-better-than-1/experiments/MT/prune_wmt.sh you pass --raw-text $EXTRA_OPTIONS, and I don't understand what it means. Could you explain what it does and how to use it? Is it the original reference text, or something else?

  2. I don't know how to use --transformer-mask-heads. Could you show me an example?