layer6ai-labs / xpool

https://layer6ai-labs.github.io/xpool/

Is the default value of the head-count hyperparameter, "num_mha_heads=1", correct? #8

Closed: jianghaojun closed this issue 2 years ago

jianghaojun commented 2 years ago

https://github.com/layer6ai-labs/xpool/blob/6514cc712f30081108463c5d8d4d6c261a1a4a96/config/all_config.py#L52

NoelVouitsis commented 2 years ago

In all experiments, we set the number of heads to 1. In our pooling mechanism, the number of heads doesn't have the same interpretation as the number of heads in Transformers, and empirically a single head works best. Thanks!
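For readers wondering what a head count controls in attention-based pooling generally, here is a minimal sketch in plain Python. It is not the xpool implementation: `attention_pool` and `softmax` are hypothetical names, the learned query/key/value projections are omitted, and the splitting of the embedding into contiguous per-head slices is the generic multi-head convention, assumed here for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(text, frames, num_heads=1):
    """Pool frame embeddings into one vector, using the text as the query.

    text:   list of D floats (hypothetical text embedding)
    frames: list of F lists of D floats (hypothetical frame embeddings)

    The D dimensions are split into num_heads contiguous slices, and each
    slice gets its own softmax over frames. With num_heads=1 there is a
    single set of weights over frames for the whole embedding.
    """
    dim = len(text)
    if dim % num_heads != 0:
        raise ValueError("embedding dim must be divisible by num_heads")
    head_dim = dim // num_heads
    pooled = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        # Scaled dot-product score between the text slice and each frame slice.
        scores = [
            sum(q * k for q, k in zip(text[lo:hi], frame[lo:hi])) / math.sqrt(head_dim)
            for frame in frames
        ]
        weights = softmax(scores)
        # Weighted average of the frames, restricted to this head's dimensions.
        for d in range(lo, hi):
            pooled.append(sum(w * frame[d] for w, frame in zip(weights, frames)))
    return pooled
```

Calling `attention_pool(text, frames, num_heads=1)` weights each frame by its similarity to the full text embedding, whereas a larger `num_heads` weights frames separately per slice of the embedding.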