Closed by jianghaojun 2 years ago
https://github.com/layer6ai-labs/xpool/blob/6514cc712f30081108463c5d8d4d6c261a1a4a96/config/all_config.py#L52
In all experiments, we set the number of heads to 1. In our pooling mechanism, this value doesn't have the same interpretation as the number of heads in Transformers, and empirically a single head works best. Thanks!
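For readers unfamiliar with how a head count is usually wired into a config, here is a minimal sketch. It is not the repository's actual code. The flag names `--embed_dim` and `--num_heads`, the default embedding size of 512, and the helper `head_dim` are all assumptions made for illustration. The real setting is at the config line linked in this thread.

```python
import argparse


def build_parser():
    """Build a hypothetical config parser (flag names are illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative attention-pooling config")
    # Assumed embedding size; not taken from the repository.
    parser.add_argument("--embed_dim", type=int, default=512)
    # Default of 1 mirrors the setting described in the reply above.
    parser.add_argument("--num_heads", type=int, default=1)
    return parser


def head_dim(embed_dim, num_heads):
    """Return the per-head width when the embedding is split evenly across heads."""
    if num_heads < 1 or embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by a positive num_heads")
    return embed_dim // num_heads


# Parse with no CLI arguments so the defaults are used.
args = build_parser().parse_args([])
print(args.num_heads, head_dim(args.embed_dim, args.num_heads))  # prints "1 512"
```

With a single head, attention operates over the full embedding width rather than over narrower per-head slices, which is the configuration the reply describes.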