Closed flymark2010 closed 6 years ago
We use the auxiliary heads because we follow the design of Zoph et al. (2018). They helped performance in the first few versions of our experiments. Since then we have just stuck with using the aux heads and never did an ablation study, so we cannot quantify how much they help.
Why do you use auxiliary heads? Does that affect the performance much?
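For context, an auxiliary head is usually an extra classifier attached to an intermediate layer, and its loss is added to the main loss with a small weight during training only. The sketch below illustrates that general idea in plain Python and is not this repository's code: the function names and the `aux_weight=0.4` default are assumptions chosen for illustration.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of a single example, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(main_logits, aux_logits, label, aux_weight=0.4, training=True):
    """Main loss plus a down-weighted auxiliary-head loss.

    aux_weight is a hypothetical value for illustration, not one taken
    from this thread. The auxiliary term is dropped outside training.
    """
    main = cross_entropy(main_logits, label)
    if not training:
        return main
    return main + aux_weight * cross_entropy(aux_logits, label)


if __name__ == "__main__":
    # Toy logits for a 3-class problem; the true class is index 2.
    print(total_loss([0.1, 0.2, 2.0], [0.3, 0.1, 0.9], label=2))
    print(total_loss([0.1, 0.2, 2.0], [0.3, 0.1, 0.9], label=2, training=False))
```

An ablation of the kind mentioned above would amount to training once with `aux_weight=0` and once with a nonzero value, then comparing the final accuracy.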