Open soapmactavish-byf opened 3 weeks ago
We use OPUS-M (8f) for the ablation study and visualizations, as described in the paper:
4.3 Ablation study and visualizations
This part details our ablation study and visualizations using the OPUS-M (8f) model.
Sorry for not being clear. When I reproduced the 12e-m-8f model, I got even better performance than the ablation results reported in the paper, so I am confused.
That makes sense. These ablation studies were conducted quite early, and some tricks used in the repo were not applied.
Tiny? Small? or ......