Following JaColBERTv2.5, I added the normalization of the scores for distillation.
However, this setup is not really standard or well known, so the default should be no normalization, with proper documentation on how to use normalization and its potential benefits. This would avoid confusion and possibly sub-optimal results if the user does not also normalize the teacher scores on their end.
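
For illustration, here is a minimal sketch of what opt-in normalization could look like. It assumes per-query min-max normalization applied to both student and teacher scores before a KL-divergence loss. The function names and the `normalize_scores` flag are hypothetical and are not the API in this PR.

```python
import math


def min_max_normalize(scores, eps=1e-8):
    """Rescale one query's scores to roughly [0, 1] (assumed normalization scheme)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo + eps) for s in scores]


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation_loss(student_scores, teacher_scores, normalize_scores=False):
    """KL(teacher || student) over the candidate documents of one query.

    normalize_scores is a hypothetical flag; it defaults to False so that
    normalization stays opt-in.
    """
    if normalize_scores:
        # Both sides are rescaled, so their ranges become comparable.
        student_scores = min_max_normalize(student_scores)
        teacher_scores = min_max_normalize(teacher_scores)
    log_p_student = log_softmax(student_scores)
    log_p_teacher = log_softmax(teacher_scores)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
```

In this sketch, turning the flag on rescales both sides inside the loss. If only one side were normalized (for example, the student inside the loss while the teacher scores are passed in raw), the two distributions would sit on different scales, which is the mismatch the default of no normalization is meant to avoid.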