Having read the paper and the code, I don't understand what is meant by "These judges are lightweight models fine-tuned against a contrastive learning objective."
The loss function is a cross-entropy loss, and the task is a binary (yes/no) classification of whether an input belongs to a class. Nothing in the code maximizes the separation between these classes in embedding space. The phrasing is confusing, since "contrastive" usually means something else: an objective that pulls matching pairs together and pushes non-matching pairs apart in embedding space.
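To make the distinction concrete, here is a minimal sketch contrasting the two kinds of objective. `binary_cross_entropy` and `info_nce` are hypothetical helper names, not functions from the repository, and InfoNCE is used only as one common example of a contrastive loss. The cross-entropy loss depends only on a predicted yes/no probability, while the contrastive loss is defined directly on distances between embeddings.

```python
import math


def binary_cross_entropy(p_yes: float, label: int) -> float:
    """Yes/no classification loss: depends only on the predicted probability."""
    eps = 1e-12
    p = min(max(p_yes, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def cosine(u, v) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    """Contrastive (InfoNCE-style) loss: low when the anchor embedding is
    closer to the positive than to every negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]  # -log softmax probability of the positive


# Classification: only the probability for the correct label matters.
print(binary_cross_entropy(0.9, 1))
# Contrastive: the loss changes with the geometry of the embeddings.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))  # positive aligned: near 0
print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))  # negative aligned: large
```

A model trained only with the first loss never sees a term like the second one, which is why describing it as "contrastive" seems misleading.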