In my previous pull request, our metric computations were using the logits rather than probabilities (sorry!). I think it's better to have the model return a probability distribution (i.e. `torch.sigmoid` on the outputs) and then handle everything as probabilities, not logits. That meant changing the loss function too.
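A minimal sketch of the equivalence this change relies on (variable names here are illustrative, not the actual code in this repo): once the model applies `torch.sigmoid` and returns probabilities, the loss has to switch from `nn.BCEWithLogitsLoss` (which expects raw logits) to `nn.BCELoss` (which expects probabilities), and the two are numerically equivalent:

```python
import torch
import torch.nn as nn

logits = torch.tensor([1.2, -0.7, 0.3])   # raw model outputs
labels = torch.tensor([1.0, 0.0, 1.0])

# The model now applies the sigmoid itself and returns probabilities.
probs = torch.sigmoid(logits)

# Probabilities go into BCELoss; logits would go into BCEWithLogitsLoss.
loss_on_probs = nn.BCELoss()(probs, labels)
loss_on_logits = nn.BCEWithLogitsLoss()(logits, labels)

# Same loss either way -- only where the sigmoid lives has moved.
print(torch.allclose(loss_on_probs, loss_on_logits))  # → True
```

The metric side then consumes `probs` directly (e.g. thresholding at 0.5), with no extra sigmoid call.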
An important note: we've only set up fine-tuning for binary classification tasks like `rotten_tomatoes`! So if we want to fine-tune and test on a task with >2 labels, we'll need to change the code a bit.
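For reference, here's a hedged sketch of what the multi-label change might look like (nothing below is in the current code): with `num_labels > 2`, the per-class sigmoid would become a `torch.softmax` over classes, and the loss would become `nn.NLLLoss` on the log-probabilities (equivalent to `nn.CrossEntropyLoss` on the raw logits):

```python
import torch
import torch.nn as nn

num_labels = 3                          # e.g. a hypothetical 3-class task
logits = torch.randn(4, num_labels)     # batch of 4 examples
labels = torch.tensor([0, 2, 1, 0])     # integer class ids

# Model would return a distribution over classes instead of a single sigmoid.
probs = torch.softmax(logits, dim=-1)   # each row sums to 1

# NLLLoss on log-probabilities == CrossEntropyLoss on logits.
loss = nn.NLLLoss()(torch.log(probs), labels)
print(torch.allclose(loss, nn.CrossEntropyLoss()(logits, labels)))  # → True
```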