wac81 opened this issue 1 year ago
Can't train the reward model with a batch
```python
seq, prompt_mask, labels = next(train_loader)
loss = reward_model(seq, prompt_mask = prompt_mask, labels = labels)
accelerator.backward(loss / GRADIENT_ACCUMULATE_EVERY)
```
I set this up, but I get an error from this code. Checking the source, I found the following:
```python
if self.binned_output:
    return F.mse_loss(pred, labels)

return F.cross_entropy(pred, labels)
```
cross_entropy does not seem to support more than one training sample at a time. I changed it to mse_loss, but I still get an error.
How do I compute the loss over multiple training samples, e.g. with the batch size set to 8?
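For what it's worth, `F.cross_entropy` already accepts batched input out of the box: logits of shape `(batch, num_classes)` and integer class labels of shape `(batch,)`. A minimal standalone check with a batch of 8 (the shapes here are illustrative, not taken from the repo):

```python
import torch
import torch.nn.functional as F

batch_size, num_classes = 8, 5

# fake batched predictions and labels, just to show cross_entropy handles a batch
pred = torch.randn(batch_size, num_classes)             # (batch, num_classes) logits
labels = torch.randint(0, num_classes, (batch_size,))   # (batch,) class indices

loss = F.cross_entropy(pred, labels)  # scalar loss averaged over the batch
print(loss.shape)                     # torch.Size([]) -- a single scalar, ready for backward()
```

So if the forward pass errors out with a batch, the mismatch is more likely in the shapes of `seq`, `prompt_mask`, or `labels` coming out of `train_loader` than in `cross_entropy` itself.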
The reward model doesn't need training.
Are you serious?
Then how do you explain the README example?
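For completeness, here is a minimal sketch of driving the README-style training step with a batched `DataLoader` (batch size 8). The toy dataset and the tiny stand-in reward model below are hypothetical placeholders so the snippet runs on its own; in practice `reward_model` would be the `RewardModel` from the repo, called exactly as in the snippet quoted above.

```python
from itertools import cycle

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

GRADIENT_ACCUMULATE_EVERY = 4
BATCH_SIZE = 8
SEQ_LEN = 64
NUM_CLASSES = 5   # hypothetical number of label classes
NUM_TOKENS = 256  # hypothetical vocab size

# hypothetical toy data: token ids, prompt masks, and integer class labels
seqs = torch.randint(0, NUM_TOKENS, (128, SEQ_LEN))
prompt_masks = torch.zeros(128, SEQ_LEN, dtype=torch.bool)
labels = torch.randint(0, NUM_CLASSES, (128,))

loader = DataLoader(TensorDataset(seqs, prompt_masks, labels),
                    batch_size=BATCH_SIZE, shuffle=True)

# stand-in reward model: mean-pooled embedding -> class logits -> cross entropy.
# It only exists so the loop runs end to end; swap in the real RewardModel.
class ToyRewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(NUM_TOKENS, 32)
        self.head = nn.Linear(32, NUM_CLASSES)

    def forward(self, seq, prompt_mask=None, labels=None):
        pred = self.head(self.emb(seq).mean(dim=1))   # (batch, NUM_CLASSES)
        return F.cross_entropy(pred, labels)          # scalar loss over the batch

reward_model = ToyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

accelerator = Accelerator()
reward_model, optimizer, loader = accelerator.prepare(reward_model, optimizer, loader)
train_loader = cycle(loader)

for step in range(GRADIENT_ACCUMULATE_EVERY):
    seq, prompt_mask, label = next(train_loader)      # each tensor has batch dim 8
    loss = reward_model(seq, prompt_mask=prompt_mask, labels=label)
    accelerator.backward(loss / GRADIENT_ACCUMULATE_EVERY)

optimizer.step()
optimizer.zero_grad()
```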