lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

train your reward model issue #37

Open wac81 opened 1 year ago

wac81 commented 1 year ago

I can't train the reward model with a batch.

    seq, prompt_mask, labels = next(train_loader)
    loss = reward_model(seq, prompt_mask = prompt_mask, labels = labels)
    accelerator.backward(loss / GRADIENT_ACCUMULATE_EVERY)
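
For context, here is a minimal sketch of the kind of batch the loop above would need to yield, assuming the constructor arguments from the repo's README; the batch size, sequence length, and 5-bin setting are illustrative assumptions:

    import torch
    from palm_rlhf_pytorch import PaLM, RewardModel

    palm = PaLM(num_tokens = 20000, dim = 512, depth = 12)

    # assumed setup: 5 reward bins, so each label is a class index in [0, 5)
    reward_model = RewardModel(palm, num_binned_output = 5)

    batch_size, seq_len = 8, 1024
    seq = torch.randint(0, 20000, (batch_size, seq_len))    # token ids, shape (B, T)
    prompt_mask = torch.zeros(batch_size, seq_len).bool()   # True where the token belongs to the prompt
    labels = torch.randint(0, 5, (batch_size,))              # one integer rating per sequence, shape (B,)

    loss = reward_model(seq, prompt_mask = prompt_mask, labels = labels)
    loss.backward()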

I set up the loop above, but I get an error. Checking the source code, I found this:

    if self.binned_output:
        return F.mse_loss(pred, labels)

    return F.cross_entropy(pred, labels)

cross_entropy does NOT seem to support a multi-sample training set. I changed it to mse_loss, but I still get an error.

How do I compute the loss over a multi-sample training set, e.g. with batch size set to 8?
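
For comparison, here is how `F.cross_entropy` normally handles a batch: logits of shape (batch, num_classes) and integer class labels of shape (batch,). This is only a sketch with made-up sizes (batch size 8, 5 bins are assumptions, not taken from the repo):

    import torch
    import torch.nn.functional as F

    batch_size, num_bins = 8, 5

    pred = torch.randn(batch_size, num_bins)             # logits, one row per sequence in the batch
    labels = torch.randint(0, num_bins, (batch_size,))   # integer class per sequence, dtype long

    loss = F.cross_entropy(pred, labels)                 # averages over the batch by default
    print(loss.item())

If the labels have a floating-point dtype or shape (batch, 1), cross_entropy raises an error, so that may be where the batch problem comes from.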

wac81 commented 1 year ago

"reward model doesn't need training."

Are you serious?

Then how do you explain the example in the README?