I've just run into an odd issue with FSDP & RewardTrainer. It seems that when using FSDP, the output of the (sequence classification) model's forward function isn't as expected.
Normally, it returns a SequenceClassifierOutputWithPast where logits contains a tensor with the logits, and loss is empty or contains some sort of generator object.
When using FSDP, I'm getting a dict inside the loss field (and oddly enough that dict again contains a single key, logits, although that's not the issue).
Not sure why this happens, but the net effect is that when the RewardTrainer tries to get the logits through model(...)[0] (see here), in the non-FSDP case it gets the logits, while in the FSDP case it gets the dict from the now non-empty loss field, and then fails a few lines later.
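For reference, here's a minimal sketch of the indexing behavior I mean. It only uses transformers' SequenceClassifierOutputWithPast directly; the dict stuffed into loss is just mimicking what I observe under FSDP, not something the model normally produces:

```python
import torch
from transformers.modeling_outputs import SequenceClassifierOutputWithPast

logits = torch.tensor([[0.3]])

# Normal case: loss is unset, so positional index 0 resolves to the logits.
out = SequenceClassifierOutputWithPast(logits=logits)
assert out[0] is logits
assert out["logits"] is logits

# What I see under FSDP (mimicked here): loss is populated, so index 0 now
# resolves to whatever sits in loss, while keyed access still returns the logits.
out = SequenceClassifierOutputWithPast(loss={"logits": logits}, logits=logits)
assert out[0] is out.loss
assert out["logits"] is logits
```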
Two questions:

1. This is easily fixed by doing model(...)["logits"] instead (see the sketch after this list). Any problem with doing that?
2. Purely out of curiosity, does anyone know why this behaves differently with FSDP?
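Concretely, the change I have in mind is only the indexing at the call site. A hedged sketch of it (illustrative helper, not the actual RewardTrainer code):

```python
# Illustrative only, not the actual RewardTrainer implementation.
def get_rewards(model, input_ids, attention_mask):
    output = model(input_ids=input_ids, attention_mask=attention_mask)
    # output[0] returns the first non-None field of the ModelOutput, which under
    # FSDP ends up being the dict in `loss`; keyed access always gives the logits.
    return output["logits"]
```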
To reproduce: Run examples/scripts/reward_modeling.py with accelerate + FSDP.
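For completeness, this is roughly how I'm launching it; the config file is just whatever FSDP-enabled config `accelerate config` produced locally, so treat the exact invocation as an example rather than part of the report:

```
accelerate launch --config_file <your_fsdp_config.yaml> examples/scripts/reward_modeling.py
```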