Closed — EganGu closed this issue 1 month ago
This is because for models with tensor parallelism, `log_softmax` should be computed as `logits_i - logsumexp(logits)`. To keep the results consistent and comparable, we also use `logits_i - logsumexp(logits)` in normal (non-parallel) scenarios.
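A minimal NumPy sketch of the reasoning above (the shard split and "all-reduce" step are simulated locally and are illustrative assumptions, not the repo's actual code): `log_softmax(x) = x - logsumexp(x)` holds exactly, and under tensor parallelism each rank holds only a vocabulary shard, so the global `logsumexp` can be formed by combining per-shard `logsumexp` values.

```python
import numpy as np

def logsumexp(x, axis=-1, keepdims=True):
    # numerically stable log-sum-exp
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 8))  # (batch, vocab)

# Full-vocabulary log_softmax, written as logits_i - logsumexp(logits)
ref = logits - logsumexp(logits)

# Simulated tensor parallelism: vocabulary split across 2 "ranks"
shards = np.split(logits, 2, axis=-1)
# each rank computes logsumexp over its own vocab shard...
local = np.concatenate([logsumexp(s) for s in shards], axis=-1)
# ...then combining the per-shard values (the all-reduce step)
# recovers the global normalizer, since
# logsumexp([lse_1, lse_2]) == logsumexp(full logits)
global_lse = logsumexp(local)
tp = logits - global_lse

assert np.allclose(ref, tp)
```

This is why the `logits_i - logsumexp(logits)` form is the natural one in the tensor-parallel setting: only the scalar per-shard `logsumexp` needs to be communicated, not the full logits.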
Understood. Thanks for your reply.
I noticed that the `scores` in `reward_fn` are actually equal to `logits_i - logsumexp(logits)`. I think this expression can be computed directly with `log_softmax`. Why not use `log_softmax`?
https://github.com/microsoft/LMOps/blob/5fbf5bcd6e6760fa95aaaf945fb5d9cb033135f6/minillm/minillm/reward.py#L33