microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

Is reward_fn equal to log_softmax #222

Closed EganGu closed 1 month ago

EganGu commented 1 month ago

I noticed that the scores in reward_fn are actually equal to logits_i - logsumexp(logits). This expression could be computed directly with log_softmax. Why not use log_softmax?

https://github.com/microsoft/LMOps/blob/5fbf5bcd6e6760fa95aaaf945fb5d9cb033135f6/minillm/minillm/reward.py#L33

t1101675 commented 1 month ago

This is because for models with tensor parallelism, where the vocabulary logits are sharded across devices, log_softmax has to be computed as logits_i - logsumexp(logits). To keep the results consistent and comparable, we use logits_i - logsumexp(logits) in the normal (non-parallel) case as well.
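The point about tensor parallelism can be illustrated with a small numpy sketch (numpy stands in for the distributed setup here; the shard split and the combination of per-shard logsumexp values are assumptions meant to mimic a vocab-parallel all-reduce, not the actual minillm code):

```python
import numpy as np

def logsumexp(x, axis=-1, keepdims=False):
    # numerically stable logsumexp: shift by the max before exponentiating
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)

# full-vocabulary logits for one token position
logits = np.array([2.0, 1.0, 0.1, -0.5, 3.2, 0.7])

# serial reference: log_softmax(x)_i = x_i - logsumexp(x)
reference = logits - logsumexp(logits)

# simulate vocab-parallel sharding: each "rank" holds part of the vocab,
# so no single rank can call log_softmax over the full distribution
shards = np.split(logits, 2)
# each rank computes a shard-local logsumexp ...
local_lse = np.array([logsumexp(s) for s in shards])
# ... and the global normalizer is recovered by combining them
# (in a real setup this combination is an all-reduce across ranks)
global_lse = logsumexp(local_lse)
sharded = np.concatenate(shards) - global_lse

assert np.allclose(reference, sharded)
```

Because only the logsumexp scalar needs to cross ranks, the explicit `logits_i - logsumexp(logits)` form works in both the parallel and the serial path, which is the consistency argument above.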

EganGu commented 1 month ago

> This is because for models with tensor parallelism, log_softmax should be computed as logits_i - logsumexp(logits). To maintain consistency and compare the results, we also use logits_i - logsumexp(logits) in normal scenarios.

Understood. Thanks for your reply.