microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

Why is mpu/cross_entropy missing a softmax_logits_t? #157

Closed 155394551lzk closed 9 months ago

155394551lzk commented 10 months ago

In LMOps/minillm/transformers/src/transformers/mpu/cross_entropy.py, line 144 of `_ParallelSoftCrossEntropyLoss` computes `ceLoss = torch.log(sum_exp_logits) - sum_targets_softmax_logits`, but shouldn't the whole loss be `softmax_logits_target * torch.log(sum_exp_logits) - sum_targets_softmax_logits`?
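For context, here is a minimal standalone sketch of the expression being questioned. The tensor shapes and the construction of `targets` are my assumptions for illustration; the actual file is tensor-parallel and sums partial results across model partitions:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)                      # [batch, vocab] model logits z_v
targets = torch.softmax(torch.randn(4, 10), -1)  # soft target distribution p_v (rows sum to 1)

sum_exp_logits = torch.exp(logits).sum(dim=-1)           # Z = sum_v exp(z_v), shape [batch]
sum_targets_softmax_logits = (targets * logits).sum(-1)  # sum_v p_v * z_v, shape [batch]

# What line 144 computes:
ce_loss = torch.log(sum_exp_logits) - sum_targets_softmax_logits
```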

Thanks!

t1101675 commented 10 months ago

The CE loss should sum over the vocabulary:

[image: derivation of the soft cross-entropy loss summed over the vocabulary]
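The attached equation image did not survive extraction; presumably it showed the standard expansion of the soft cross-entropy, where $p_v$ is the target probability of token $v$ and $z_v$ its logit:

$$
\mathcal{L} = -\sum_{v} p_v \log \frac{e^{z_v}}{\sum_{v'} e^{z_{v'}}}
= \Big(\sum_{v} p_v\Big) \log \sum_{v'} e^{z_{v'}} - \sum_{v} p_v z_v
= \log \sum_{v'} e^{z_{v'}} - \sum_{v} p_v z_v
$$

Because $\sum_v p_v = 1$, the coefficient on the log-partition term drops out once the loss is summed over the vocabulary, which is why line 144 carries no extra `softmax_logits_t` factor.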
155394551lzk commented 10 months ago

Got it!