jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

How to acquire the real whole-batch sequence training loss (reduction_mode=mean)? #24

Open littttttlebird opened 2 months ago

littttttlebird commented 2 months ago

In train.py, the loss returned from the main process is the loss of one sequence block, not the loss of the whole sequence.
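For context, the concern here is a general one: when a sequence is sharded into blocks across ranks, averaging each block's mean loss does not equal the whole-sequence mean unless every block has the same token count. A hedged sketch of the arithmetic (the block sizes and loss values are made-up illustrative numbers, not from EasyContext; in a real distributed run the two final sums would be `dist.all_reduce` calls):

```python
# Sketch: averaging per-block mean losses vs. computing the true
# whole-sequence mean. Values below are hypothetical.

def block_stats(token_losses):
    """Return (sum_of_losses, token_count) for one sequence block."""
    return sum(token_losses), len(token_losses)

# Two hypothetical blocks of unequal length (e.g. after sequence sharding).
block_a = [2.0, 4.0]            # 2 tokens
block_b = [1.0, 1.0, 1.0, 3.0]  # 4 tokens

# Naive: average the per-block mean losses that each rank reports locally.
mean_a = sum(block_a) / len(block_a)   # 3.0
mean_b = sum(block_b) / len(block_b)   # 1.5
naive = (mean_a + mean_b) / 2          # 2.25

# Correct: reduce the loss *sums* and token *counts* separately,
# then divide once at the end.
sum_a, n_a = block_stats(block_a)
sum_b, n_b = block_stats(block_b)
whole = (sum_a + sum_b) / (n_a + n_b)  # 12.0 / 6 = 2.0

print(naive, whole)
```

The two results only coincide when all blocks contain the same number of loss-bearing tokens, so for logging the true whole-sequence mean one would typically all-reduce the summed loss and the token count rather than the per-rank means.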

jzhang38 commented 1 month ago

https://github.com/jzhang38/EasyContext/blob/01a936055d3409f1949b3e3c5ca0829951beb410/train.py#L150

Isn't it the whole sequence loss?