jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

How to acquire the real whole-batch sequence training loss (reduction_mode=mean)? #24

Open littttttlebird opened 2 months ago

littttttlebird commented 2 months ago

In train.py, the loss returned from the main process is the loss of one sequence block, not the loss of the whole sequence.
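For context, the concern here is a general one: when a sequence is sharded into blocks across ranks, averaging each block's mean loss does not equal the whole-sequence mean unless every block has the same token count. A hedged sketch of the arithmetic (the block sizes and loss values are made-up illustrative numbers, not from EasyContext; in a real distributed run the two final sums would be `dist.all_reduce` calls):

```python
# Sketch: averaging per-block mean losses vs. computing the true
# whole-sequence mean. Values below are hypothetical.

def block_stats(token_losses):
    """Return (sum_of_losses, token_count) for one sequence block."""
    return sum(token_losses), len(token_losses)

# Two hypothetical blocks of unequal length (e.g. after sequence sharding).
block_a = [2.0, 4.0]            # 2 tokens
block_b = [1.0, 1.0, 1.0, 3.0]  # 4 tokens

# Naive: average the per-block mean losses that each rank reports locally.
mean_a = sum(block_a) / len(block_a)   # 3.0
mean_b = sum(block_b) / len(block_b)   # 1.5
naive = (mean_a + mean_b) / 2          # 2.25

# Correct: reduce the loss *sums* and token *counts* separately,
# then divide once at the end.
sum_a, n_a = block_stats(block_a)
sum_b, n_b = block_stats(block_b)
whole = (sum_a + sum_b) / (n_a + n_b)  # 12.0 / 6 = 2.0

print(naive, whole)
```

The two results only coincide when all blocks contain the same number of loss-bearing tokens, so for logging the true whole-sequence mean one would typically all-reduce the summed loss and the token count rather than the per-rank means.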

jzhang38 commented 1 month ago

https://github.com/jzhang38/EasyContext/blob/01a936055d3409f1949b3e3c5ca0829951beb410/train.py#L150

Isn't it the whole sequence loss?