Open littttttlebird opened 2 months ago
In train.py, the loss returned from the main process appears to be the loss of one sequence block only, not the loss over the whole sequence.
https://github.com/jzhang38/EasyContext/blob/01a936055d3409f1949b3e3c5ca0829951beb410/train.py#L150
Is it the whole sequence loss?
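To make the question concrete, here is a minimal sketch of the difference I mean. It is not EasyContext's code: the function name `whole_sequence_loss` and the numbers are hypothetical, and it assumes each process computes a mean cross-entropy over the tokens of its own sequence block. Under that assumption, the whole-sequence loss would be the token-weighted average of the per-block losses, which generally differs from the value seen on the main process alone.

```python
def whole_sequence_loss(block_losses, block_token_counts):
    """Combine per-block mean losses into the mean loss over the full sequence.

    Assumes block_losses[i] is the mean loss over block_token_counts[i] tokens.
    """
    if len(block_losses) != len(block_token_counts):
        raise ValueError("need one token count per block loss")
    total_tokens = sum(block_token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens to average over")
    # Undo each block's mean (loss * count), then re-average over all tokens.
    weighted_sum = sum(l * n for l, n in zip(block_losses, block_token_counts))
    return weighted_sum / total_tokens


# Hypothetical per-block values for a sequence split across 4 processes.
block_losses = [2.0, 1.0, 3.0, 4.0]
block_token_counts = [4, 4, 4, 4]

main_process_loss = block_losses[0]  # what only the main process sees
full_loss = whole_sequence_loss(block_losses, block_token_counts)
print(f"main-process block loss: {main_process_loss}, whole-sequence loss: {full_loss}")
```

In an actual distributed run, the per-block values would have to be collected across processes first (for example with something like `torch.distributed.all_reduce`), which is the step I cannot find at the linked line.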