iarbel84 opened this issue 2 months ago
Hi @iarbel84, thanks for opening this feature request!
Makes sense to me - would you like to open a PR with the suggested change?
cc @SunMarc @muellerzr
Yes, I'll be happy to. A few points for discussion:
Hi @iarbel84, apologies for the delay in response.
Let's get the input from @SunMarc or @muellerzr here.
In general, there's no guarantee that `attention_mask` is provided, although I think it should be possible to use it. You'll probably need to account for whether it's in the 2d format or not.
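A sketch of what that check could look like (`count_non_padding_tokens` is a hypothetical helper for illustration, not transformers API; it assumes a dict-style batch):

```python
import torch

def count_non_padding_tokens(inputs: dict) -> int:
    """Hypothetical helper: count non-padding tokens in a batch."""
    attention_mask = inputs.get("attention_mask")
    if attention_mask is not None and attention_mask.dim() == 2:
        # A 2d mask of shape (batch_size, seq_len) holds 1 for real tokens
        # and 0 for padding, so its sum is the non-padding token count.
        return int(attention_mask.sum().item())
    # A 4d mask of shape (batch_size, 1, seq_len, seq_len) encodes pairwise
    # visibility, so summing it would over-count; without a usable 2d mask,
    # fall back to counting every position.
    return inputs["input_ids"].numel()

batch = {
    "input_ids": torch.tensor([[11, 12, 13, 0], [21, 22, 0, 0]]),
    "attention_mask": torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]]),
}
count_non_padding_tokens(batch)  # 5 real tokens, not 8
```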
Feature request
Track only the training tokens, avoiding the count of padding tokens
Motivation
The `Trainer` can track the number of input tokens seen during training (`num_input_tokens_seen`, enabled via `include_num_input_tokens_seen`). It appears that this metric also includes padding tokens. If one uses example packing, then it does track the "correct" number of tokens seen by the model.
However, I can think of two cases where this will not be accurate:

1. Training without packing, where sequences are padded to a common length and the padding tokens get counted as well.
2. Training where some positions are masked out of the loss (labels set to `-100`), e.g. prompt tokens in supervised fine-tuning.
For the first case, a more accurate calculation would be to sum the attention mask. For the second case, I'm not sure how this should be regarded; however, we could consider counting only label tokens `!= -100`, as in the sketch below.
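Counting only the supervised positions for the second case could look like this (the tensor values here are illustrative):

```python
import torch

# -100 marks positions that are excluded from the loss
labels = torch.tensor([[-100, -100, 42, 43], [-100, 7, 8, -100]])
trained_tokens = (labels != -100).sum().item()  # 4
```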
Your contribution
Replace lines 2248-2258 in trainer.py (v4.43.4) with the following:
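A minimal sketch of the proposed direction, assuming the replaced block is the `include_num_input_tokens_seen` bookkeeping in v4.43.4's `trainer.py` (variable names such as `self.args`, `self.accelerator`, and `logger` follow that code; the exact patch is illustrative, not the final change):

```python
if self.args.include_num_input_tokens_seen:
    main_input_name = getattr(self.model, "main_input_name", "input_ids")
    if main_input_name not in inputs:
        logger.warning(
            "Tried to track the number of tokens seen, however the current model is "
            "not configured properly to know what item is the input. To fix this, add "
            "a `main_input_name` attribute to the model class you are using."
        )
    else:
        # Prefer the attention mask when it is present and 2d: summing it
        # counts only non-padding positions. Otherwise fall back to the
        # previous behaviour of counting every element of the input.
        attention_mask = inputs.get("attention_mask")
        if attention_mask is not None and attention_mask.dim() == 2:
            tokens_in_batch = attention_mask.sum()
        else:
            tokens_in_batch = torch.tensor(inputs[main_input_name].numel())
        self.state.num_input_tokens_seen += (
            torch.sum(
                self.accelerator.gather(
                    tokens_in_batch.to(device=self.args.device, dtype=torch.int64)
                )
            )
            .cpu()
            .item()
        )
```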