jacobswan1 / Video2Commonsense

Video captioning baseline models on Video2Commonsense Dataset.
https://asu-active-perception-group.github.io/Video2Commonsense/

Question about computing the token prediction Acc. #11

Open Xiyu-AI opened 12 months ago

Xiyu-AI commented 12 months ago

train.py:

```python
# compute the token prediction Acc.
non_pad_mask = cap_labels[:, 1:].ne(Constants.PAD)
n_word = non_pad_mask.sum().item()
cms_non_pad_mask = cms_labels[:, 1:].ne(Constants.PAD)
cms_n_word = cms_non_pad_mask.sum().item()
cap_loss /= n_word
cms_loss /= n_word
```
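The comment mentions token prediction accuracy, but only the mask and count lines are quoted. For context, a common pattern is to count correct predictions at non-pad positions and divide by that same non-pad count. Below is a minimal plain-Python sketch of that pattern. It is not the repository's code: `PAD = 0`, `BOS = 1`, the helper `masked_token_accuracy`, and the toy data are all hypothetical, and predictions are assumed to already align with the shifted labels.

```python
# Hypothetical sketch (not the repository's code): token-level accuracy
# counted only over non-padding positions. PAD = 0 is an assumed id; the
# real value is whatever Constants.PAD holds in the repo.
PAD = 0

def masked_token_accuracy(preds, labels, pad=PAD):
    """Return (n_correct, n_word, accuracy), ignoring padded targets.

    Each label row is shifted by one (row[1:]), mirroring cap_labels[:, 1:];
    each pred row is assumed to already align with that shifted row.
    """
    n_correct = 0
    n_word = 0
    for pred_row, label_row in zip(preds, labels):
        for p, t in zip(pred_row, label_row[1:]):
            if t != pad:  # plays the role of non_pad_mask
                n_word += 1
                n_correct += int(p == t)
    # Guard against a batch that is entirely padding.
    accuracy = n_correct / n_word if n_word else 0.0
    return n_correct, n_word, accuracy

# Toy batch: 1 is an assumed BOS id, 0 is padding.
labels = [[1, 5, 6, 7, 0], [1, 8, 9, 0, 0]]
preds = [[5, 6, 2, 0], [8, 3, 0, 0]]
print(masked_token_accuracy(preds, labels))  # prints (3, 5, 0.6)
```

Padded positions never enter either the numerator or the denominator, which is why the count used for normalization has to come from the same label tensor as the predictions being scored.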

I'm a bit curious about this calculation. Why are cap_loss and cms_loss both divided by n_word? Why isn't cms_loss divided by cms_n_word instead? I'd appreciate any clarification. Thank you!
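To make the difference concrete, here is a small plain-Python sketch with made-up numbers. It assumes both losses are sums over non-pad tokens before the division (for example, a cross-entropy with `reduction="sum"`); the helper `count_non_pad`, `PAD = 0`, and every value below are hypothetical, not taken from the repository.

```python
# Hypothetical sketch (not the repository's code): effect of the divisor
# on a summed loss. PAD = 0 and all numbers below are made up.
PAD = 0

def count_non_pad(labels, pad=PAD):
    """Count non-pad targets, skipping the first position of each row
    (mirrors labels[:, 1:].ne(PAD).sum())."""
    return sum(1 for row in labels for t in row[1:] if t != pad)

cap_labels = [[1, 5, 6, 7, 8, 0], [1, 9, 4, 0, 0, 0]]
cms_labels = [[1, 3, 0, 0], [1, 2, 0, 0]]

n_word = count_non_pad(cap_labels)      # 6 caption tokens
cms_n_word = count_non_pad(cms_labels)  # 2 commonsense tokens

# Assumed: losses already summed over non-pad tokens.
cap_loss_sum = 12.0
cms_loss_sum = 4.0

cap_loss = cap_loss_sum / n_word             # 2.0 per caption token
cms_loss_by_cap = cms_loss_sum / n_word      # what the quoted snippet does
cms_loss_by_cms = cms_loss_sum / cms_n_word  # per commonsense token: 2.0

# Dividing by n_word equals the per-token mean scaled by cms_n_word / n_word.
scale = cms_n_word / n_word
print(cap_loss, cms_loss_by_cap, cms_loss_by_cms, scale)
```

Under these assumptions, dividing cms_loss by n_word gives the per-commonsense-token mean multiplied by cms_n_word / n_word. That factor changes from batch to batch with the relative lengths of the caption and commonsense targets. Whether that implicit weighting is intended is what the question is asking.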