krasserm / fairseq-image-captioning

Transformer-based image captioning extension for pytorch/fairseq
Apache License 2.0

Is BPE applied in the caption dataset for sequence-level training? #16

Closed Beanocean closed 3 years ago

Beanocean commented 3 years ago

BPE is applied to the dictionary and to the teacher-forcing training data (e.g. train-captions.en.{bin,idx}), while in the SCST stage only the normal tokenizer is applied; there is no BPE in train-captions.tok.json. Will this cause an inconsistency?

krasserm commented 3 years ago

Good question, but I don't think this will cause inconsistencies. Encoding sentences (with the captioning encoder) and computing CIDEr scores (with Cider()) are completely separate, and they don't rely on a common vocabulary. I rather wanted the CIDEr scores computed during SCST to be consistent with those computed during evaluation, which requires a PTB tokenizer.
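A minimal sketch of that separation, assuming a fairseq-style dictionary/BPE pair and the pycocoevalcap Cider scorer (the helper names here are illustrative, not the repository's actual code): the model side maps token ids back to a plain string, and the metric side only ever sees plain, PTB-tokenized strings, so the two never need a shared vocabulary.

```python
from pycocoevalcap.cider.cider import Cider

def to_plain_sentence(token_ids, dictionary, bpe):
    # Model side: ids -> BPE symbol string -> plain sentence.
    # Only this step involves the BPE vocabulary.
    bpe_str = dictionary.string(token_ids)
    return bpe.decode(bpe_str)

def cider_score(hyp_sentence, ref_sentences, scorer=Cider()):
    # Metric side: works on (PTB-tokenized) plain strings only;
    # the scorer builds its own n-gram statistics and never sees
    # BPE units, so no common vocabulary is required.
    gts = {0: ref_sentences}   # reference captions for one image
    res = {0: [hyp_sentence]}  # single hypothesis for that image
    score, _ = scorer.compute_score(gts, res)
    return score
```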

Closing this ticket for now. If you still see an issue using a different tokenizer, please reopen, thanks!

Beanocean commented 3 years ago

@krasserm Thanks for your reply. I found the code that decodes the model outputs into a string sentence. The string sentence is then compared with sample['target'] to obtain the CIDEr reward, is this right?
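Roughly, I understand the flow like this (just my own sketch, not the actual code; `decode_fn` and `cider_fn` stand in for the real helpers):

```python
def scst_loss(seq_log_probs, sampled_ids, target_ids, decode_fn, cider_fn):
    # seq_log_probs: per-token log-probabilities of the sampled caption
    # decode_fn: token ids -> plain sentence string
    hyp = decode_fn(sampled_ids)    # decoded model output
    ref = decode_fn(target_ids)     # decoded sample['target']
    reward = cider_fn(hyp, [ref])   # scalar CIDEr reward
    # REINFORCE-style sequence-level loss: scale the summed
    # log-probability of the sampled caption by the reward.
    return -(reward * seq_log_probs.sum())
```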

krasserm commented 3 years ago

Yes, that's right.