ylqi / GL-RG

Code for the IJCAI 2022 paper "GL-RG: Global-Local Representation Granularity for Video Captioning".
MIT License

eval_metric #7

Open a213402010 opened 2 years ago

a213402010 commented 2 years ago

The CIDEr metric file is precomputed and fixed. Isn't there a mismatch between the index into the metric file and the randomly selected ground-truth data? Also, I plan to run more experiments. Could you please post the reference file (refFile) used for calculating CIDEr? Thanks

ylqi commented 2 years ago

Please run data/preprocess/compute_ciderdf.py to generate the *_*_ciderdf.pkl files. The reference files (e.g., train_videodatainfo.json) are provided in the data/preprocess/input/ folder.

Run this script using the following format:

python compute_ciderdf.py   --captions_json output/metadata/${DATASET}_${SPLIT}_proprocessedtokens.json\
                            --output_pkl output/metadata/${DATASET}_${SPLIT}_ciderdf.pkl\
                            --output_words \
                            --vocab_json output/metadata/${DATASET}_${SPLIT}_vocab.json
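
As a side note, here is a minimal sketch of inspecting the resulting pickle. The file path and the assumption that it stores a dict of n-gram document frequencies (typical for CIDEr-D implementations) are mine, not confirmed by the repo:

    import pickle

    # Minimal sketch, assuming the output pkl holds a mapping from n-grams to
    # document frequencies; the path below is an example, not a fixed name.
    with open('output/metadata/msrvtt_train_ciderdf.pkl', 'rb') as f:
        ciderdf = pickle.load(f)

    print(len(ciderdf))               # number of entries
    print(list(ciderdf.items())[:3])  # a few sample (key, value) pairs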
a213402010 commented 2 years ago

Thanks for your answer! But I still have a question. The code seems to obtain the pkl file first (so its 17 captions are fixed), yet it then randomly selects 17 captions from the GT, and each call to get_batch selects a different set of 17 captions. Won't this cause a mismatch between the metric and the captions?

ylqi commented 2 years ago

In get_batch(), data['labels'] returns 17 captions, while data['gts'] returns all captions. Therefore, in train.py, the metric scores for the DXE loss or DR reward are calculated on data['gts']. data['labels'] is only used for the XE loss and Scheduled Sampling during training.
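
To illustrate the distinction, a toy sketch (not the repo's actual loader; the names and counts follow this thread):

    import random

    # Toy illustration of the labels-vs-gts split described above.
    all_captions = ['caption %d' % i for i in range(20)]  # all GT captions for one video

    data = {
        'labels': random.sample(all_captions, 17),  # random subset: XE loss / Scheduled Sampling
        'gts': all_captions,                        # full caption set: DXE loss / DR reward metrics
    }
    print(len(data['labels']), len(data['gts']))    # 17 20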

a213402010 commented 2 years ago

Thanks! Your answer helped me a lot! But in train.py, when using DXE rather than RL, the DXE loss is calculated from bcmrscores, which is confusing to me (see the snippet below). Thanks for your answer again!

    if opt.use_it == 1:
        bcmrscores = data['bcmrscores']

    # compute discriminative cross-entropy or discrepant reward
    if opt.use_dxe:
        # use discriminative cross-entropy (DXE)
        reward, m_score, g_score = utils.get_discriminative_cross_entropy_scores(model_res, bcmrscores=bcmrscores)
        loss = rl_criterion(model_res, logprobs,
                            Variable(torch.from_numpy(reward).float().cuda(), requires_grad=False))

ylqi commented 2 years ago

bcmrscores stores the precomputed metric score (BLEU, CIDEr, METEOR, or ROUGE_L) of each GT sequence.

In rl_criterion(), the bcmrscores-based reward is multiplied with logprobs, as defined in: https://github.com/ylqi/GL-RG/blob/560f4566957d7ee14dc44fa29751231028b0e6fb/model.py#L29
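
For intuition, a minimal sketch of a reward-weighted log-likelihood criterion; this is my assumption about the general pattern, and the repo's rl_criterion at the link above is the authoritative version:

    import torch

    def rl_criterion_sketch(seq, logprobs, reward):
        # Minimal sketch, assuming seq: (B, T) sampled token ids,
        # logprobs: (B, T) per-token log-probs, reward: (B,) per-sequence
        # rewards. The repo's rl_criterion may differ (e.g., in masking
        # or normalization).
        mask = (seq > 0).float()                          # ignore padding (id 0 assumed)
        reward = reward.unsqueeze(1).expand_as(logprobs)  # broadcast reward over time steps
        loss = -(logprobs * reward * mask).sum() / mask.sum()
        return loss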

a213402010 commented 2 years ago

Yeah, I just think this reward mismatches the input sequence. In

    loss = rl_criterion(model_res, logprobs, Variable(torch.from_numpy(reward).float().cuda(), requires_grad=False))

the reward is calculated from bcmrscores, which is fixed, but the sequence (model_res) is chosen at random:

    reward, m_score, g_score = utils.get_discriminative_cross_entropy_scores(model_res, bcmrscores=bcmrscores)

bcmrscores is a 1300 x 17 matrix and is fixed, but model_res is random, depending on the sequences selected each time. For reference, the return statement in model.py reads:

    return torch.cat([_.unsqueeze(1) for _ in outputs], 1), \
           torch.cat([_.unsqueeze(1) for _ in sample_seq], 1), \
           torch.cat([_.unsqueeze(1) for _ in sample_logprobs], 1)
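
A toy demonstration of that stacking pattern (the shapes are illustrative assumptions): each of the T per-step tensors has shape (B, V), and unsqueeze(1) followed by cat along dim 1 stacks them into a single (B, T, V) tensor.

    import torch

    # Stack T per-step (B, V) tensors into one (B, T, V) tensor.
    B, T, V = 2, 3, 5
    outputs = [torch.randn(B, V) for _ in range(T)]
    stacked = torch.cat([o.unsqueeze(1) for o in outputs], 1)
    print(stacked.shape)  # torch.Size([2, 3, 5])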