ruotianluo / DiscCaptioning

Code for Discriminability objective for training descriptive captions (CVPR 2018)

evaluate error: KeyError: 'att_masks', no att_masks key in data #1

Closed SkylerZheng closed 6 years ago

SkylerZheng commented 6 years ago

File "/home/jzheng/PycharmProjects/DiscCaptioning/eval_utils.py", line 114, in eval_split data['att_masks'][np.arange(loader.batch_size) * loader.seq_per_img]] KeyError: 'att_masks' There's no att_masks key in the data dict. Neither labels and masks. Am I missing sth?

I'm testing on the val2014 dataset.
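For context, the failing line picks out one att_masks row per image (each image contributes seq_per_img consecutive caption rows). A minimal self-contained sketch of that indexing, using a hypothetical select_per_image helper that degrades gracefully when the key is absent:

    import numpy as np

    def select_per_image(data, batch_size, seq_per_img):
        """Pick one att_masks row per image; tolerate loaders that
        don't provide the key (hypothetical guard, not the repo's code)."""
        att_masks = data.get('att_masks')  # dict.get avoids the KeyError
        if att_masks is None:
            return None
        return att_masks[np.arange(batch_size) * seq_per_img]

    # Example: a batch of 2 images with 5 captions each.
    data = {'att_masks': np.ones((10, 36))}
    print(select_per_image(data, batch_size=2, seq_per_img=5).shape)  # (2, 36)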

ruotianluo commented 6 years ago

That's weird; data should always have these elements. Can you set breakpoints and check whether they are there in the dataloader's get_batch function?
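For example (a sketch; pdb is the stock Python debugger, and the exact spot in get_batch may differ):

    # In dataloader.py, inside get_batch, just before the batch dict is returned:
    import pdb; pdb.set_trace()

    # At the (Pdb) prompt, list which keys made it into the batch:
    #   (Pdb) sorted(data.keys())
    # With dataloader.py you should see 'att_feats', 'att_masks', 'labels',
    # and 'masks' among them.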

SkylerZheng commented 6 years ago

Thank you very much for your quick response. Speaking of checkpoints, sorry, I might have a misunderstanding here: to evaluate, must I first train the model myself and then test, instead of using the pretrained model?

ruotianluo commented 6 years ago

No, you can directly use the pretrained model.

SkylerZheng commented 6 years ago

I checked, and I found that in dataloader.py, data does have all these attributes:

    data['att_feats'], data['att_masks'] = wrap_att(att_batch, seq_per_img)

    data['labels'] = np.vstack(label_batch)
    data['labels'][:, 0] = self.vocab_size + 1
    # generate mask
    nonzeros = np.array(list(map(lambda x: (x != 0).sum()+2, data['labels'])))
    for ix, row in enumerate(mask_batch):
        row[:nonzeros[ix]] = 1
    data['masks'] = mask_batch

    data['gts'] = gts
    data['bounds'] = {'it_pos_now': self.iterators[split], 'it_max': len(self.split_ix[split]), 'wrapped': wrapped}
    data['infos'] = infos
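
A side note on the mask lines above: nonzeros counts the nonzero tokens in each padded label row and adds 2 (presumably to cover the start/end positions), and the mask then gets that many leading ones. A tiny self-contained rerun of the same logic with made-up label rows:

    import numpy as np

    # Two padded rows: BOS id (vocab_size + 1) first, then word ids, zero padding.
    labels = np.array([[9488, 5, 7, 0, 0, 0, 0],
                       [9488, 3, 8, 2, 9, 0, 0]])
    mask_batch = np.zeros(labels.shape, dtype='float32')

    nonzeros = np.array(list(map(lambda x: (x != 0).sum() + 2, labels)))
    for ix, row in enumerate(mask_batch):
        row[:nonzeros[ix]] = 1

    print(mask_batch)  # row 0: five leading ones; row 1: seven leading ones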

Meanwhile, in dataloaderraw.py, there are only four elements:

    data['fc_feats'] = fc_batch
    data['att_feats'] = att_batch
    data['bounds'] = {'it_pos_now': self.iterator, 'it_max': self.N, 'wrapped': wrapped}
    data['infos'] = infos

So I'm a little confused about these two. Which one should I use?
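Putting the two batch dicts side by side makes the gap explicit (a self-contained comparison using the keys quoted in this thread, plus fc_feats, which dataloader.py also fills):

    # Keys provided by each loader's get_batch.
    full_keys = {'fc_feats', 'att_feats', 'att_masks', 'labels', 'masks',
                 'gts', 'bounds', 'infos'}                   # dataloader.py
    raw_keys = {'fc_feats', 'att_feats', 'bounds', 'infos'}  # dataloaderraw.py

    print(sorted(full_keys - raw_keys))
    # ['att_masks', 'gts', 'labels', 'masks'] -- exactly what eval_utils.py expects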

ruotianluo commented 6 years ago

I should have deleted dataloaderraw.py; it should not be used. The model can only be used with precomputed features.

SkylerZheng commented 6 years ago

But when I used the precomputed features as you suggested, with the command "bash eval.sh att_d1 test", I got the following error:

    bash eval.sh att_d1 test
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocobu_att data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    Traceback (most recent call last):
      File "eval.py", line 146, in <module>
        vars(opt))
      File "/home/jzheng/PycharmProjects/DiscCaptioning/eval_utils.py", line 92, in eval_split
        data = loader.get_batch(split)
      File "/home/jzheng/PycharmProjects/DiscCaptioning/dataloader.py", line 137, in get_batch
        ix, tmp_wrapped = self._prefetch_process[split].get()
      File "/home/jzheng/PycharmProjects/DiscCaptioning/dataloader.py", line 256, in get
        self.reset()
      File "/home/jzheng/PycharmProjects/DiscCaptioning/dataloader.py", line 235, in reset
        collate_fn=lambda x: x[0]))
      File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 437, in __init__
        batch_sampler = BatchSampler(sampler, batch_size, drop_last)
      File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/sampler.py", line 124, in __init__
        .format(sampler))
    ValueError: sampler should be an instance of torch.utils.data.Sampler, but got sampler=[17, 56, 89, 90, ..., 123284]
    Terminating BlobFetcher

SkylerZheng commented 6 years ago

Problem solved. It's actually because the index of the BatchSampler is not a string; I converted it to a string manually. I think it's because our PyTorch versions are different.

Many thanks
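For anyone who hits the same ValueError: newer PyTorch versions type-check the sampler argument, so a plain Python list of indices is rejected. One minimal workaround, sketched here rather than taken from the repo, is to wrap the list in a Sampler subclass:

    from torch.utils.data.sampler import Sampler

    class ListSampler(Sampler):
        """Wraps a plain index list so it passes PyTorch's
        isinstance(sampler, Sampler) check."""
        def __init__(self, indices):
            self.indices = indices

        def __iter__(self):
            return iter(self.indices)

        def __len__(self):
            return len(self.indices)

    # Hypothetical use at the failing call site in dataloader.py's reset():
    #   sampler=ListSampler(index_list)  # instead of passing the list directly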