v-iashin / MDVC

PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)
https://v-iashin.github.io/mdvc

KeyError in validation_next_word_loop when running main.py #6

Closed yxinli92 closed 4 years ago

yxinli92 commented 4 years ago

Hi Vladimir! Hope you are doing well.

I was running your main.py script. There is the following error saying KeyError. Am I missing something? Thanks a lot!

```
Traceback (most recent call last):
  File "main.py", line 572, in <module>
    main(cfg)
  File "main.py", line 281, in main
    cfg.use_categories
  File "/home/tuf72841/MDVC/epoch_loop/run_epoch.py", line 336, in validation_next_word_loop
    for i, batch in enumerate(tqdm(loader, desc=f'{time} {phase} ({epoch})')):
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __iter__
    for obj in iterable:
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tuf72841/MDVC/dataset/dataset.py", line 443, in __getitem__
    caption_data = next(self.caption_loader_iter)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
KeyError: 'stairclimber'
```

v-iashin commented 4 years ago

Hi Xinli,

Thanks for reporting.

I installed the env on another machine with a 1080Ti and couldn't reproduce the problem after training for 6 epochs.

I also found that conda lists the spacy model in the environment file as a pip package but fails to install it, along with every pip package that comes after it (torchtext in our case). I fixed this in 7873bea.
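
If you want to double-check your own environment, something like this should fail loudly when the pip stage was skipped (an illustrative snippet, not part of the repo):

```python
# Quick sanity check that the pip stage of the conda env actually ran.
import spacy
import torchtext

print(torchtext.__version__)  # the import above fails if torchtext was skipped
spacy.load('en')  # raises OSError if the spacy model was never installed
# (adjust 'en' to whatever model name your spacy setup expects)
```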

Anyway, let's see why you are getting this error. It seems to be related to the text-processing part. Please share:

  1. When do you get this error? How many epochs have you run it for?
  2. Which versions of torchtext and spacy are you using?

v-iashin commented 4 years ago

I'm assuming the problem was local. Please reopen if you think otherwise and provide more details.

VP-0822 commented 4 years ago

Hi @yxinli92, cc: @v-iashin. I tried to train the model on my own and also stumbled across this problem. I noticed that in the latest version of the PyTorch/text module there is an issue with the unknown token being used. Please refer to PyTorch/Text Unknown token for more details. In short, if you specify the unknown token explicitly, I can no longer reproduce this issue:

```python
self.ASR_SUBTITLES_FIELD = data.ReversibleField(
    tokenize='spacy',
    init_token=self.start_token,
    eos_token=self.end_token,
    pad_token=self.pad_token,
    lower=True,
    batch_first=True,
    unk_token='<unk>'
)
```
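
For anyone curious about the mechanism, here is a minimal sketch of the failure mode (assuming the legacy torchtext Vocab API, roughly 0.7/0.8; the words in the counter are made up):

```python
from collections import Counter

from torchtext.vocab import Vocab

counter = Counter(['a', 'person', 'climbs', 'the', 'stairs'])

# With '<unk>' among the specials, stoi falls back to the '<unk>' index
# for out-of-vocabulary words.
with_unk = Vocab(counter, specials=['<unk>', '<pad>'])
print(with_unk.stoi['stairclimber'])  # prints the '<unk>' index, no error

# Without '<unk>', stoi has no default factory, so the same lookup raises
# KeyError, which is exactly what the traceback above shows.
without_unk = Vocab(counter, specials=['<pad>'])
print(without_unk.stoi['stairclimber'])  # KeyError: 'stairclimber'
```

As far as I can tell, newer torchtext only enables the fallback when the unknown token is literally '<unk>', while ReversibleField defaults to a different token, hence passing unk_token='<unk>' explicitly above.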

If you have already solved it, please ignore this. I just wanted to point out the root cause for others who stumble across this problem.

v-iashin commented 4 years ago

@VP-0822 This is a valuable comment. Thanks for sharing.