yxinli92 closed this issue 4 years ago
Hi Xinli,
Thanks for reporting.
I installed the environment on another machine with a 1080Ti and couldn't reproduce the problem after training for 6 epochs.
I also found that conda saves the spacy model in the environment under the pip packages but fails to install it, along with all other packages that are expected to be installed after it (torchtext in our case). I fixed it in 7873bea.
Anyway, let's see why you have this problem. It seems to be related to the text-processing parts. Please share: which versions of torchtext and spacy are you using?
Assuming the problem was local; please reopen if you think otherwise, and provide more details.
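For the version question above, one way to collect the numbers is sketched below. It is only a sketch: `installed_version` is a hypothetical helper name, and `importlib.metadata` is part of the standard library from Python 3.8 onwards, so on the Python 3.7 environment seen in this thread the `importlib_metadata` backport or `pip list` would be needed instead.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages discussed in this thread.
for name in ("torch", "torchtext", "spacy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Pasting this output together with the failing command makes it easier to tell a local environment problem from a library regression.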
Hi @yxinli92, cc: @v-iashin I tried to train the model on my own and also stumbled across this problem. I noticed that in the latest version of the PyTorch/text module there is an issue with the unknown token. Please refer to PyTorch/Text Unknown token for more details. In short, if you specify the unknown token explicitly, I at least can no longer reproduce this issue:
self.ASR_SUBTITLES_FIELD = data.ReversibleField(
    tokenize='spacy',
    init_token=self.start_token,
    eos_token=self.end_token,
    pad_token=self.pad_token,
    lower=True,
    batch_first=True,
    unk_token='<unk>',
)
If you have already solved it, please ignore this. I just wanted to point out the root cause for others who stumble across this problem.
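The mechanism behind this workaround can be illustrated without torchtext. The sketch below is plain Python and is not torchtext's actual implementation; `build_stoi` and its arguments are hypothetical names. It assumes that the token-to-index table falls back to the unknown-token index only when an unknown token has been registered, and otherwise behaves like an ordinary dict that raises KeyError for unseen words such as 'stairclimber'.

```python
from collections import defaultdict


def build_stoi(words, unk_token=None):
    """Build a token -> index table (hypothetical helper, not a torchtext API).

    With an unk_token, unseen words map to its index. Without one, looking up
    an unseen word raises KeyError.
    """
    specials = [unk_token] if unk_token is not None else []
    itos = specials + sorted(set(words) - set(specials))
    index = {tok: i for i, tok in enumerate(itos)}
    if unk_token is None:
        return index  # plain dict: no fallback for unseen tokens
    unk_index = index[unk_token]
    # defaultdict returns unk_index for any token missing from the table
    return defaultdict(lambda: unk_index, index)


train_words = ["a", "man", "climbs", "stairs"]

with_unk = build_stoi(train_words, unk_token="<unk>")
print(with_unk["stairclimber"])  # falls back to the index of '<unk>'

without_unk = build_stoi(train_words)
try:
    without_unk["stairclimber"]
except KeyError as err:
    print("KeyError:", err)
```

Whether the real library takes the fallback path depends on its version and on how the field's special tokens are passed, which is why setting `unk_token` explicitly is a reasonable first thing to try.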
@VP-0822 This is a valuable comment. Thanks for sharing.
Hi Vladimir! Hope you are doing well.
I was running your main.py script and it fails with the following KeyError. Am I missing something? Thanks a lot!
Traceback (most recent call last):
  File "main.py", line 572, in <module>
    main(cfg)
  File "main.py", line 281, in main
    cfg.use_categories
  File "/home/tuf72841/MDVC/epoch_loop/run_epoch.py", line 336, in validation_next_word_loop
    for i, batch in enumerate(tqdm(loader, desc=f'{time} {phase} ({epoch})')):
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __iter__
    for obj in iterable:
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self.dataset_fetcher.fetch(index) # may raise StopIteration
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/tuf72841/MDVC/dataset/dataset.py", line 443, in __getitem__
    caption_data = next(self.caption_loader_iter)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/tuf72841/.conda/envs/mdvc/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
KeyError: 'stairclimber'
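The last frames show the lookup self.vocab.stoi[x] failing during the validation loop for a token that has no entry in the vocabulary. A minimal sketch for listing such tokens ahead of time follows. `find_oov_tokens`, the toy `stoi` dict and the sample caption are illustrative assumptions, not code from MDVC or torchtext.

```python
def find_oov_tokens(tokenized_examples, stoi):
    """Return the sorted tokens that have no entry in the stoi mapping."""
    missing = set()
    for example in tokenized_examples:
        for token in example:
            # A membership test never triggers a defaultdict's default factory,
            # so this works for plain dicts and defaultdicts alike.
            if token not in stoi:
                missing.add(token)
    return sorted(missing)


# Toy vocabulary and one tokenized validation caption (made-up data).
stoi = {"<pad>": 0, "a": 1, "man": 2, "uses": 3}
val_captions = [["a", "man", "uses", "a", "stairclimber"]]
print(find_oov_tokens(val_captions, stoi))
```

Running a check like this on the validation captions would show whether 'stairclimber' is an isolated case or one of many words missing from the training vocabulary.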