wpfnlp closed this issue 5 years ago
Can you post your script so I could reproduce the case?
Feel free to re-open the issue if you still have questions.
I came across the same issue. It only happens when I define my own unk_token and set min_freq > 1 at the same time.
Here's the code I use:
from torchtext import data, datasets
SRC = data.Field(lower=True, unk_token="my_unk_token")
TGT = data.Field(lower=True)
train, val, test = datasets.IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))
SRC.build_vocab(train, min_freq=10)
train_iter = data.BucketIterator(dataset=train, batch_size=64, sort_key=lambda x: data.interleave_keys(len(x.src), len(x.trg)))
batch = next(iter(train_iter))
I am still getting this issue. As @TinaChen95 mentioned, setting min_freq to 1 works fine. When min_freq > 2, build_vocab(..) builds the vocab according to min_freq, but a KeyError is thrown while iterating over the BucketIterator.
I think so. At least for the issue I am facing, I figured out that unk_token needs to be passed to the ReversibleField constructor even if you want to use the default unk_token. That is because ReversibleField uses ' UNK ' as its unk_token, while Vocab uses '<unk>'. Since bug #706 is still open, customizing the unk_token is not possible at the moment.
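To make the mismatch concrete, here is a minimal sketch in plain Python (not torchtext itself). It assumes that Vocab in 0.4.0 only installs an unknown-token fallback when the literal '<unk>' is among the specials, and otherwise uses a defaultdict with no default factory. The helper build_stoi is hypothetical and only mimics that assumed behaviour.

```python
from collections import Counter, defaultdict

UNK = "<unk>"  # assumption: the unknown token Vocab looks for is hard-coded


def build_stoi(counter, specials, min_freq=1):
    """Hypothetical stand-in for how a vocab could build its token -> index map."""
    itos = list(specials)
    # Keep only tokens that occur at least min_freq times, most frequent first.
    for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in specials:
            itos.append(tok)
    if UNK in specials:
        unk_index = specials.index(UNK)
        # Unknown tokens silently map to the index of '<unk>'.
        stoi = defaultdict(lambda: unk_index)
    else:
        # No default factory: looking up a missing token raises KeyError.
        stoi = defaultdict()
    stoi.update({tok: i for i, tok in enumerate(itos)})
    return stoi


counter = Counter("a a a b".split())

# Default unk token: the rare token "b" falls back to the unk index.
default_stoi = build_stoi(counter, ["<unk>", "<pad>"], min_freq=2)
print(default_stoi["b"])

# Custom unk token with min_freq=2: "b" was dropped and there is no fallback.
custom_stoi = build_stoi(counter, ["my_unk_token", "<pad>"], min_freq=2)
try:
    custom_stoi["b"]
except KeyError as err:
    print("KeyError:", err)
```

If that assumption holds, it would also explain why min_freq=1 appears to work: every training token stays in the vocab, so no lookup misses while iterating over the training set.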
Bug with torchtext 0.4.0:
Traceback (most recent call last):
  File "/Users/weipengfei/workspaces/FastNLPProjects/research01/Intent+SlotFilling01.py", line 112, in <module>
    for i, batch in enumerate(train_iter):
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/miniconda3/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
KeyError: None
The same code runs without problems on torchtext 0.3.1. Please tell me what caused this, thank you.