lyeoni / nlp-tutorial

A list of NLP(Natural Language Processing) tutorials
MIT License
1.37k stars, 264 forks

Little improvements for right indexes in vocabulary dictionaries #9

Closed (datason closed this issue 4 years ago)

datason commented 4 years ago

Hi, @lyeoni! You have written great tutorials, and I really appreciate them. I think we can improve one thing with a single line.

Here, we fill the first key-value items of stoi and itos with the special tokens. I suggest inserting this line before the loop:

special_tokens = filter(lambda x: x is not None, [self.unk_token, self.bos_token, self.eos_token, self.pad_token])

If we don't set a value for self.unk_token but do set one for self.bos_token, the indexes in the dictionaries become wrong: the unset token still takes up a position in the list, so '<bos>' and the first regular token end up with the same index. So we need to filter out the None values first.

Input:

vocab = Vocab(body, bos_token='<bos>'); vocab.build(); vocab.stoi;

Wrong output:

'<bos>': 1, ' ': 1, 'hi': 2, 'bear': 3, ...
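To make the collision easier to see, here is a minimal sketch. It assumes a simplified Vocab that only mirrors the behaviour described above; the class body, the frequency-based ordering, and the filter_none flag are hypothetical and exist only for this illustration, they are not the tutorial's actual code.

```python
from collections import Counter


class Vocab:
    """Hypothetical, simplified stand-in for the tutorial's Vocab class."""

    def __init__(self, tokens, unk_token=None, bos_token=None,
                 eos_token=None, pad_token=None, filter_none=True):
        self.tokens = tokens
        self.unk_token = unk_token
        self.bos_token = bos_token
        self.eos_token = eos_token
        self.pad_token = pad_token
        # filter_none is an illustration-only switch between old and new behaviour.
        self.filter_none = filter_none
        self.stoi, self.itos = {}, {}

    def build(self):
        special_tokens = [self.unk_token, self.bos_token,
                          self.eos_token, self.pad_token]
        if self.filter_none:
            # The suggested line: drop unset tokens before enumerating.
            special_tokens = filter(lambda x: x is not None, special_tokens)

        for idx, token in enumerate(special_tokens):
            if token is None:
                # Unfiltered path: the token is skipped, but its index is already used up.
                continue
            self.stoi[token] = idx
            self.itos[idx] = token

        # Regular tokens continue from the current dictionary size (assumed behaviour).
        for token, _ in Counter(self.tokens).most_common():
            if token not in self.stoi:
                idx = len(self.stoi)
                self.stoi[token] = idx
                self.itos[idx] = token


body = ['hi', 'bear', 'hi']

buggy = Vocab(body, bos_token='<bos>', filter_none=False)
buggy.build()
print(buggy.stoi)  # {'<bos>': 1, 'hi': 1, 'bear': 2}  <- '<bos>' and 'hi' collide

fixed = Vocab(body, bos_token='<bos>', filter_none=True)
fixed.build()
print(fixed.stoi)  # {'<bos>': 0, 'hi': 1, 'bear': 2}
```

In the unfiltered case itos also loses '<bos>', because index 1 is overwritten by the first regular token. With the filter, the special tokens that are actually set get consecutive indexes starting at 0.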

lyeoni commented 4 years ago

Hi @datason! Thank you sincerely for the nice and kind comment :) You're absolutely right, and I will fix the line that you mentioned (or, if you open a pull request, I will merge it). I think your comment makes this tutorial better.