Closed: changsha2999 closed this issue 9 months ago.
Hi @changsha2999,
The code has gone through many changes. In the past, we used word2vec embeddings to get a baseline, so the word2index dictionary was the centerpiece of the word-to-index mapping. In the current version available to you, however, it is only used in the true-mask generation, where a mask with a one-to-one correspondence to the output tokens is generated (see ln[9], self.x_mask and self._sin).
So in short, yes, it can be removed, although you will have to change this section to avoid errors.
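For anyone reading later, here is a minimal sketch of what such a word2index-based true-mask step can look like. The names (`make_true_mask`, `"<PAD>"`, `"<UNK>"`, `max_len`) are hypothetical, and this is not the repository's exact implementation:

```python
# Minimal sketch, assuming a hypothetical word2index dict with
# "<PAD>" and "<UNK>" entries; not the repository's exact code.
import torch

def make_true_mask(sentence, word2index, max_len):
    pad_id = word2index["<PAD>"]
    # map each word to its index, falling back to the unknown token
    ids = [word2index.get(w, word2index["<UNK>"]) for w in sentence[:max_len]]
    ids += [pad_id] * (max_len - len(ids))  # pad to a fixed length
    ids = torch.tensor(ids)
    # 1.0 for real tokens, 0.0 for padding: one mask entry per output token
    return (ids != pad_id).float()
```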
Hi Rafiepour, thank you very much for your answer. That's a great help to me!
Another small question: the first slot label in the dataset corresponds to the "BOS" token:

BOS does us air fly from ...
O O B-airline_name I-airline_name O O ...

That is why the code does

dataset = [[t[0][1:-1], t[1][1:], t[2]] for t in dataset]

where t[1][1:] removes that first slot label. Am I right?
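For concreteness, here is that stripping step run on a tiny made-up sample in the same (tokens, slot labels, intent) format; this is hypothetical data, not the actual ATIS file:

```python
# Hypothetical mini-sample; note the slot-label list has a leading "O"
# for BOS but no label for EOS, matching the slicing below.
dataset = [
    (["BOS", "does", "us", "air", "fly", "from", "EOS"],
     ["O", "O", "B-airline_name", "I-airline_name", "O", "O"],
     "atis_flight"),
]

# t[0][1:-1] drops the BOS/EOS tokens; t[1][1:] drops the first slot label
dataset = [[t[0][1:-1], t[1][1:], t[2]] for t in dataset]

print(dataset[0][0])  # ['does', 'us', 'air', 'fly', 'from']
print(dataset[0][1])  # ['O', 'B-airline_name', 'I-airline_name', 'O', 'O']
```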
Thanks again.
Yes, as I commented above that line:
#removes BOS, EOS from array of tokens and tags
These tokens should not be counted in the F1 score, because they would artificially inflate it.
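To illustrate the inflation, here is a hedged sketch with made-up predictions, using scikit-learn's micro-averaged token-level F1 as a stand-in rather than the repository's exact metric code:

```python
from sklearn.metrics import f1_score

true_tags = ["O", "B-airline_name", "I-airline_name", "O", "O"]
pred_tags = ["O", "B-airline_name", "O", "O", "O"]  # one wrong tag

# BOS/EOS positions are trivially tagged "O", so keeping them adds
# guaranteed-correct tags and pushes the score up.
padded = f1_score(["O"] + true_tags + ["O"],
                  ["O"] + pred_tags + ["O"], average="micro")
plain = f1_score(true_tags, pred_tags, average="micro")
print(plain, padded)  # 0.80 vs ~0.86
```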
To keep things organized, if you have any other questions, feel free to create a new issue.
Hi Rafiepour, thanks for your great work! I have a small question about the use of word2index: since the real ids (input_ids) are produced by the bert_tokenizer, can I remove word2index?
Thanks!
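A small sketch of the point in the question, using the Hugging Face transformers API; the sentence and settings are made up:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("does us air fly from boston",
                return_tensors="pt", padding="max_length", max_length=12)
# The tokenizer already yields the real ids and a padding mask,
# so no separate word2index lookup is needed on the input side.
print(enc["input_ids"])
print(enc["attention_mask"])
```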