Hi, no need to mask, just input your sequence and keep the top-layer hidden states of the tokens that correspond to your ingredients.
If your ingredients are not in the vocabulary, they will be split by the tokenizer into sub-word units (totally fine). Then just use as a representation the mean or the max of the representations of all the sub-word tokens in an ingredient (e.g. torch.mean(output[0, 1:3, :], dim=0) if your ingredient word is made of tokens number 1 and 2 in the first example of the batched input sequence).
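A minimal sketch of that recipe with the transformers API (the sentence and the sub-word positions are just illustrative assumptions, not taken from the thread):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Hypothetical sentence containing an ingredient word.
sentence = "Mix the turmeric with olive oil."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
hidden_states = outputs[0]  # top-layer hidden states, shape (batch, seq_len, hidden_size)

# Suppose the tokenizer split the ingredient into sub-word tokens at positions
# 2 and 3 (position 0 is [CLS]); pool over the token dimension to get one vector.
ingredient_vec = hidden_states[0, 2:4, :].mean(dim=0)  # shape (hidden_size,)
```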
I am trying to figure out how BertForMaskedLM actually works. I saw that in the example, we do not need to mask the input sequence "Hello, my dog is cute". But then in the code, I did not see the random masking taking place either. I am wondering which word of this input sequence is then masked, and where the ground truth is provided.
I am only trying to understand this because I am trying to fine-tune the BERT model on a task that also involves predicting a masked word, and I am trying to figure out how to process the input sequence to signal the "[MASK]" and make the model predict the actual masked-out word.
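For reference, a hedged sketch of doing this manually: you insert the [MASK] token yourself and read the model's prediction at that position (model name and sentence below are only illustrative):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Insert the [MASK] token yourself; the model does not mask anything for you.
text = "Hello, my dog is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs)[0]  # prediction logits, shape (batch, seq_len, vocab_size)

predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # the model's guess for the masked word
```

For fine-tuning, you would additionally pass labels containing the original token ids at the masked positions and -100 everywhere else, so the loss is computed only on the masked tokens.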
It seems that there is nothing here like run_pretraining.py from google-research/bert (written in TensorFlow), and the pretrained model is converted from TensorFlow, right?
Has anyone figured out exactly how words are masked for the masked LM in BERT, or where this occurs in the code? I'm trying to understand whether the masked tokens are chosen randomly for every single epoch.
That would be related to the training script. If you're using the run_lm_finetuning.py script, then these lines are responsible for the token masking.
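Roughly, those lines implement the standard BERT masking scheme (about 15% of positions are selected; of those, 80% become [MASK], 10% become a random token, 10% stay unchanged). A simplified sketch of that scheme, not the verbatim code from run_lm_finetuning.py and skipping the handling of special tokens and padding:

```python
import torch

def mask_tokens(inputs, tokenizer, mlm_probability=0.15):
    """Pick ~15% of positions; replace 80% with [MASK], 10% with random tokens, keep 10%."""
    labels = inputs.clone()

    # Sample which positions take part in the masked-LM loss.
    masked_indices = torch.bernoulli(torch.full(labels.shape, mlm_probability)).bool()
    labels[~masked_indices] = -100  # loss is ignored on unmasked positions

    # 80% of the selected positions become [MASK].
    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = tokenizer.mask_token_id

    # Half of the remainder (i.e. 10% overall) become a random token.
    indices_random = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~indices_replaced
    )
    inputs[indices_random] = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)[indices_random]

    # The remaining 10% keep their original token but are still predicted.
    return inputs, labels
```

Because the Bernoulli draws happen every time the function is called, the set of masked positions is re-sampled for every batch, so it changes from epoch to epoch.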
I have a task where I want to obtain better word embeddings for food ingredients. Since I am a bit new to the field of NLP, I also have some fundamental doubts that I would love to be corrected on.
Will this lead to overfitting?
I would really appreciate it if someone could point out any issues with my assumptions above.