Hi @amariucaitheodor. Thank you for reporting the issue!
Could you also copy-paste the error (traceback) you got into the issue description above? Thanks.
I tried the colab and found the issue. Specifically, the code used to calculate input_ids and input_ids_masked is incorrect: the torch_mask_tokens function returns the modified (masked) input_ids together with the corresponding labels. Since the loss is only calculated on the masked tokens, all other tokens are set to -100 in the labels. This causes an "index out of range" error down the line in the embedding layer's forward.
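To illustrate the failure mode (a minimal sketch, not the notebook's code; the vocab size and dimensions are made up): an embedding layer indexes its weight table with the token ids, and -100 is not a valid index.

import torch
import torch.nn as nn

# -100 is a loss-ignore sentinel, not a token id; indexing the
# embedding table with it fails.
embedding = nn.Embedding(num_embeddings=30522, embedding_dim=768)
bad_input_ids = torch.tensor([[101, -100, 2003, -100, 102]])
try:
    embedding(bad_input_ids)
except IndexError as err:
    print(err)  # index out of range in self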
Thank you for the reply! I had noticed the same problem.
What, then, is the correct way of calculating input_ids_masked? The code doesn't work with DataCollatorForLanguageModeling for the reasons mentioned above, and there is no other example for doing this.
Thank you @amariucaitheodor for providing the error log, and thanks @apsdehal for sharing your finding. I will take a look at this issue. But @apsdehal, don't hesitate to share if you have any idea regarding the correct solution ❤️
Hello! After looking into the issue with the notebook, here is my finding:

data_collator.torch_mask_tokens(inputs=inputs['input_ids'], ....) returns two items:

- the masked input_ids, i.e. the inputs with some places replaced by the mask token id
- the corresponding labels, where -100 means that a place is not masked

The FlavaForPreTraining model expects input_ids_masked to be the masked inputs, which is the first item prepared above. See https://github.com/huggingface/transformers/blob/f7329751fe5c43365751951502c00df5a4654359/src/transformers/models/flava/modeling_flava.py#L803-L805

The notebook, however, does inputs['input_ids'], inputs['input_ids_masked'] = data_collator.torch_mask_tokens(...), which causes inputs['input_ids_masked'] to be the 2nd item returned by torch_mask_tokens, which is incorrect. In particular, it contains -100, which causes the error. Furthermore, inputs['input_ids'] is also the wrong value, but it doesn't cause the program to crash.
The solution is just to prepare the correct inputs for the model:
inputs['input_ids_masked'], _ = data_collator.torch_mask_tokens(
    inputs=inputs['input_ids'],
    special_tokens_mask=inputs['special_tokens_mask']
)
With this change, I get loss: 7.162976264953613.
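For reference, a quick sanity check on the prepared inputs (a sketch; tokenizer here is assumed to be the notebook's tokenizer):

# The masked ids should contain the mask token and no -100 sentinels.
assert (inputs['input_ids_masked'] == tokenizer.mask_token_id).any()
assert not (inputs['input_ids_masked'] == -100).any()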
Let me know if you have further questions 🤗
@ydshieh I don't think this is correct either, as torch_mask_tokens masks the input_ids in place, so you will have to clone the input_ids before passing them to it.
@apsdehal Thanks a lot, nice catch! You are 100% correct. @amariucaitheodor Please see this comment too!
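Putting both points together, a minimal sketch of the corrected preparation (assuming the notebook's inputs dict and data_collator as above):

# torch_mask_tokens modifies its `inputs` argument in place, so pass a
# clone; keep only the first returned item (the masked ids) and discard
# the labels.
inputs['input_ids_masked'], _ = data_collator.torch_mask_tokens(
    inputs=inputs['input_ids'].clone(),
    special_tokens_mask=inputs['special_tokens_mask']
)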
As it turns out this is not an issue in the transformers modeling code, but rather an incorrect preparation of the model inputs, so I will move forward and close the issue.
@amariucaitheodor If you still have issues, you can post on Hugging Face Forums.
However, if you find other issues that you believe are in the modeling code, feel free to continue to leave comments here.
System Info
transformers version: 4.27.0.dev0

N.B. I do have PyTorch installed; I'm not sure why the tool can't find it:
Who can help?
@apsdehal
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Steps to reproduce the behavior (there is also a Colab notebook doing this):

(fetch_images is in the notebook above):

Expected behavior
I would expect the forward pass to not throw errors.
Actual behavior