Closed: CaffreyR closed this issue 1 year ago.
Hi @CaffreyR 👋 At first glance at our code base, I don't see how that bug could arise 🤔 Can you share a script or a notebook where the issue can be reproduced?
Hi @gante, yes of course! Many thanks! The code is here: https://github.com/CaffreyR/FiD, with minor revisions of https://github.com/facebookresearch/FiD. The problem is here: https://github.com/CaffreyR/FiD/blob/main/train_reader.py#L63.
The transformers version in that repo is different from the one in my experiment. (This is the easiest script for you to reproduce with.) Please follow the steps in the README at https://github.com/facebookresearch/FiD#download-data to prepare the data (it is a bit large), then try to run:
python train_reader.py \
    --use_checkpoint \
    --train_data open_domain_data/NQ/train.json \
    --eval_data open_domain_data/NQ/dev.json \
    --model_size base \
    --per_gpu_batch_size 1 \
    --n_context 100 \
    --name my_experiment \
    --checkpoint_dir checkpoint
(The --train_data and --eval_data paths point at the files produced by the data-preparation step above.)
The dataset is NaturalQuestions, and it is a little tricky to get the data prepared, so I am very grateful for your help! :)
Thank you very much!
Hey @CaffreyR -- with a long script it's hard to pinpoint the issue :) We need a short reproducible script; otherwise we will not prioritize this issue.
Hi @gante, it is very interesting: when I try the simplified code below, it runs successfully. The batch is the same as in FiD; only the model is different. The original Facebook code inherits from and wraps the T5 model.
import torch
import transformers

# Plain transformers T5 instead of FiD's subclass:
model = transformers.T5ForConditionalGeneration.from_pretrained('t5-base')
# model = src.model.FiDT5(t5.config)
# model.load_t5(t5.state_dict())
context_ids=torch.tensor([[[ 822, 10, 3, 9, 538, 213, 1442, 9481, 1936, 10687,
999, 2233, 10, 1862, 12197, 16, 1547, 2625, 10, 1862,
12197, 16, 1547, 37, 1862, 12197, 16, 1547, 2401, 7,
12, 3, 9, 1059, 116, 2557, 11402, 47, 12069, 139,
46, 2913, 358, 788, 12, 8, 9284, 13, 941, 2254,
11, 748, 224, 38, 8, 169, 13, 306, 6339, 53,
1196, 41, 15761, 553, 61, 7299, 6, 3, 29676, 6,
21455, 2465, 6, 6256, 9440, 7, 6, 11, 20617, 277,
5, 100, 47, 294, 13, 8, 2186, 1862, 9481, 14310,
16781, 57, 13615, 7254, 40, 402, 122, 6, 84, 11531,
26, 10687, 585, 11, 748, 12, 993, 10687, 7596, 16,
8, 2421, 296, 5, 37, 1862, 12197, 441, 1547, 3,
28916, 16, 8, 778, 8754, 7, 24, 2237, 12, 46,
993, 16, 542, 8273, 999, 6, 902, 16, 27864, 6,
3504, 21247, 6, 11, 31251, 22660, 5, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]])
labels=torch.tensor([[1547, 1]])
context_mask=torch.tensor([[[ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False]]])
# print(context_ids)
# print(labels)
# print(context_mask)
# Head-mask setup (12 layers x 12 heads for t5-base), with gradients enabled
n_layers, n_heads = 12, 12
head_importance = torch.zeros(n_layers, n_heads).to('cpu')
attn_entropy = torch.zeros(n_layers, n_heads).to('cpu')
head_mask = torch.ones(n_layers, n_heads).to('cpu')
head_mask.requires_grad_(requires_grad=True)
decoder_head_mask = torch.ones(n_layers, n_heads).to('cpu')
decoder_head_mask.requires_grad_(requires_grad=True)
if context_ids is not None:
    # inputs might have already been resized in the generate method
    # if context_ids.dim() == 3:
    #     self.encoder.n_passages = context_ids.size(1)
    context_ids = context_ids.view(context_ids.size(0), -1)
if context_mask is not None:
    context_mask = context_mask.view(context_mask.size(0), -1)
outputs = model.forward(
    input_ids=context_ids,
    attention_mask=context_mask,
    labels=labels,
    return_dict=True,
    head_mask=head_mask,
    decoder_head_mask=decoder_head_mask
)
# outputs = model(
# input_ids=context_ids.cuda(),
# attention_mask=context_mask.cuda(),
# labels=labels.cuda(),
# return_dict=True,
# head_mask=head_mask.cuda(),
# decoder_head_mask=decoder_head_mask.cuda()
# )
print(outputs)
It might be a problem with the inheritance, I don't know; the behavior just differs when I simplify the code. :( For reference, FiD's forward override is:
def forward(self, input_ids=None, attention_mask=None, **kwargs):
    if input_ids is not None:
        # inputs might have already been resized in the generate method
        if input_ids.dim() == 3:
            self.encoder.n_passages = input_ids.size(1)
        input_ids = input_ids.view(input_ids.size(0), -1)
    if attention_mask is not None:
        attention_mask = attention_mask.view(attention_mask.size(0), -1)
    return super().forward(
        input_ids=input_ids,
        attention_mask=attention_mask,
        **kwargs
    )
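For context, that view call collapses the passage dimension so the T5 encoder sees one long sequence per example. A quick shape check (hypothetical sizes):

import torch

# (batch, n_passages, seq_len) -> (batch, n_passages * seq_len)
x = torch.zeros(1, 100, 250, dtype=torch.long)
print(x.view(x.size(0), -1).shape)  # torch.Size([1, 25000])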
@CaffreyR then it's almost surely an upstream problem -- I noticed it uses transformers==3.0.2, which may explain the issue you're seeing :)
While I can't provide support in these situations (the problem is not present in transformers), my advice would be to open an issue in FiD and/or try to monkey-patch their problematic model code.
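A minimal version of that monkey-patch idea could look like this (a sketch, untested, assuming FiD's FiDT5 class in src/model.py as referenced above):

import src.model  # FiD's model module

_orig_forward = src.model.FiDT5.forward

def patched_forward(self, *args, **kwargs):
    # Force the ModelOutput return type regardless of FiD's old defaults.
    kwargs.setdefault("return_dict", True)
    return _orig_forward(self, *args, **kwargs)

src.model.FiDT5.forward = patched_forward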
OK then, I will give it a try! Thanks!!!
System Info
transformers version: 4.22.1

Who can help?
@patrickvonplaten Many thanks!

Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I am using the code from Facebook Research's FiD, and when I try to run it, it reports an error!
So I went to this line to see the T5 encoder output: https://github.com/huggingface/transformers/blob/v4.23.1/src/transformers/models/t5/modeling_t5.py#L1609, and I used a debug print there.
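A rough reconstruction of that debug print (the variable names are an assumption, inferred from the output below):

print(type(encoder_outputs), '@@@', return_dict)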
Expected behavior
It prints <class 'tuple'> @@@ True, i.e. I set return_dict=True but the encoder output is still a tuple.
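For reference, the return_dict flag is what controls whether a transformers model returns a ModelOutput or a plain tuple; a minimal check on a stock T5 (a sketch, assuming transformers 4.x):

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained('t5-base')
ids = torch.tensor([[822, 10, 1547, 1]])

# With return_dict=True the encoder returns a ModelOutput subclass.
out = model.encoder(input_ids=ids, return_dict=True)
print(type(out))        # a BaseModelOutput... class, not a tuple

# With return_dict=False it returns a plain tuple.
out_tuple = model.encoder(input_ids=ids, return_dict=False)
print(type(out_tuple))  # <class 'tuple'>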