Hi @CaffreyR 👋
Can you share a snippet where I can fully reproduce the issue locally? Also -- am I right in saying that the issue is that the exact same input might result in different outputs, using `model.generate()`?
Hi @gante, thanks for your kind reply. Sorry, but the full code has not been released; it is actually modified from the code of Facebook's FiD. I count the time in the evaluation: https://github.com/facebookresearch/FiD/blob/main/test_reader.py#L36
You can add the timing code in the for loop:
```python
for i, batch in enumerate(dataloader):
    (idx, _, _, context_ids, context_mask) = batch

    # timing added around generation
    torch.cuda.synchronize()
    import time
    start = time.perf_counter()

    if opt.write_crossattention_scores:
        model.reset_score_storage()
    outputs = model.generate(
        input_ids=context_ids.cuda(),
        attention_mask=context_mask.cuda(),
        max_length=50,
    )
    if opt.write_crossattention_scores:
        crossattention_scores = model.get_crossattention_scores(context_mask.cuda())

    for k, o in enumerate(outputs):
        ans = tokenizer.decode(o, skip_special_tokens=True)
        example = dataset.data[idx[k]]
        if 'answers' in example:
            score = src.evaluation.ems(ans, example['answers'])
            exactmatch.append(score)
        if opt.write_results:
            fw.write(str(example['id']) + "\t" + ans + '\n')
        if opt.write_crossattention_scores:
            for j in range(context_ids.size(1)):
                example['ctxs'][j]['score'] = crossattention_scores[k, j].item()
        total += 1

    if (i + 1) % opt.eval_print_freq == 0:
        log = f'Process rank:{opt.global_rank}, {i+1} / {len(dataloader)}'
        if len(exactmatch) == 0:
            log += '| no answer to compute scores'
        else:
            log += f' | average = {np.mean(exactmatch):.3f}'
        logger.warning(log)

    # synchronize again before reading the timer
    torch.cuda.synchronize()
    end = time.perf_counter()
    print(end - start)

logger.warning(f'Process rank:{opt.global_rank}, total {total} | average = {np.mean(exactmatch):.3f}')
if opt.is_distributed:
    torch.distributed.barrier()
score, total = src.util.weighted_average(np.mean(exactmatch), total, opt)
return score, total
```
And about the outputs: there are actually 3 outputs, from `forward` in some layers. I think the difference in the outputs of `model.generate` can be fixed by loading the same model weights, but the second and third are different.
Many thanks again for your time.
Best, CaffreyR
@CaffreyR without an exact script, I am limited in what I can do :) I understand your limitations, but the problem you are describing can come from many places.
In essence, `generate()` can have variable outputs (which leads to different execution times) for the same input in two circumstances:
1. `generate()` is configured to not be deterministic. If `transformers`' `generate()` is being used without modifications, this should only be possible with the `do_sample=True` argument (see the sketch below).
2. The model is modified in between `generate()` calls.
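For illustration, a minimal sketch of circumstance 1 (using `t5-base` only as an example model; this is not FiD's code):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
inputs = tokenizer("translate English to German: Hello world", return_tensors="pt")

# Deterministic decoding: the same input always yields the same output
out_a = model.generate(**inputs, do_sample=False, max_length=20)
out_b = model.generate(**inputs, do_sample=False, max_length=20)
assert torch.equal(out_a, out_b)

# Sampling: the same input can yield different outputs (and hence
# different generation lengths and execution times)
sampled_a = model.generate(**inputs, do_sample=True, max_length=20)
sampled_b = model.generate(**inputs, do_sample=True, max_length=20)
print(tokenizer.batch_decode(sampled_a, skip_special_tokens=True))
print(tokenizer.batch_decode(sampled_b, skip_special_tokens=True))
```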
Hi @gante, thanks again for your reply. It actually does not modify anything; see here, it actually just uses the `generate()` from `transformers.T5ForConditionalGeneration`.
And what is the `do_sample=True`? Because in the code here:
https://github.com/facebookresearch/FiD/blob/main/test_reader.py#L115
there is a sampler; does it match circumstance 1? Could you please explain it more? Thanks
```python
from torch.utils.data import DataLoader, SequentialSampler

eval_examples = src.data.load_data(
    opt.eval_data,
    global_rank=opt.global_rank,  # use the global rank and world size attributes to split the eval set on multiple gpus
    world_size=opt.world_size,
)
eval_dataset = src.data.Dataset(
    eval_examples,
    opt.n_context,
)
eval_sampler = SequentialSampler(eval_dataset)
eval_dataloader = DataLoader(
    eval_dataset,
    sampler=eval_sampler,
    batch_size=opt.per_gpu_batch_size,
    num_workers=20,
    collate_fn=collator_function,
)
```
It shouldn't be related; `SequentialSampler` only touches the data, not the `generate()` method.
As for an explanation of `do_sample`, you can refer to our docs or our blog post.
Please note that without a full reproduction script I won't give further support here. As per our issues guidelines, we reserve GitHub issues for bugs in the repository (with clear reproducibility) and/or feature requests. For any other matters, we'd like to invite you to use our forum 🤗
Hi @CaffreyR,
Maybe it's because of the `t5-base` configuration? https://huggingface.co/t5-base/blob/main/config.json#L21
These lines modify the default options of `generate` for this model.
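You can inspect them directly (the values below are copied from the linked config at the time of writing and may change):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("t5-base")
print(config.task_specific_params["summarization"])
# {'early_stopping': True, 'length_penalty': 2.0, 'max_length': 200,
#  'min_length': 30, 'no_repeat_ngram_size': 3, 'num_beams': 4,
#  'prefix': 'summarize: '}
```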
Hi @Narsil, thanks for your kind reply. Do you mean `task_specific_params`? Could you please explain more? What are the default options and how does it modify them? Thanks!
The pipeline reads `task_specific_params` and overrides the defaults when it's present.
We realized this wasn't super discoverable, so very few models use this feature, but I happen to remember this one does.
So if you're using `t5-base` as a `summarization` pipeline (which I think is the default), then the pipeline will use those defaults and treat them as regular params; it happens that these control the `generate_kwargs` of `generate`.
Sometimes models also have defaults in the `config` (same idea, just for the whole model, not depending on the actual task).
Neither of these mechanisms is really great at showing users what happens, but they're great for providing sane defaults (or the ones used in the original repo / original paper).
If you want to override any of them, you just need to supply yours directly to `generate`, for instance.
`User specified > Config > Default` is the order of resolution (`pipeline` has a few more rules, but you're not using them, in fact). See the sketch below for a concrete example.
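A minimal sketch of that resolution order (my own illustration, using `t5-base` outside of a pipeline):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")

# With no user arguments, generate() uses whatever generation defaults are
# set in model.config (falling back to library defaults when unset)
out_config = model.generate(**inputs)

# User-specified arguments always win over config values
out_user = model.generate(**inputs, num_beams=1, max_length=50)
```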
Hi @Narsil, thanks for your explanation. So what should I do? The code here just uses the t5 config. Do I need to delete `task_specific_params` in this case?
@CaffreyR
You can override them manually:

```python
pipeline(
    model="t5-base",
    **{
        "early_stopping": True,
        "length_penalty": None,
        "max_length": None,
        "min_length": None,
        "no_repeat_ngram_size": None,
        "num_beams": None,
        "prefix": "summarize: ",  # you probably want to keep this for summarization, as it's how the model was trained
    }
)
```
Or deactivate them altogether by loading the model before the pipeline:

```python
model = AutoModelForXXX.from_pretrained("t5-base")
model.config.task_specific_params = None
pipe = pipeline(task="summarization", model=model, tokenizer=tokenizer)
```
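Continuing the snippet above, you can then check that the task-specific defaults are gone and that call-time parameters still apply:

```python
# Sanity check: the pipeline's model no longer carries task-specific defaults
print(pipe.model.config.task_specific_params)  # None

# Generation parameters passed at call time still work as usual
summary = pipe("Some long article text ...", max_length=60, min_length=10)
```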
Would either solution work for you?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
`transformers` version: 4.20.1

Who can help?
@LysandreJik @patrickvonplaten @Narsil @gante

Information
Tasks: an officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
It is actually modified from FiD. In the code, my model inherits from `T5ForConditionalGeneration`. I also count the number of `forward` calls in some layers, and it shows different numbers of `forward` calls.

Expected behavior
For two identical batches, it prints different numbers of `forward` calls; for example, in the first batch `forward` runs 288 times, but in the second it runs only 216 times.
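For reference, one way to count per-layer `forward` calls is with forward hooks. This is a minimal sketch of my own, not FiD's code; the `t5-base` model and the `DenseReluDense` name filter are just example choices:

```python
from collections import Counter

from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

call_counts = Counter()

def make_hook(name):
    def hook(module, inputs, output):
        call_counts[name] += 1
    return hook

# Register a counting hook on every feed-forward sublayer (example filter)
for name, module in model.named_modules():
    if name.endswith("DenseReluDense"):
        module.register_forward_hook(make_hook(name))

inputs = tokenizer("summarize: Hello world", return_tensors="pt")
model.generate(**inputs, max_length=50)
print(sum(call_counts.values()))
# With autoregressive decoding, the decoder runs once per generated token,
# so this total depends on where EOS is produced -- different batches (or
# sampling) naturally give different counts.
```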