huggingface / transformers


T5ForConditionalGeneration output differently with the same batch input #19447

Closed CaffreyR closed 1 year ago

CaffreyR commented 2 years ago

System Info

Who can help?

@LysandreJik @patrickvonplaten, @Narsil, @gante

Information

Tasks

Reproduction

It is actually modified from FiD. In my code, I define a model that inherits from T5ForConditionalGeneration. I also count how many times the forward of certain layers is called, and the count differs between the two runs.

class FiDT5(transformers.T5ForConditionalGeneration):
    def __init__(self, config):
        ...

    def generate(self, input_ids, attention_mask, max_length):
        # In FiD, input_ids has shape (batch, n_passages, seq_len);
        # flatten the passage dimension before calling T5's generate().
        self.encoder.n_passages = input_ids.size(1)
        return super().generate(
            input_ids=input_ids.view(input_ids.size(0), -1),
            attention_mask=attention_mask.view(attention_mask.size(0), -1),
            max_length=max_length
        )

t5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-base')
model = FiDT5(t5.config)
model.load_t5(t5.state_dict())

for i, batch in enumerate(dataloader):
    (idx, _, _, context_ids, context_mask) = batch
    outputs = model.generate(
        input_ids=context_ids.cuda(),
        attention_mask=context_mask.cuda(),
        max_length=50,
    )
    print(outputs)

Expected behavior

For two identical batches, it prints

tensor([[    0, 22789,     9,  3038, 16924,  2060,     1]], device='cuda:0')
tensor([[    0, 17724,  5500,  7059,     1]], device='cuda:0')

And it actually goes through a different number of forward calls: for example, in the first batch the layer's forward runs 288 times, but in the second only 216 times.
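
For reference, here is a minimal sketch (not part of the original report) of how such a forward-call count can be taken with a PyTorch forward hook; the choice of layer is arbitrary and the model/batch variables are the ones from the snippet above:

from collections import Counter

forward_counts = Counter()

def make_hook(name):
    # count every forward pass through the hooked module
    def hook(module, inputs, output):
        forward_counts[name] += 1
    return hook

# e.g. hook the first decoder block of the (FiD-wrapped) T5 model
handle = model.decoder.block[0].register_forward_hook(make_hook("decoder.block.0"))

outputs = model.generate(
    input_ids=context_ids.cuda(),
    attention_mask=context_mask.cuda(),
    max_length=50,
)
print(forward_counts)
handle.remove()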

gante commented 2 years ago

Hi @CaffreyR 👋

Can you share a snippet where I can fully reproduce the issue locally? Also -- am I right in saying that the issue is that the exact same input might result in different outputs, using model.generate()?

CaffreyR commented 2 years ago

Hi @gante, thanks for your kind reply. Sorry, but the full code has not been released. It is actually modified from the code of Facebook's FiD. I measure the time in the evaluation loop: https://github.com/facebookresearch/FiD/blob/main/test_reader.py#L36

You can add the timing code inside the for loop:

for i, batch in enumerate(dataloader):
        (idx, _, _, context_ids, context_mask) = batch
        torch.cuda.synchronize()
        import time
        start = time.perf_counter()
        if opt.write_crossattention_scores:
            model.reset_score_storage()

        outputs = model.generate(
            input_ids=context_ids.cuda(),
            attention_mask=context_mask.cuda(),
            max_length=50,
        )

        if opt.write_crossattention_scores:
            crossattention_scores = model.get_crossattention_scores(context_mask.cuda())

        for k, o in enumerate(outputs):
            ans = tokenizer.decode(o, skip_special_tokens=True)
            example = dataset.data[idx[k]]
            if 'answers' in example:
                score = src.evaluation.ems(ans, example['answers'])
                exactmatch.append(score)

            if opt.write_results:
                fw.write(str(example['id']) + "\t" + ans + '\n')
            if opt.write_crossattention_scores:
                for j in range(context_ids.size(1)):
                    example['ctxs'][j]['score'] = crossattention_scores[k, j].item()

            total += 1
        if (i + 1) % opt.eval_print_freq == 0:
            log = f'Process rank:{opt.global_rank}, {i+1} / {len(dataloader)}'
            if len(exactmatch) == 0:
                log += '| no answer to compute scores'
            else:
                log += f' | average = {np.mean(exactmatch):.3f}'
            logger.warning(log)
        torch.cuda.synchronize()
        end = time.perf_counter()
        print(end-start)

logger.warning(f'Process rank:{opt.global_rank}, total {total} | average = {np.mean(exactmatch):.3f}')
if opt.is_distributed:
    torch.distributed.barrier()
score, total = src.util.weighted_average(np.mean(exactmatch), total, opt)

return score, total

And as for the outputs, there are actually 3 outputs.

I think the difference in the outputs of model.generate() can be fixed by loading the same model weights, but the second and third outputs are still different.

Many thanks again for your time

Best, CaffreyR

gante commented 2 years ago

@CaffreyR without an exact script, I am limited in what I can do :) I understand your limitations, but the problem you are describing can come from many places.

In essence, generate() can have variable outputs (which leads to different execution times) for the same input in two circumstances:

  1. generate() is configured to not be deterministic. If transformers' generate() is being used without modifications, this should only be possible with the do_sample=True argument (see the sketch after this list).
  2. the model is not the same between generate() calls.
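
As a minimal sketch of point 1 (using plain t5-base rather than the FiD wrapper, so this is an illustration rather than a reproduction), greedy decoding with do_sample=False returns identical outputs for identical inputs:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out1 = model.generate(**inputs, do_sample=False, max_length=20)
    out2 = model.generate(**inputs, do_sample=False, max_length=20)

# identical input + deterministic decoding -> identical token ids
assert torch.equal(out1, out2)
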
CaffreyR commented 2 years ago

Hi @gante, thanks again for your reply. It actually does not modify it; see here, it just uses the generate() from transformers.T5ForConditionalGeneration.

And what is do_sample=True? Because in the code here: https://github.com/facebookresearch/FiD/blob/main/test_reader.py#L115

There is a sampler; does that match circumstance 1? Could you please explain it a bit more? Thanks

from torch.utils.data import DataLoader, SequentialSampler

eval_examples = src.data.load_data(
    opt.eval_data,
    global_rank=opt.global_rank,  # use the global rank and world size attributes to split the eval set over multiple GPUs
    world_size=opt.world_size
)
eval_dataset = src.data.Dataset(
    eval_examples,
    opt.n_context,
)

eval_sampler = SequentialSampler(eval_dataset)
eval_dataloader = DataLoader(
    eval_dataset,
    sampler=eval_sampler,
    batch_size=opt.per_gpu_batch_size,
    num_workers=20,
    collate_fn=collator_function
)

gante commented 2 years ago

It shouldn't be related, SequentialSampler only touches the data, not the generate() method.

As for an explanation of do_sample, you can refer to our docs or our blog post.

Please note that without a full reproduction script I won't give further support here. As per our issues guidelines, we reserve GitHub issues for bugs in the repository (with clear reproducibility) and/or feature requests. For any other matters, we'd like to invite you to use our forum 🤗

Narsil commented 2 years ago

Hi @CaffreyR,

Maybe it's because of the t5-base configuration? https://huggingface.co/t5-base/blob/main/config.json#L21 These lines modify the default options of generate for this model.

CaffreyR commented 2 years ago

Hi @Narsil, thanks for your kind reply. Do you mean task_specific_params? Could you please explain more? What are the default options, and how does it modify them? Thanks!

Narsil commented 2 years ago

the pipeline reads task_specific_params and overrides the default when it's present.

We realized this wasn't super discoverable, so very few models have this feature being used, but I happen to remember this one does.

So if you're using t5-base as a summarization pipeline (which I think is the default), the pipeline will use those defaults and treat them as regular params; it happens that these control the generate_kwargs of generate. Sometimes models also have defaults in the config (same idea, just for the whole model rather than a specific task). Neither of these mechanisms is great at making it obvious to users what happens, but it is useful for providing sane defaults (or the ones used in the original repo / original paper).

If you want to override any you just need to supply yours directly to generate for instance.

The order of resolution is: user-specified > config > default (the pipeline has a few more rules, but you're not actually using them here).
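
For illustration (a sketch, not part of the original comment; model and input_ids here stand in for whatever you are running), you can inspect what t5-base ships in its config and override those values explicitly when calling generate:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("t5-base")
print(config.task_specific_params)              # per-task generate defaults, e.g. for "summarization"
print(config.num_beams, config.early_stopping)  # model-wide generation defaults

# user-specified kwargs take precedence over config / task_specific_params
output_ids = model.generate(
    input_ids,
    num_beams=1,
    do_sample=False,
    max_length=50,
)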

CaffreyR commented 2 years ago

Hi @Narsil, thanks for your explanation. So what should I do? The code here just uses t5.config. Do I need to delete task_specific_params in this case?

Narsil commented 2 years ago

@CaffreyR

You can:

pipeline(model="t5-base", 
**{
  "early_stopping": true,
      "length_penalty": None,
      "max_length": None,
      "min_length": None,
      "no_repeat_ngram_size": None,
      "num_beams": None,
      "prefix": "summarize: "  # You probably want to keep this for summarization as it's how the model was trained
      })

Or deactivate them altogether by loading the model before creating the pipeline:

model = AutoModelForXXX.from_pretrained("t5-base")
model.config.task_specific_params = None

pipe = pipeline(task="summarization", model=model, tokenizer=tokenizer)

Would either solution work for you?

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.