huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Different results obtained using pipeline (worse) vs. model.generate under the same decoding strategy #33697

Open kirk86 opened 3 days ago

kirk86 commented 3 days ago

System Info

transformers 4.44, Python 3.12, Linux

Who can help?

@Rocketknight1 @gante

Reproduction

  1. Train a small model such as T5 on a small synthetic dataset for summarization.
  2. Evaluate the model on the test set.
  3. Results match the validation accuracy from the training logs when using model.generate with the default decoding strategy (greedy search, i.e. do_sample=False, num_beams=1).
  4. Trying to replicate the same evaluation with the pipeline gives results that are completely off (by roughly ±50%); see the sketch after this list.
  5. Double-checked that all parameters are set to the same default values as in model.generate; the issue persists.
  6. Iterated through all available decoding strategies and the pipeline results never line up.
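
A minimal sketch of the comparison being described (not the reporter's actual script): it contrasts model.generate with the summarization pipeline under the same greedy decoding settings. The checkpoint "google-t5/t5-small" and the input text are stand-ins, since the trained model and synthetic data are not shared in the report.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Stand-in checkpoint; the issue uses a privately trained T5 model.
checkpoint = "google-t5/t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

text = "summarize: The quick brown fox jumped over the lazy dog near the river bank."

# Path 1: model.generate with explicit greedy decoding.
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=64)
generate_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Path 2: the summarization pipeline with the same generation arguments.
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)
pipeline_text = summarizer(text, do_sample=False, num_beams=1, max_new_tokens=64)[0]["summary_text"]

print("generate:", generate_text)
print("pipeline:", pipeline_text)
# According to the report, the outputs (and downstream metrics) diverge between the two paths.
```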

Expected behavior

Expected the pipeline to behave the same as model.generate when using the same underlying decoding strategy.

Rocketknight1 commented 3 days ago

Hi @kirk86 - could you upload your model (or reproduce the issue with an existing T5 model on the Hub), then share a short but complete reproducer script that shows the discrepancy? It'll make it a lot easier for us to track down any bugs!