huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Performance for the summarization task on BART is low after the Transformers 4.40 upgrade #1144

Open · astachowiczhabana opened 3 months ago

astachowiczhabana commented 3 months ago

System Info

Bad:
Optimum Habana latest main: c495f479d9abf04fb7adb6f0a5607d7963186649
Synapse docker image: v1.16

Good:
Optimum Habana, one commit before the Transformers 4.40 upgrade: 569580ff9bf44083514533ad28e336043891947b
Synapse docker image: v1.16

Information

Tasks

Reproduction

```shell
cd /root/optimum-habana/examples/summarization
pip install -r requirements.txt
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES=1 python run_summarization.py \
    --model_name_or_path facebook/bart-large-cnn \
    --do_predict \
    --predict_with_generate \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --output_dir ./tst-summarization \
    --overwrite_output_dir \
    --per_device_eval_batch_size 2 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --gaudi_config_name Habana/t5 \
    --ignore_pad_token_for_loss False \
    --pad_to_max_length \
    --num_beams 1 \
    --generation_num_beams 1 \
    --bf16 \
    --ignore_eos False
```

Expected behavior

The quickest way to check whether something is wrong is to observe performance.

Before the Transformers 4.40 upgrade the speed is ~3.9 it/s; after the upgrade it is ~1.7 it/s.
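For scale, a quick back-of-the-envelope calculation of the regression magnitude, using only the two throughput numbers reported above (the commit labels in the comments are taken from the System Info section):

```python
# Sketch: quantify the reported regression from the two measured throughputs.
before = 3.9  # it/s on 569580f (one commit before the Transformers 4.40 upgrade)
after = 1.7   # it/s on latest main c495f47 (after the upgrade)

slowdown = before / after              # how many times slower generation got
drop_pct = (1 - after / before) * 100  # throughput lost, in percent
print(f"{slowdown:.2f}x slower ({drop_pct:.0f}% throughput drop)")
# → 2.29x slower (56% throughput drop)
```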

regisss commented 3 days ago

@astachowiczhabana Are we still seeing this regression?