huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Performance is better in the 1.6.1 release than in the 1.7.4 release for many models #419

Open vineethanandh opened 10 months ago

vineethanandh commented 10 months ago

System Info

Optimum-habana - 1.7.4
Synapse AI - 1.12.0
Docker - 1.12.0-463
Gaudi2 (HLS 225) - 1x and 8x.
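For completeness, a quick way to confirm which versions are actually active inside the container (a minimal sketch; `hl-smi` is Habana's device status tool, and the package names are the published pip names):

```bash
# Show the installed optimum-habana version
pip show optimum-habana | grep -i "^version"

# List Habana/Synapse-related Python packages
pip list | grep -i habana

# Show Gaudi device, driver, and firmware status
hl-smi
```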


Reproduction

Steps to reproduce the SwinT run on 1x:

  1. git clone https://github.com/huggingface/optimum-habana.git
  2. cd optimum-habana
  3. git checkout v1.6-release
  4. pip install -r examples/image-classification/requirements.txt
  5. pip install optimum-habana==1.6.1
  6. python3 /root/optimum-habana/examples/image-classification/run_image_classification.py --model_name_or_path microsoft/swin-base-patch4-window7-224 --dataset_name cifar10 --output_dir /tmp/swint_hf/results/ --remove_unused_columns False --do_train --learning_rate 2e-05 --per_device_train_batch_size 64 --evaluation_strategy no --save_strategy no --load_best_model_at_end True --save_total_limit 3 --seed 1337 --use_habana --use_lazy_mode --gaudi_config_name Habana/swin --throughput_warmup_steps 3 --ignore_mismatched_sizes --bf16 --num_train_epochs 1 --logging_steps 20 --dataloader_num_workers 8
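To compare the two releases back to back, a hedged sketch of an A/B run on the same container follows. `RUN_CMD` abbreviates the full `run_image_classification.py` invocation from step 6 (the `...` is not a literal argument), and the log file names are illustrative; the Trainer prints `train_samples_per_second` in its final metrics, which is the throughput figure compared here:

```bash
# A/B the two optimum-habana releases with an identical run command.
# RUN_CMD abbreviates the full run_image_classification.py call from step 6.
RUN_CMD="python3 /root/optimum-habana/examples/image-classification/run_image_classification.py ..."

pip install optimum-habana==1.6.1
$RUN_CMD 2>&1 | tee swint_oh161.log

pip install optimum-habana==1.7.4
$RUN_CMD 2>&1 | tee swint_oh174.log

# Compare the throughput reported in the final training metrics
grep -h "train_samples_per_second" swint_oh161.log swint_oh174.log
```

Note that the repro above pins the example scripts to the v1.6-release branch; for a strict comparison, each installed version should be paired with its matching release branch of the examples.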

Expected behavior

The expected behaviour is that, on the 1.12.0-463 stack, optimum-habana 1.6.1 and optimum-habana 1.7.4 deliver similar performance.

What is observed instead is that throughput is higher with optimum-habana 1.6.1 and comparatively lower with 1.7.4.

This applies to SwinT, ViT, and BERT-Large on both 1x and 8x. For example, the SwinT throughput values are given below:

OH 1.7.4: 362.524, 362.566, 360.719, 358.089

OH 1.6.1: 389.045, 390.971, 389.587

That is almost a 7.5% drop in mean throughput (≈ 389.87 vs. ≈ 360.97).
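The percentage can be recomputed from the reported runs; a minimal sketch using the numbers above:

```bash
# Recompute the mean-throughput drop from the values reported above
python3 -c "
oh161 = [389.045, 390.971, 389.587]
oh174 = [362.524, 362.566, 360.719, 358.089]
a = sum(oh161) / len(oh161)
b = sum(oh174) / len(oh174)
print(f'1.6.1 mean: {a:.2f}  1.7.4 mean: {b:.2f}  drop: {(a - b) / a:.1%}')
"
```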

regisss commented 10 months ago

I'm going to look into it

vineethanandh commented 9 months ago

@regisss - Did you get some time to check this behaviour?

regisss commented 9 months ago

> @regisss - Did you get some time to check this behaviour?

Not yet. I don't think I'll have time to do it this week, before the Optimum Habana v1.8 release. I'll investigate this next week and release a patch if needed.