huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

Performance is better in the 1.6.1 release than in the 1.7.4 release for many models #419

Open vineethanandh opened 10 months ago

vineethanandh commented 10 months ago

System Info

Optimum-habana - 1.7.4
Synapse AI - 1.12.0
Docker - 1.12.0-463
Gaudi2 (HLS 225) - 1x and 8x.
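For completeness, a quick way to confirm which versions are actually active inside the container (a minimal sketch; `hl-smi` is Habana's device status tool, and the package names are the published pip names):

```bash
# Show the installed optimum-habana version
pip show optimum-habana | grep -i "^version"

# List Habana/Synapse-related Python packages
pip list | grep -i habana

# Show Gaudi device, driver, and firmware status
hl-smi
```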


Reproduction

Steps to reproduce the SwinT run on 1x:

  1. git clone https://github.com/huggingface/optimum-habana.git
  2. cd optimum-habana
  3. git checkout v1.6-release
  4. pip install -r examples/image-classification/requirements.txt
  5. pip install optimum-habana==1.6.1
  6. python3 /root/optimum-habana/examples/image-classification/run_image_classification.py --model_name_or_path microsoft/swin-base-patch4-window7-224 --dataset_name cifar10 --output_dir /tmp/swint_hf/results/ --remove_unused_columns False --do_train --learning_rate 2e-05 --per_device_train_batch_size 64 --evaluation_strategy no --save_strategy no --load_best_model_at_end True --save_total_limit 3 --seed 1337 --use_habana --use_lazy_mode --gaudi_config_name Habana/swin --throughput_warmup_steps 3 --ignore_mismatched_sizes --bf16 --num_train_epochs 1 --logging_steps 20 --dataloader_num_workers 8
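To compare the two releases back to back, a hedged sketch of an A/B run on the same container follows. `RUN_CMD` abbreviates the full `run_image_classification.py` invocation from step 6 (the `...` is not a literal argument), and the log file names are illustrative; the Trainer prints `train_samples_per_second` in its final metrics, which is the throughput figure compared here:

```bash
# A/B the two optimum-habana releases with an identical run command.
# RUN_CMD abbreviates the full run_image_classification.py call from step 6.
RUN_CMD="python3 /root/optimum-habana/examples/image-classification/run_image_classification.py ..."

pip install optimum-habana==1.6.1
$RUN_CMD 2>&1 | tee swint_oh161.log

pip install optimum-habana==1.7.4
$RUN_CMD 2>&1 | tee swint_oh174.log

# Compare the throughput reported in the final training metrics
grep -h "train_samples_per_second" swint_oh161.log swint_oh174.log
```

Note that the repro above pins the example scripts to the v1.6-release branch; for a strict comparison, each installed version should be paired with its matching release branch of the examples.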

Expected behavior

The expected behaviour is that, on the 1.12.0-463 stack, optimum-habana 1.6.1 and optimum-habana 1.7.4 deliver similar performance.

What is observed instead is that throughput is higher with optimum-habana 1.6.1 and comparatively lower with 1.7.4.

This applies to SwinT, ViT, and BERT-Large on both 1x and 8x. For example, the SwinT throughput values are given below:

OH 1.7.4: 362.524, 362.566, 360.719, 358.089

OH 1.6.1: 389.045, 390.971, 389.587

That is almost a 7.5% drop in mean throughput (≈ 389.87 vs. ≈ 360.97).
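The percentage can be recomputed from the reported runs; a minimal sketch using the numbers above:

```bash
# Recompute the mean-throughput drop from the values reported above
python3 -c "
oh161 = [389.045, 390.971, 389.587]
oh174 = [362.524, 362.566, 360.719, 358.089]
a = sum(oh161) / len(oh161)
b = sum(oh174) / len(oh174)
print(f'1.6.1 mean: {a:.2f}  1.7.4 mean: {b:.2f}  drop: {(a - b) / a:.1%}')
"
```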

regisss commented 10 months ago

I'm going to look into it

vineethanandh commented 9 months ago

@regisss - Did you get some time to check this behaviour?

regisss commented 9 months ago

> @regisss - Did you get some time to check this behaviour?

Not yet. I don't think I'll have time to do it this week, before the Optimum Habana v1.8 release. I'll investigate this next week and release a patch if needed.