Closed: ESI-SYD closed this issue 9 months ago.
With the following setup, all huggingface models pass for all 3 data types (amp_bf16, amp_fp16, float32) and both modes (inference, training):

Stonepia/pytorch dev/triton-test-3.0 0f6d72ce16bd4b30402dcad97144d17cd7bc53ed
intel/intel-xpu-backend-for-triton b6d3678483dbffa58f0470a46c0b512f223aabda
intel-extension-for-pytorch 2.1.10+git99b4297
torch 2.1.0a0+git0f6d72c
transformers 4.27.4

Note:
- The current latest transformers version (4.38.0.dev0) causes some workloads to fail, e.g., BartForCausalLM.
- Some workloads fail intermittently, e.g., DebertaForQuestionAnswering, GPT2ForSequenceClassification, LayoutLMForSequenceClassification, DistilBertForQuestionAnswering, RobertaForQuestionAnswering.
Good news! From a Triton compiler perspective, this result indicates the compiler is clean. How often do the intermittent failures happen?
@vlad-penkin @pbchekin I think we are ready to automate the runs for huggingface as a first step toward automating all pytorch benchmarks (see the sketch below). Someone on the pytorch team should investigate the issue affecting BartForCausalLM with the latest transformers version (4.38.0.dev0).
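A minimal sketch of what the automated matrix could look like, reusing the inductor_xpu_test.sh harness from the reproducer below (the dtype/mode matrix matches the manual runs; scheduling and result reporting are left out):

#!/bin/bash
# Sketch: run the full huggingface accuracy matrix in one pass
set -e
PYTORCH_PROJ=$HOME/pytorch  # assumes the environment built by the reproducer below
cd $PYTORCH_PROJ
for dtype in float32 amp_bf16 amp_fp16; do
  for mode in training inference; do
    bash inductor_xpu_test.sh huggingface "$dtype" "$mode" accuracy xpu 0
  done
done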
@whitneywhtsang as part of this work item, can you document the exact procedure required to reproduce this result? Also, please note the Triton commit used for this experiment in this issue.
Reproducer on x1spr cluster:
BASE=$HOME
TRITON_PROJ=$BASE/intel-xpu-backend-for-triton
PYTORCH_PROJ=$BASE/pytorch
conda create --name triton-3.0 python=3.10
conda activate triton-3.0
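# Install the prebuilt intel-extension-for-pytorch wheel staged under /data on the cluster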
pip install /data/intel_extension_for_pytorch-2.1.10+git99b4297-cp310-cp310-linux_x86_64.whl
if [ ! -d "$TRITON_PROJ" ]
then
cd $BASE
git clone https://github.com/intel/intel-xpu-backend-for-triton.git -b llvm-target
fi
cd $TRITON_PROJ
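# Build Triton from source using the repo's build script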
scripts/compile-triton.sh
if [ ! -d "$PYTORCH_PROJ" ]
then
cd $BASE
git clone https://github.com/Stonepia/pytorch.git -b dev/triton-test-3.0
fi
cd $PYTORCH_PROJ
pip install pyyaml
make clean
python setup.py install
# Note: install the transformers version listed in https://github.com/Stonepia/pytorch/blob/dev/triton-test-3.0/.ci/docker/ci_commit_pins/huggingface.txt
# Note: the current latest transformers version (4.38.0.dev0) causes some workloads to fail, e.g., BartForCausalLM
pip install "transformers==4.27.4"
pip install pandas
# Note: raising the open-file limit with `ulimit -n 1048576` may not work on some machines
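# A guarded way to raise the limit only where permitted (a sketch):
if ulimit -n 1048576 2>/dev/null; then
  echo "raised open-file limit to $(ulimit -n)"
else
  echo "could not raise open-file limit; continuing with $(ulimit -n)"
fi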
cp $TRITON_PROJ/scripts/inductor_xpu_test.sh .
bash inductor_xpu_test.sh huggingface float32 training accuracy xpu 0
bash inductor_xpu_test.sh huggingface float32 inference accuracy xpu 0
bash inductor_xpu_test.sh huggingface amp_bf16 training accuracy xpu 0
bash inductor_xpu_test.sh huggingface amp_bf16 inference accuracy xpu 0
bash inductor_xpu_test.sh huggingface amp_fp16 training accuracy xpu 0
bash inductor_xpu_test.sh huggingface amp_fp16 inference accuracy xpu 0
# Expected `pip list` output:
# triton 3.0.0 /home/jovyan/intel-xpu-backend-for-triton/python
# intel-extension-for-pytorch 2.1.10+git99b4297
# torch 2.1.0a0+git0f6d72c
# transformers 4.27.4
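# Quick sanity check (sketch): confirm the key package versions before running
pip list 2>/dev/null | grep -E "^(triton|intel-extension-for-pytorch|torch|transformers) "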
> Also, please note the Triton commit used for this experiment in this issue.

It is written above in the setup.
> How often do the intermittent failures happen?

Each of the listed workloads failed once across all the run combinations, and passed right away when rerun individually.
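For reference, a sketch of how one could quantify the flake rate by rerunning a single combination repeatedly (this assumes the harness exits nonzero when a model fails accuracy, which is an assumption, not verified):

fails=0
for i in $(seq 1 10); do
  bash inductor_xpu_test.sh huggingface amp_bf16 training accuracy xpu 0 || fails=$((fails+1))
done
echo "failed $fails/10 runs"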
Thanks @whitneywhtsang for the answers. I believe we can now close this one.
Training accuracy check results of huggingface models based on triton 3.0.0 (3 test scenarios in total).
Test datatypes: amp_bf16, amp_fp16, float32
This issue can be split into multiple work items.
Failed model list:
Reproduce: (replace with real dtype and model)
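A plausible template, reusing the harness arguments from the reproducer above (the trailing model-name filter is hypothetical; only the six positional arguments are shown in the reproducer):

bash inductor_xpu_test.sh huggingface <dtype> training accuracy xpu 0 <ModelName>  # <ModelName> filter is an assumption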
Version:
triton: https://github.com/intel/intel-xpu-backend-for-triton/commit/97ac4f91d149a3392d6e14f5d39aa4953fb6c56e