Open chuanqi129 opened 1 month ago
This issue is related to incorrect data type inference in the SDP kernel. We will investigate further.
The issue is related to a regression on PyTorch main. It also occurs on CUDA when the math SDP backend is forced.
@retonym will create a PyTorch issue to track it.
PyTorch issue for this crash: https://github.com/pytorch/pytorch/issues/133974
🐛 Describe the bug
According to the latest weekly tests, 2 models crash in the AMP_FP16 and AMP_BF16 training accuracy tests. See https://github.com/intel/torch-xpu-ops/actions/runs/10278413083/job/28442046973
Model List:
BartForCausalLM
BartForConditionalGeneration
Failures log:
Versions
The failure was observed in the on-demand test on 2024-08-07. See: https://github.com/intel/torch-xpu-ops/actions/runs/10278413083