intel / torch-xpu-ops


E2E test XPU out of memory #701

Open mengfei25 opened 1 month ago

mengfei25 commented 1 month ago

šŸ› Describe the bug

Out of memory in weekly test, https://github.com/intel/torch-xpu-ops/actions/runs/10218591763

Model list:

RuntimeError: XPU out of memory

| Suite | Dtype | Mode | Scenario | Model |
| --- | --- | --- | --- | --- |
| huggingface | amp_bf16 | inference | accuracy | GPTJForCausalLM |
| huggingface | amp_bf16 | inference | accuracy | GPTJForQuestionAnswering |
| huggingface | amp_bf16 | inference | performance | GPTJForQuestionAnswering |
| huggingface | amp_bf16 | training | accuracy | GPTJForCausalLM |
| huggingface | amp_bf16 | training | accuracy | GPTJForQuestionAnswering |
| huggingface | amp_bf16 | training | performance | GPTJForQuestionAnswering |
| huggingface | amp_fp16 | inference | accuracy | GPTJForQuestionAnswering |
| huggingface | amp_fp16 | inference | accuracy | GPTJForCausalLM |
| huggingface | amp_fp16 | inference | performance | GPTJForQuestionAnswering |
| huggingface | amp_fp16 | training | accuracy | GPTJForCausalLM |
| huggingface | amp_fp16 | training | accuracy | GPTJForQuestionAnswering |
| huggingface | amp_fp16 | training | performance | GPTJForQuestionAnswering |
| huggingface | bfloat16 | training | accuracy | GPTJForCausalLM |
| huggingface | bfloat16 | training | accuracy | GPTJForQuestionAnswering |
| huggingface | float16 | training | accuracy | GPTJForCausalLM |
| huggingface | float16 | training | accuracy | GPTJForQuestionAnswering |
| torchbench | amp_bf16 | inference | accuracy | hf_T5_base |
| torchbench | amp_bf16 | inference | accuracy | stable_diffusion_unet |
| torchbench | amp_bf16 | inference | accuracy | llava |
| torchbench | amp_bf16 | inference | performance | llava |
| torchbench | amp_bf16 | inference | performance | hf_distil_whisper |
| torchbench | amp_bf16 | inference | performance | stable_diffusion_unet |
| torchbench | amp_bf16 | training | accuracy | stable_diffusion_unet |
| torchbench | amp_bf16 | training | accuracy | llava |
| torchbench | amp_bf16 | training | performance | stable_diffusion_unet |
| torchbench | amp_fp16 | inference | accuracy | stable_diffusion_unet |
| torchbench | amp_fp16 | inference | accuracy | hf_T5_base |
| torchbench | amp_fp16 | inference | accuracy | llava |
| torchbench | amp_fp16 | inference | performance | hf_distil_whisper |
| torchbench | amp_fp16 | inference | performance | stable_diffusion_unet |
| torchbench | amp_fp16 | inference | performance | llava |
| torchbench | amp_fp16 | training | accuracy | stable_diffusion_unet |
| torchbench | amp_fp16 | training | accuracy | llava |
| torchbench | amp_fp16 | training | performance | stable_diffusion_unet |
| torchbench | bfloat16 | inference | accuracy | hf_T5_base |
| torchbench | bfloat16 | inference | accuracy | llava |
| torchbench | bfloat16 | inference | performance | llava |
| torchbench | bfloat16 | training | accuracy | llava |
| torchbench | bfloat16 | training | accuracy | stable_diffusion_unet |
| torchbench | bfloat16 | training | performance | stable_diffusion_unet |
| torchbench | float16 | inference | accuracy | llava |
| torchbench | float16 | inference | accuracy | hf_T5_base |
| torchbench | float16 | inference | performance | llava |
| torchbench | float16 | training | accuracy | stable_diffusion_unet |
| torchbench | float16 | training | accuracy | llava |
| torchbench | float16 | training | performance | stable_diffusion_unet |
| torchbench | float32 | inference | accuracy | stable_diffusion_unet |
| torchbench | float32 | inference | accuracy | llava |
| torchbench | float32 | inference | performance | hf_distil_whisper |
| torchbench | float32 | inference | performance | stable_diffusion_unet |
| torchbench | float32 | inference | performance | llava |
| torchbench | float32 | training | accuracy | stable_diffusion_unet |
| torchbench | float32 | training | accuracy | llava |
| torchbench | float32 | training | performance | stable_diffusion_unet |
RuntimeError: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)

| Suite | Dtype | Mode | Scenario | Model |
| --- | --- | --- | --- | --- |
| huggingface | amp_bf16 | inference | performance | GPTJForCausalLM |
| huggingface | amp_bf16 | training | accuracy | BlenderbotForConditionalGeneration |
| huggingface | amp_bf16 | training | performance | GPTJForCausalLM |
| huggingface | amp_fp16 | inference | performance | GPTJForCausalLM |
| huggingface | amp_fp16 | training | accuracy | BlenderbotForConditionalGeneration |
| huggingface | amp_fp16 | training | performance | GPTJForCausalLM |
| huggingface | float32 | training | accuracy | BlenderbotForConditionalGeneration |
| huggingface | float32 | training | accuracy | GPTJForCausalLM |
| huggingface | float32 | training | accuracy | GPTJForQuestionAnswering |
| huggingface | float32 | training | performance | GPTJForCausalLM |
| huggingface | float32 | training | performance | GPTJForQuestionAnswering |
| torchbench | bfloat16 | inference | performance | hf_distil_whisper |
| torchbench | float16 | inference | performance | hf_distil_whisper |
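Note that the two signatures above differ: `RuntimeError: XPU out of memory` is raised by PyTorch's XPU caching allocator, while `PI_ERROR_OUT_OF_RESOURCES` bubbles up from the underlying SYCL runtime when an allocation or kernel launch exhausts device resources. A minimal sketch for capturing the peak allocation of one benchmark step, assuming a PyTorch build where `torch.xpu` mirrors the `torch.cuda` memory-introspection helpers (that mirroring, and the helper name below, are assumptions):

```python
import torch

def run_with_memory_report(step_fn, device_type: str = "xpu"):
    """Run one step and print peak device memory (hypothetical helper).

    Assumes torch.xpu exposes the same memory helpers as torch.cuda
    (empty_cache / reset_peak_memory_stats / max_memory_allocated).
    """
    backend = torch.xpu if device_type == "xpu" else torch.cuda
    backend.empty_cache()
    backend.reset_peak_memory_stats()
    try:
        step_fn()
    except RuntimeError as err:
        # Both the allocator OOM and PI_ERROR_OUT_OF_RESOURCES surface
        # as RuntimeError; the message text tells the two apart.
        print(f"[{device_type}] failed: {err}")
    finally:
        peak_mib = backend.max_memory_allocated() / (1024 ** 2)
        print(f"[{device_type}] peak allocated: {peak_mib:.0f} MiB")
```

Logging the peak next to each failing row would show which entries sit just at the device limit and which blow past it.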

Versions

torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/1d70431c072db889d9a47ea4956049fe340a426d
pytorch: d224857b3af5c9d5a3c7a48401475c09d90db296
device: PVC 1100
bundle: 0.5.3
driver: 803.61
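For local triage it may be easier to re-run a single cell from the tables above instead of the whole weekly job, via the upstream dynamo benchmark runners in the PyTorch checkout. A sketch, with the caveat that the exact flag spelling (in particular how CI maps the `amp_bf16` column onto `--amp`/`--amp-dtype`) is an assumption to verify against `--help` on this branch:

```python
# Reproduce one failing combination (huggingface / amp_bf16 / inference /
# accuracy / GPTJForCausalLM) via the dynamo huggingface runner.
import subprocess

cmd = [
    "python", "benchmarks/dynamo/huggingface.py",
    "--accuracy",                 # scenario column (or --performance)
    "--inference",                # mode column (or --training)
    "--amp",                      # amp_bf16 column; may need an explicit
                                  # --amp-dtype bfloat16 on some branches
    "--inductor",                 # compile with inductor, as in the CI runs
    "--device", "xpu",
    "--only", "GPTJForCausalLM",  # model column
]
subprocess.run(cmd, check=True)
```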

mengfei25 commented 1 month ago

Looks like hf_distil_whisper is a regression:

./torchbench/amp_bf16/inductor_torchbench_amp_bf16_inference_xpu_performance_all.log: xpu eval hf_distil_whisper running benchmark: 100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 10/10 [00:00<00:00, 10.61it/s]
./torchbench/amp_bf16/inductor_torchbench_amp_bf16_inference_xpu_performance_all.log: 1.656x

pytorch: https://github.com/pytorch/pytorch/commit/dadc0ed
torch-xpu-ops: https://github.com/intel/torch-xpu-ops/commit/45e55a3

chuanqi129 commented 3 weeks ago

> Looks like hf_distil_whisper is a regression:
>
> ./torchbench/amp_bf16/inductor_torchbench_amp_bf16_inference_xpu_performance_all.log: xpu eval hf_distil_whisper running benchmark: 100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 10/10 [00:00<00:00, 10.61it/s]
> ./torchbench/amp_bf16/inductor_torchbench_amp_bf16_inference_xpu_performance_all.log: 1.656x
>
> pytorch: pytorch/pytorch@dadc0ed
> torch-xpu-ops: 45e55a3

Hi @retonym, this is a regression issue, can we double-check it?

weishi-deng commented 3 weeks ago

I re-collected the model test with pytorch/pytorch@dadc0ed and torch-xpu-ops 45e55a3 on my local PVC 1100, and the issue still exists. Besides, this model also fails with out-of-memory on the CUDA backend.
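Since the failure reproduces on CUDA as well, it points at the model and batch configuration rather than at the XPU backend specifically. One way to confirm, under the same flag-name assumptions as the sketch earlier in this thread, is to run the identical invocation on both backends and compare:

```python
# Run the hf_distil_whisper inference performance case on both backends;
# an OOM on both supports the conclusion that the workload itself is too
# large for either device at this batch size.
import subprocess

for device in ("xpu", "cuda"):
    cmd = [
        "python", "benchmarks/dynamo/torchbench.py",
        "--performance", "--inference", "--amp", "--inductor",
        "--device", device,
        "--only", "hf_distil_whisper",
    ]
    result = subprocess.run(cmd)
    print(f"{device}: exit code {result.returncode}")
```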