intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

[timm] `cait_m36_384` fails to run #523

Closed: whitneywhtsang closed this issue 3 months ago

whitneywhtsang commented 7 months ago

`cait_m36_384` fails to run in all modes and for all data types.

ienkovich commented 7 months ago

This test runs out of memory. Tracing shows it allocates ~51 GB of device memory, then hits an OOM error, and then either enters an infinite loop or processes extremely slowly (it had not finished after 15 hours). This happens in both eager and inductor modes on XPU.
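Since the run does not terminate on its own after the OOM, a reproduction attempt is easier to manage with a hard time limit. Below is a minimal sketch of that idea using only the Python standard library. The helper name `run_with_timeout` is hypothetical, and the command you pass in is whatever you use to launch the model; no actual benchmark command is assumed here.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process with a hard time limit.

    Returns a (status, returncode) tuple, where status is one of
    "ok", "failed", or "timeout". On timeout the child is killed
    and returncode is None.
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,    # subprocess.run kills the child on expiry
            capture_output=True,  # keep stdout/stderr out of the console
            text=True,
        )
    except subprocess.TimeoutExpired:
        # The process neither finished nor crashed within the limit,
        # which is the symptom described above (a hang after the OOM).
        return "timeout", None
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.returncode
```

For example, `run_with_timeout([sys.executable, "my_repro.py"], 3600)` (with `my_repro.py` standing in for your own reproduction script) separates a clean failure from a hang without waiting for hours.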

retonym commented 7 months ago

Does the OOM happen in inference or training mode? Thx.

ienkovich commented 7 months ago

It happens in all modes and for all data types.

vlad-penkin commented 3 months ago

This issue is no longer reproducible.

Env: