Closed: whitneywhtsang closed this issue 3 months ago.
This test runs out of memory. Tracing shows that it allocates about 51 GB of device memory, hits an OOM error, and then either enters an infinite loop or proceeds extremely slowly (it had not finished after 15 hours). This happens in both eager and inductor modes on XPU.
Does the OOM happen in inference or training mode? Thanks.
It happens in all modes and for all datatypes.
This issue is no longer reproducible.
Env:
9a8ab778d34bd24c5caceb340837483decc4c311
fe93a00ffe438e9ba8c8392c0b051b1662c810de
d54ca9f80ead108c8797441681e219becaf963d8
1980f8af5bcd0bb2ce51965cf79d8d4c25dad8a0
10239873229e527f8b7e7b3340c40ee38bb1cfc4
cait_m36_384 fails to run in all modes and data types.