huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

get_max_memory() returns allocated memory for XPU instead of total device memory #2929

Open · dvrogozh opened 1 month ago

dvrogozh commented 1 month ago

Here: https://github.com/huggingface/accelerate/blob/12a007d55937345aa986f5d7b1a1b6f2038465a7/src/accelerate/utils/modeling.py#L843

XPU is queried for the max allocated memory, while other devices (for example CUDA) are queried for the free/total device memory via mem_get_info(): https://github.com/huggingface/accelerate/blob/12a007d55937345aa986f5d7b1a1b6f2038465a7/src/accelerate/utils/modeling.py#L819

This seems to be a bug. However, I believe that mem_get_info() is not currently supported by the XPU backend in PyTorch (as of https://github.com/pytorch/pytorch/commit/3477ee38e4dd1429ecfd7e6f20a30cce0f4f78e7) and needs to be requested.
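For illustration, here is a minimal sketch of the two code paths (paraphrased, not copied from modeling.py; the torch.xpu calls are assumed to be available in a recent PyTorch build):

```python
import torch

# CUDA path in get_max_memory(): mem_get_info() returns (free, total) in bytes,
# and the free amount is used as the per-device budget.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info(0)
    cuda_budget = free

# XPU path as reported here: max_memory_allocated() is the peak memory allocated
# by this process, which is near zero on an idle device, not the device capacity.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    xpu_budget = torch.xpu.max_memory_allocated(0)  # e.g. 512 bytes when idle

    # Possible stopgap until torch.xpu.mem_get_info() exists (assumption:
    # total_memory is exposed by the XPU device properties).
    fallback_budget = torch.xpu.get_device_properties(0).total_memory
```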

I would also like to note that https://github.com/pytorch/pytorch/pull/129919 will provide an implementation for torch.xpu.max_memory_allocated(). For me, on an idle device it returned 512 bytes, which caused an issue running HF models with pipeline(device_map="auto"): the model was dispatched to CPU instead of XPU with this printout (see https://github.com/huggingface/transformers/issues/31922 for details):

/home/gta/git/huggingface/accelerate/src/accelerate/utils/modeling.py:1399: UserWarning: Current model requires 4096 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
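A quick way to see the effect (a sketch, assuming an XPU-enabled PyTorch and accelerate build):

```python
import torch
from accelerate.utils import get_max_memory

# On an idle XPU, max_memory_allocated() reports only the peak bytes allocated
# by this process (e.g. 512), so get_max_memory() hands device_map="auto" a
# tiny budget and the model is dispatched to CPU instead.
print(torch.xpu.max_memory_allocated(0))
print(get_max_memory())  # e.g. {0: 512, "cpu": ...} instead of the real capacity
```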

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @sywangyi @yao-matrix
CC: @muellerzr @SunMarc

dvrogozh commented 1 month ago

However, I believe that mem_get_info() is not currently supported by XPU backend in pytorch and needs to be requested.

Filed request in https://github.com/pytorch/pytorch/issues/130599

SunMarc commented 1 month ago

Indeed, thanks for the report! Keep us updated when this is fixed @dvrogozh! cc @abhilash1910

abhilash1910 commented 1 month ago

Thanks @SunMarc for the ping. I believe that when an XPU is present, it should pick up the 0th device's memory parameters, but this may be due to this commit (this was seen before): https://github.com/huggingface/accelerate/commit/30cb7ece76e7cada7aa38f6d3f51947847ae5a76. @faaany could you take a look at this? I agree with @dvrogozh that the mem_get_info() API is needed.
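Once the backend exposes it, the XPU branch could mirror the CUDA one. A rough sketch (torch.xpu.mem_get_info() is the API being requested in pytorch#130599, so it is treated as optional here, and the availability of get_device_properties()/memory_reserved() on XPU is assumed):

```python
import torch

def xpu_free_memory(device_index: int = 0) -> int:
    """Free memory of an XPU device in bytes (illustrative sketch only)."""
    # Preferred: mirror torch.cuda.mem_get_info(); torch.xpu.mem_get_info() is
    # the API requested in pytorch#130599 and may not exist in your build yet.
    if hasattr(torch.xpu, "mem_get_info"):
        free, _total = torch.xpu.mem_get_info(device_index)
        return free
    # Fallback: approximate free memory as total capacity minus what this
    # process has reserved (assumption: both calls exist in recent PyTorch).
    props = torch.xpu.get_device_properties(device_index)
    return props.total_memory - torch.xpu.memory_reserved(device_index)
```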

faaany commented 3 weeks ago

Hi @abhilash1910, the issue mentioned by @dvrogozh is a known issue and is not related to my commit.