iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

HAL allocator infers/declares wrong usage type when loading large flatbuffers on RDNA3 devices. #13516

Open monorimet opened 1 year ago

monorimet commented 1 year ago

What happened?

With a device whose VRAM and memory heaps should be plenty sufficient for this allocation:

RuntimeError: Error registering modules: C:\E\SHARK-Runtime\runtime\src\iree\hal\drivers\vulkan\vma_allocator.cc:693: RESOURCE_EXHAUSTED; VK_ERROR_OUT_OF_DEVICE_MEMORY; vmaCreateBuffer; failed to allocate buffer of length 26953666560; while invoking native function hal.allocator.allocate.initialized; while calling import;
[ 1]   native hal.allocator.allocate.initialized:0 -
[ 0] bytecode module@1:2938 -

The allocation failure seems to happen because the HAL allocator picks the wrong memory type (DEVICE_VISIBLE instead of DEVICE_LOCAL) when setting up the buffer allocation.

Here I show that the 26GB buffer can allocate successfully if the DEVICE_LOCAL memory type is explicitly specified:

(Pdb) config.device.allocator.allocate_buffer(memory_type=ireert.MemoryType.DEVICE_VISIBLE,allowed_usage=ireert.BufferUsage.DEFAULT,allocation_size=26953666560)
*** RuntimeError: could not allocate buffer: D:\a\SHARK-Runtime\SHARK-Runtime\c\runtime\src\iree\hal\drivers\vulkan\vma_allocator.cc:693: RESOURCE_EXHAUSTED; VK_ERROR_OUT_OF_DEVICE_MEMORY; vmaCreateBuffer

(Pdb) config.device.allocator.allocate_buffer(memory_type=ireert.MemoryType.DEVICE_LOCAL,allowed_usage=ireert.BufferUsage.DEFAULT,allocation_size=26953666560)
<HalBuffer 26953666560 bytes (at offset 0 into 26953666560), memory_type=DEVICE_LOCAL, allowed_access=ALL, allowed_usage=TRANSFER|DISPATCH_STORAGE>

I put a few return_on_error prints in iree/hal/drivers/vulkan/vma_allocator.cc to see which memory type is specified for the failing allocation. On the W7900 it is always DEVICE_VISIBLE in the stack below ireert.VmModule.from_flatbuffer(), where we load the .vmfb into device memory in SHARK.

Can this be changed explicitly at the Python level, or will it require refining how the vma_allocator populates the memory type? Can we specify the memory type directly, or have it inferred from a compile-time option? I tried to trigger the right case in StreamToHAL by compiling the .vmfb with --iree-execution-model=async-external, but it had no effect on the inferred memory type.
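As a stopgap at the Python level, the allocation could be retried with an explicit memory type when the inferred one fails. This is only a sketch of that idea, not IREE's actual allocation policy: the helper name, the fallback order, and the plain `RuntimeError` catch are my own assumptions; the `allocate_buffer(memory_type=..., allowed_usage=..., allocation_size=...)` signature is taken from the Pdb transcript above.

```python
# Hypothetical workaround sketch (not part of IREE): try preferred memory
# types in order and return the first buffer that allocates. `allocator`
# is assumed to behave like config.device.allocator from iree.runtime,
# i.e. allocate_buffer() raises RuntimeError on RESOURCE_EXHAUSTED.

def allocate_with_fallback(allocator, size, usage, memory_types):
    """Try each memory type in order; return the first successful buffer."""
    last_error = None
    for memory_type in memory_types:
        try:
            return allocator.allocate_buffer(
                memory_type=memory_type,
                allowed_usage=usage,
                allocation_size=size,
            )
        except RuntimeError as e:  # e.g. VK_ERROR_OUT_OF_DEVICE_MEMORY via VMA
            last_error = e
    raise last_error
```

With iree.runtime this might be called as `allocate_with_fallback(config.device.allocator, 26953666560, ireert.BufferUsage.DEFAULT, [ireert.MemoryType.DEVICE_LOCAL, ireert.MemoryType.DEVICE_VISIBLE])`, but it treats the symptom only; the declared usage the compiler emits is the real question.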

Steps to reproduce your issue

No response

What component(s) does this issue relate to?

Runtime

Version information

5b89a14

Additional context

No response

monorimet commented 1 year ago

Attaching the vulkaninfo for this machine (I left out the report for the RTX 4090 so it's less confusing): vulkaninfo.txt

benvanik commented 1 year ago

This is not the HAL allocator using the wrong memory type but the buffer having the wrong declared usage. For what purpose are you allocating a 26GB buffer?

monorimet commented 1 year ago

> This is not the HAL allocator using the wrong memory type but the buffer having the wrong declared usage. For what purpose are you allocating a 26GB buffer?

Dispatch storage, if I understand correctly. This is loading in dispatches for the Vicuña model.

benvanik commented 1 year ago

The only buffers we should be allocating host-local/device-visible are staging buffers, and the only ones host-visible/device-local are external buffers (results from invocations, today); all others (constants, variables, and transient memory) should be device-local only. You can compile to the stream dialect (iree-compile --compile-to=stream) and see if anything stands out with your resource. If you can share a dump of that here I can see if I can spot anything obvious.
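The suggested dump can be produced with something like the following; only --compile-to=stream comes from the comment above, while the input/output filenames and the Vulkan target flag are illustrative placeholders for however SHARK currently invokes iree-compile:

```shell
# Stop compilation at the stream dialect instead of emitting a .vmfb so
# resource lifetimes and declared usages can be inspected as textual IR.
# model.mlir / model_stream.mlir are placeholder filenames.
iree-compile --compile-to=stream \
    --iree-hal-target-backends=vulkan-spirv \
    model.mlir -o model_stream.mlir
```

Grepping the resulting IR for the 26953666560-byte resource should show the lifetime/usage the compiler assigned to it.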