Open jjfumero opened 2 years ago
Hi @jjfumero
This is more of an implementation-specific detail, and it depends on how the driver stack works. The implementation you are using here is the Intel L0 driver, and that software stack basically uses lazy allocation, or lazy residency, of allocations.
It works like this: you can create several allocations, as long as each one is no larger than the maximum allocatable size. The reason you can create several allocations that together exceed the device's total memory is that they are made resident on the device only when needed. That is, you could have N allocations, but your workload might need only one at a time when executing on the device. The device memory therefore doesn't need to hold all of them simultaneously, which is why your allocations succeed.
Now, if you have a kernel that actually needs all those allocations, then when submitting that kernel the driver would try to make all of them resident and, as expected, the submission would fail, because there is no space to make all of them resident. In this case, zeCommandQueueExecuteCommandLists may return an OUT_OF_MEMORY error.
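The behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea, not the driver's implementation: `ToyDevice`, `allocate` and `submit` are hypothetical names, and the assumption that total device memory equals the 26GB maximum allocation size is made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: none of these types exist in Level Zero or the Intel driver.
struct ToyDevice {
    std::uint64_t totalMemory;               // bytes of device memory (assumed)
    std::uint64_t maxAllocSize;              // largest single allocation accepted
    std::vector<std::uint64_t> allocations;  // sizes of live allocations

    // Allocation checks only the per-allocation limit. Nothing is made
    // resident yet, so the running total is never compared to totalMemory.
    std::optional<std::size_t> allocate(std::uint64_t size) {
        if (size > maxAllocSize) return std::nullopt;
        allocations.push_back(size);
        return allocations.size() - 1;
    }

    // Submission must make every allocation the kernel uses resident at the
    // same time, so this is where oversubscription is detected.
    bool submit(const std::vector<std::size_t>& used) const {
        std::uint64_t needed = 0;
        for (std::size_t id : used) {
            assert(id < allocations.size());
            needed += allocations[id];
        }
        return needed <= totalMemory;
    }
};
```

In this model, three 20GB allocations on a 26GB device all succeed, a kernel that uses one of them can be submitted, and a kernel that uses all three is rejected at submission time.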
@jandres742 thank you for the clarification. So unless the allocated buffers are required by the kernel being executed, they are not actually allocated. But does this happen for shared buffers, device buffers and host buffers in Level Zero?
I understand the lazy allocation might happen for device buffers, but I don't see why the other types of buffers should be lazily allocated. Also, I get a crash during the buffer allocation as shown in this example:
// context, device, allocSize, hostDesc, memAllocDesc and exceedCapacity
// are defined earlier in the full program.
ze_result_t result;

// 1) Shared allocation (host and device accessible)
void *sharedBuffer = nullptr;
hostDesc.pNext = &exceedCapacity;
memAllocDesc.pNext = &exceedCapacity;
std::cout << "Allocating Shared: " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocShared(context, &memAllocDesc, &hostDesc, allocSize, 128, device, &sharedBuffer);
if (result == 0x78000009) {  // ZE_RESULT_ERROR_UNSUPPORTED_SIZE
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}

// 2) Device allocation
void *deviceBuffer = nullptr;
std::cout << "Allocating On Device: " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocDevice(context, &memAllocDesc, allocSize, 64, device, &deviceBuffer);
if (result == 0x78000009) {
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}

// 3) Host allocation (this is the call that crashes)
void *hostBuffer = nullptr;
std::cout << "Allocating from Host " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocHost(context, &hostDesc, allocSize, 64, &hostBuffer);
if (result == 0x78000009) {
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}
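The snippet above prints nothing when an allocation call returns any result other than success or 0x78000009. A minimal sketch of a helper that reports every outcome is shown below. To keep it compilable without the Level Zero headers, it uses local stand-in constants (hypothetical names) that mirror the values of the corresponding `ze_result_t` enumerators; real code would include `<level_zero/ze_api.h>` and compare against the enumerators directly. `describeAllocResult` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Local stand-ins mirroring ze_result_t values, so the sketch builds
// without ze_api.h. These names are not part of the Level Zero API.
constexpr std::uint32_t kSuccess           = 0x0;         // ZE_RESULT_SUCCESS
constexpr std::uint32_t kOutOfHostMemory   = 0x70000002;  // ZE_RESULT_ERROR_OUT_OF_HOST_MEMORY
constexpr std::uint32_t kOutOfDeviceMemory = 0x70000003;  // ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY
constexpr std::uint32_t kUnsupportedSize   = 0x78000009;  // ZE_RESULT_ERROR_UNSUPPORTED_SIZE

// Turns an allocation result into a message, including a fallback for
// codes the caller did not anticipate.
std::string describeAllocResult(std::uint32_t result) {
    switch (result) {
        case kSuccess:           return "Alloc OK";
        case kUnsupportedSize:   return "size argument is not supported by the device";
        case kOutOfHostMemory:   return "out of host memory";
        case kOutOfDeviceMemory: return "out of device memory";
        default: {
            std::ostringstream msg;
            msg << "unexpected result 0x" << std::hex << result;
            return msg.str();
        }
    }
}
```

With such a helper, each of the three allocation calls could print `describeAllocResult(result)` instead of repeating the two-branch check.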
In my case, I can allocate global memory buffers of up to 26GB. If I run this and allocate 20GB per buffer, I get a crash during the allocation using the zeMemAllocHost function (3rd alloc function).
What I take from this is that zeMemAllocDevice is lazily allocated, while zeMemAllocHost and zeMemAllocShared are allocated directly (blocking calls) and are directly accessible from the host. If this is the case, is a crash during the execution of the zeMemAllocHost function expected? Or should we get an exception or an error code indicating an allocation failure?
You can reproduce this error using this code: https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp
> So unless the allocated buffers are required by the kernel being executed, they are not actually allocated. But does this happen for shared buffers, device buffers and host buffers in Level Zero?
Rather than not being allocated, the point is that they are guaranteed to be available by the time the GPU kernel executes. They could actually be allocated at any time between the allocation call and kernel execution; there's no exact point. The only guarantee is that they will be ready by the time of execution.
All allocations in the L0 driver go through the KMD (kernel-mode driver), so they all share similar behavior, which is what you might be seeing.
> If this is the case, is it expected to get a crash
What crash do you get?
> What crash do you get?
I am not sure what to report. The Linux terminal in which I run this program suddenly closes, along with the subprocesses I had started from it. Also, dmesg does not seem to report anything related to the crash; the terminal window just closes. Is there any way to report this type of crash?
@jjfumero are you still seeing the crash?
Hi @jandres742. Sorry for the delay. I just checked with the latest driver (22.35.24055) on Ubuntu and I still get the crash, with no warnings or errors, when I allocate more than I should. I am not sure whether this is the expected behaviour. That is, should the developer keep track of the remaining memory space, or can the Level Zero implementation control this and throw an exception?
To reproduce it, I am still using the program I sent https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp
./levelZeroAlloc 200000000000
The problem with this test, in my case, is that all applications that are running on or using the iGPU suddenly stop and close.
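If the application does end up tracking the remaining space itself, as the question above suggests, one option is simple bookkeeping on the application side. The sketch below is an assumption about how that could look, not something the Level Zero API provides: `AllocationBudget`, `tryReserve`, `release` and `remaining` are hypothetical names, and the capacity is whatever limit the application chooses (for example, the device's reported memory size). It cannot see memory used by other processes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical application-side helper; not part of the Level Zero API.
class AllocationBudget {
public:
    explicit AllocationBudget(std::uint64_t capacityBytes)
        : capacity_(capacityBytes), used_(0) {}

    // Reserves `size` bytes if they fit in what is left of the budget.
    // Returns false (and reserves nothing) otherwise.
    bool tryReserve(std::uint64_t size) {
        if (size > capacity_ - used_) return false;
        used_ += size;
        return true;
    }

    // Returns bytes to the budget, e.g. after freeing the matching buffer.
    void release(std::uint64_t size) {
        used_ = (size > used_) ? 0 : used_ - size;
    }

    std::uint64_t remaining() const { return capacity_ - used_; }

private:
    std::uint64_t capacity_;  // total bytes the application allows itself
    std::uint64_t used_;      // bytes currently reserved (always <= capacity_)
};
```

A caller would check `tryReserve(allocSize)` before calling zeMemAllocDevice, zeMemAllocShared or zeMemAllocHost, and call `release(allocSize)` after the matching zeMemFree.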
@jjfumero could you confirm whether you are seeing the issue with latest drivers?
Hi @jandres742, I am not using the latest drivers. I will update and let you know.
@jandres742, I confirm this issue is gone with the latest driver: https://github.com/intel/compute-runtime/releases/tag/22.53.25242.13
To reproduce it: https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp
> ./levelZeroAlloc 30000000000
Device : Intel(R) UHD Graphics 770 [0x4680]
Type : GPU
Vendor ID: 8086
#Queue Groups: 1
Allocating Shared Memory: 30000000000 bytes - 30 (GB)
size argument is not supported by the device
Allocating Device Memory: 30000000000 bytes - 30 (GB)
size argument is not supported by the device
Allocating Host Memory: 30000000000 bytes - 30 (GB)
Alloc OK
std::cout << "Allocating Shared Memory: " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocShared(context, &memAllocDesc, &hostDesc, allocSize, 128, device, &sharedBuffer);
if (result == 0x78000009) {
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}
Thanks
When playing around with the ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE flag for buffer allocation, I noticed the following: if I request a buffer with a size larger than my system allows (in my case 26GB), I get the error 0x78000009 (size argument is not supported by the device), which is expected.
For context, this is the output of the device memory properties of my system:
However, I am able to call the allocation functions (e.g., zeMemAllocDevice) with, for example, 3 buffers of 20 GB each (60GB of global memory in total, which I should not be allowed to use). Each alloc call requests a buffer smaller than the maximum global memory available, but combined they are much larger. Instead of getting an error code, I get a crash.
You can reproduce this using this sample code: https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp
Is this behaviour expected? Or have you considered, or is there anything in the Level Zero API similar to, this call?
That is, a function that we can query for the available space for a given buffer before the actual allocation.
Hardware/ Software details: