oneapi-src / unified-runtime

https://oneapi-src.github.io/unified-runtime/
Other
33 stars 113 forks source link

Add Missing `ur_usm_mem_flags_t` Enumerations #116

Closed FranklandJack closed 1 year ago

FranklandJack commented 1 year ago

Issue

PI has the pi_usm_mem_properties bitfield which can take the following values:

Unified Runtime has the ur_usm_mem_flags_t for which there is no equivalent enumeration entries.

Task

Add the missing enumeration entries to ur_usm_mem_flags_t to meet the requirements for implementing the DPC++ runtime on Unified Runtime.

FranklandJack commented 1 year ago

PI_MEM_USM_ALLOC_BUFFER_LOCATION doesn't actually have any usage in adapters whilst PI_MEM_ALLOC_DEVICE_READ_ONLY is only handled in the L0 adapter, so there may be an argument here that these enumerations aren't required after all if these uses can be removed from the DPC++ runtime.

igchor commented 1 year ago

There are actually more properties defined in PI: https://github.com/intel/llvm/blob/02bc9323e860f5672386db712071fa3277d493b0/sycl/include/sycl/detail/pi.h#L597-L606

Also, similar properties are defined in Level Zero: https://github.com/oneapi-src/level-zero/blob/4ed13f327d3389285592edcf7598ec3cb2bc712e/include/ze_api.h#LL4253C42-L4253C42

I think we should probably expose all of the following in UR:

// from L0
UR_MEM_ALLOC_FLAG_BIAS_CACHED,
UR_MEM_ALLOC_FLAG_BIAS_UNCACHED

// from PI/L0
UR_MEM_ALLOC_WRITE_COMBINED,
UR_MEM_ALLOC_INITIAL_PLACEMENT_DEVICE,
UR_MEM_ALLOC_INITIAL_PLACEMENT_HOST,
UR_MEM_ALLOC_DEVICE_READ_ONLY,
UR_MEM_USM_ALLOC_BUFFER_LOCATION // ?

L0 actually has separate flags for device and host allocations: ze_device_mem_alloc_flag_t and ze_host_mem_alloc_flag_t - I'm not sure if we want to do the same.

The last property I listed (UR_MEM_USM_ALLOC_BUFFER_LOCATION) seems to be something similar to device heaps we discussed in https://github.com/oneapi-src/unified-runtime/issues/176 (at least that's my understanding). Here's some description from llvm repo: https://github.com/intel/llvm/blob/3b2b9ca9aef77b27709dc03f75ba8cfd4d85ebf6/sycl/doc/extensions/supported/sycl_ext_intel_buffer_location.asciidoc

jandres742 commented 1 year ago

@igchor

L0 actually has separate flags for device and host allocations: ze_device_mem_alloc_flag_t and ze_host_mem_alloc_flag_t - I'm not sure if we want to do the same.

I think we could have the same set of flags, but applications to those both types above. Reason in L0 there's a ze_device_mem_alloc_flags_t and a ze_host_mem_alloc_flags_t is to be able to set in shared allocation different behavior whether the allocation is in host or the device.

The last property I listed (UR_MEM_USM_ALLOC_BUFFER_LOCATION) seems to be something similar to device heaps we discussed in https://github.com/oneapi-src/unified-runtime/issues/176 (at least that's my understanding). Here's some description from llvm repo:

The documentation for the CL counterpart is here https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_mem_alloc_buffer_location.html#_overview

This is i think for FPGAs devices, not for GPUs, not sure the purpose. So maybe needed, but more for an extension? At least in L0 we dont have it.

bashbaug commented 1 year ago

I think it would be very wise to have an extensible mechanism to pass additional allocation properties as they are required. We shouldn't be constrained by a small bitfield.

I don't have a very strong opinion how to do this - it could be an array of property-value pairs like OpenCL, or some sort of structure chaining mechanism like Vulkan and Level Zero, or something else entirely.

igchor commented 1 year ago

@bashbaug Yes, I think a chaining mechanism would fit quite well. It's actually useful for solving https://github.com/oneapi-src/unified-runtime/issues/176 as well (as described here: https://github.com/oneapi-src/unified-runtime/issues/176#issuecomment-1405681528)