oneapi-src / unified-runtime

https://oneapi-src.github.io/unified-runtime/

Creating USM Pools for specific memory types. #526

Open bmyates opened 1 year ago

bmyates commented 1 year ago

urUSMPoolCreate does not have a way of specifying whether a memory pool should be for host, device, or shared memory. Adapters must return ur_usm_pool_handle_t objects that are capable of working for all memory types and all devices. This forces some extra overhead that would not be needed if adapters knew which type of memory pool was being requested ahead of time.

This could be implemented by either expanding ur_usm_pool_desc_t, or adding a new struct that can be chained to ur_usm_pool_desc_t.
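For illustration, here is a minimal sketch of the second (chained-struct) option, following the usual UR stype/pNext extension pattern. The ur_usm_pool_type_desc_t struct, its structure-type value, and the use of ur_usm_type_t to name the memory kind are assumptions made for the sketch, not part of the current spec; it also assumes ur_usm_pool_desc_t keeps its stype/pNext/flags layout.

```cpp
#include <ur_api.h>

// Hypothetical extension descriptor; NOT part of the current UR spec.
typedef struct ur_usm_pool_type_desc_t {
  ur_structure_type_t stype; // a new UR_STRUCTURE_TYPE_* value would be needed
  const void *pNext;         // next descriptor in the chain, or nullptr
  ur_usm_type_t memType;     // e.g. UR_USM_TYPE_HOST / _DEVICE / _SHARED
} ur_usm_pool_type_desc_t;

// Create a pool that the adapter knows up front will only serve device USM.
ur_result_t createDeviceOnlyPool(ur_context_handle_t ctx,
                                 ur_usm_pool_handle_t *pool) {
  ur_usm_pool_type_desc_t typeDesc = {};
  typeDesc.stype = static_cast<ur_structure_type_t>(0x2000); // placeholder value
  typeDesc.pNext = nullptr;
  typeDesc.memType = UR_USM_TYPE_DEVICE;

  ur_usm_pool_desc_t poolDesc = {};
  poolDesc.stype = UR_STRUCTURE_TYPE_USM_POOL_DESC;
  poolDesc.pNext = &typeDesc; // chain the hypothetical descriptor
  poolDesc.flags = 0;

  return urUSMPoolCreate(ctx, &poolDesc, pool);
}
```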

pbalcer commented 1 year ago

I think that was @igchor's original intention: keep the pools generic, and then we'd have common code that appropriately manages and redirects allocations to internal sub-pools. Let's wait for him to chime in.

igchor commented 1 year ago

@bmyates sorry for the delay, I was out of office. @pbalcer is right that the intention was to keep the API generic and let the UR implement any sub-pools internally (if needed).

With the current API, you can create a memory pool and use it to allocate memory of any type. Internally, UR (at least for L0) will create sub-pools for Host, Device, Shared and SharedReadOnly memory. So if you call different allocation functions with the same pool handle, e.g. urUSMSharedAlloc(..., pool1, ...) and urUSMDeviceAlloc(..., pool1, ...), the memory will be served from different physical sub-pools. How the sub-pools are created is currently an implementation detail; this allows UR to choose the best implementation for a particular adapter.
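To make this concrete, here is a minimal sketch (not from the thread) of a single pool handle being passed to both urUSMSharedAlloc and urUSMDeviceAlloc. The entry-point signatures are assumed to match the UR spec at the time; error handling and teardown are omitted for brevity.

```cpp
#include <ur_api.h>

// One logical pool handle is passed to different USM allocation entry
// points; the adapter serves each request from the matching internal
// sub-pool.
void allocateFromOnePool(ur_context_handle_t ctx, ur_device_handle_t dev) {
  ur_usm_pool_desc_t poolDesc = {UR_STRUCTURE_TYPE_USM_POOL_DESC, nullptr, 0};
  ur_usm_pool_handle_t pool1 = nullptr;
  urUSMPoolCreate(ctx, &poolDesc, &pool1);

  void *shared = nullptr;
  void *device = nullptr;
  // Same pool handle, two memory types: internally these are expected to be
  // served from separate Shared and Device sub-pools.
  urUSMSharedAlloc(ctx, dev, /*pUSMDesc=*/nullptr, pool1, 4096, &shared);
  urUSMDeviceAlloc(ctx, dev, /*pUSMDesc=*/nullptr, pool1, 4096, &device);

  urUSMFree(ctx, shared);
  urUSMFree(ctx, device);
}
```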

I think the benefit of this design is that the user does not need to map different memory types to different pools; this is done internally in UR.

This forces some extra overhead that would not be needed if adapters knew which type of memory pool was being requested ahead of time.

What exact overhead does it add? I think the only overhead is storing the metadata (you have to create an instance of a UMA memory pool for each memory type) - and this should be rather negligible. The actual device memory is not allocated until a user requests shared/device allocation.

bmyates commented 1 year ago

Hi @igchor thanks for the reply,

With the current API, you can create a memory pool and use it to allocate memory of any type. Internally, UR (at least for L0) will create sub-pools for Host, Device, Shared and SharedReadOnly memory.

I implemented this functionality earlier this month. You can review my original PR here https://github.com/jandres742/llvm/pull/11

What exact overhead does it add? I think the only overhead is storing the metadata (you have to create an instance of a UMA memory pool for each memory type) - and this should be rather negligible. The actual device memory is not allocated until a user requests shared/device allocation.

Yes, this is the extra overhead I am thinking of. The current implementation requires the adapter to create a pool object for each USM type and for each device and to configure its metadata. This extra overhead could be avoided by adding a flag to specify the type of memory pool desired. I agree with you that this is not a significant amount of overhead, but it is some, and it could be avoided. In addition, I think it is conceptually unintuitive to have a single pool object actually represent many different pools across different memory regions and devices.
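As an illustrative-only sketch of that bookkeeping (SubPool and AdapterPool are hypothetical names, not the actual adapter code): because urUSMPoolCreate does not say which memory type the pool is for, the adapter sets up metadata for every (device, USM type) combination up front, even if the application only ever allocates one type from the pool.

```cpp
#include <array>
#include <memory>
#include <unordered_map>
#include <vector>
#include <ur_api.h>

struct SubPool { /* per-type pooling metadata, e.g. a UMA/UMF pool instance */ };

struct AdapterPool {
  std::unique_ptr<SubPool> hostPool; // host allocations are per-context
  // One sub-pool each for Device, Shared and SharedReadOnly, per device.
  std::unordered_map<ur_device_handle_t,
                     std::array<std::unique_ptr<SubPool>, 3>>
      devicePools;

  explicit AdapterPool(const std::vector<ur_device_handle_t> &devices) {
    hostPool = std::make_unique<SubPool>();
    for (auto dev : devices)
      for (auto &subPool : devicePools[dev])
        subPool = std::make_unique<SubPool>(); // metadata only, no device memory yet
  }
};
```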

igchor commented 1 year ago

Got it. Now that we can relicense the code, I'm also planning to work on the generic pool manager implementation that I mentioned previously (one that can be shared between different adapters).

I think it boils down to what level of abstraction we want to have at the UR level. One option is to make the ur_memory_pool represent the actual physical memory region of a particular type on a particular device and let the user manage them. The other option (the current state) is to make the ur_memory_pool a logical pool that is mapped to the physical memory in some implementation-defined way.

The second option gives us more flexibility and allows us to create separate 'sub-pools' not only for different memory types but also for memory with different access characteristics (e.g. read-only vs read-write), which I think is quite useful. We can also optimize memory placement for each adapter separately.

Also, I think the overhead you mentioned will be present anyway, just at a different layer: with per-device/per-type pools, the user would need some kind of mapping to know which pool to allocate from. The idea I had in mind was to hide all of those std::unordered_map<ur_device_handle_t, std::unique_ptr<USMAllocContext>> maps inside UR.
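As an illustrative-only sketch of that mapping (USMAllocContext here is just a stand-in for a per-device pooling context, not code from this thread): if per-device/per-type pools were exposed at the UR level, a caller such as the SYCL runtime would need bookkeeping like this before every allocation, whereas the current logical-pool design keeps it inside the UR implementation.

```cpp
#include <memory>
#include <unordered_map>
#include <ur_api.h>

// Stand-in for a per-device pooling context; not defined in this thread.
struct USMAllocContext { /* pooling state for one device */ };

// Bookkeeping the caller would need if UR exposed per-device/per-type pools.
struct CallerSidePoolTable {
  std::unordered_map<ur_device_handle_t, std::unique_ptr<USMAllocContext>>
      devicePools;

  USMAllocContext &lookup(ur_device_handle_t dev) {
    auto &slot = devicePools[dev];
    if (!slot)
      slot = std::make_unique<USMAllocContext>(); // created lazily per device
    return *slot;
  }
};
// With the current logical-pool design, this table lives inside the UR
// implementation and the caller only holds a single ur_usm_pool_handle_t.
```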