oneapi-src / unified-memory-framework

A library for constructing allocators and memory pools. It also contains broadly useful abstractions and utilities for memory management. UMF allows users to manage multiple memory pools characterized by different attributes, allowing certain allocation types to be isolated from others and allocated using different hardware resources as required.
https://oneapi-src.github.io/unified-memory-framework/
Other
42 stars 29 forks source link

proxy lib with shm + IPC scenario doesn't work #777

Open bratpiorka opened 1 month ago

bratpiorka commented 1 month ago

Please provide a reproduction of the bug:

Currently, 2 process scenario where the first process (client) uses UMF through Proxy Lib with UMF_PROXY="page.disposition=shared-shm" and sends IPC handles to the server, and second process (server) got data from the client and opens IPC handle doesn't work. This is because the server needs to know the name of the shm, as it is needed as an argument to the OS provider create.

How often bug is revealed:

always

Actual behavior:

Server process call to OpenIPCHandle() returns error.

Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? Yes

Requested priority: High

vinser52 commented 1 month ago

I thought the shared memory name is part of the IPC handle. Or I did not get the exact root cause of the problem. On the server side the os_open_ipc_handle function should use the name from the IPC handle. @ldorau could you please clarify on this?

vinser52 commented 1 month ago

I just checked the source code, and as I can see the OS provider does not use shm_name from the IPC handle. @ldorau is it a bug?

bratpiorka commented 1 month ago

Currently, we open shm that was set during the provider initialization, see https://github.com/oneapi-src/unified-memory-framework/blob/main/src/provider/provider_os_memory.c#L1255

When the user calls OpenIPCHandle() function, it needs a pool handle as an argument, so the provider must already be initialized. Since the shm name is an argument in the provider create function, there is no way to set or change it after initialization. So the one way to support the scenario from this issue is to first get the name of shm in client process, then send it to the server and finally let the server create the provider with shm name as an arg.

I think we can't simply change the code and open the shm from the ipc handle - please consider the scenario where the provider was created with one shm and IPC handle points to the second shm - this is fine to return error in this case.

vinser52 commented 1 month ago

Currently, we open shm that was set during the provider initialization, see https://github.com/oneapi-src/unified-memory-framework/blob/main/src/provider/provider_os_memory.c#L1255

When the user calls OpenIPCHandle() function, it needs a pool handle as an argument, so the provider must already be initialized. Since the shm name is an argument in the provider create function, there is no way to set or change it after initialization. So the one way to support the scenario from this issue is to first get the name of shm in client process, then send it to the server and finally let the server create the provider with shm name as an arg.

I think we can't simply change the code and open the shm from the ipc handle - please consider the scenario where the provider was created with one shm and IPC handle points to the second shm - this is fine to return error in this case.

First, let's forget about IPC and consider a simpler scenario. How two providers from different processes can use the same SHM region? I think it is an error.

The second question regarding IPC, the os_open_ipc_handle should use shm_name from the ipc handle not the the shm_name from the local provider. It is exactly the idea of IPC when one process create memory provider/pool with some config and can expose the memory to another process via IPC handles.

bratpiorka commented 1 month ago

@vinser52 what about devdax and file provider? Can two processes use the same devdax or file provider?

vinser52 commented 1 month ago

@vinser52 what about devdax and file provider? Can two processes use the same devdax or file provider?

The same considerations applied to all memory providers that works with memory that is visible/accessible by multiple processes. If two providers allocates from the same physical memory locations then you need a special pool manager that can coordinates between multiple pools (UMF does not have such pool manager, and AFAIK PMDK also had no such one). For cross-process communication we have IPC functionality when memory allocated by one process can be accessed by another process via IPC handle, but allocations from the same physical memory is not supported by multiple pools.

Regarding shared memory, file or dax memory providers, I can imagine when different process can use different locations from shm/file/devdax: e.g. pool 1 uses offsets from 0 to n1, pool2 uses offsets from n1 to n2, etc.

bratpiorka commented 1 month ago

ok, so for the OS provider the fix is here: https://github.com/oneapi-src/unified-memory-framework/pull/779 , but other providers should also be fixed.

One additional question - what about opening an IPC handle from a provider of other types? E.g. the user requests a file-based pool to open a handle from GPU memory. This will not work and is a user error, right?

vinser52 commented 1 month ago

ok, so for the OS provider the fix is here: #779

great.

but other providers should also be fixed.

Yes if they have similar issue.

One additional question - what about opening an IPC handle from a provider of other types? E.g. the user requests a file-based pool to open a handle from GPU memory. This will not work and is a user error, right?

Yeah, different providers are not allowed. The same memory provider type should be used to get and open IPC handles.