Closed wzamazon closed 1 year ago
Correct, the provider assumes that iovecs are not mixed. @aingerson should be able to determine if supporting mixed memory will be simple or not.
I suspect that there may be complications if the source and target iovecs are mixed, but are not aligned on sizes.
Going through old issues today and noticed this one that somehow I missed. Sorry about that. Adding this support will be a matter of just some refactoring so it shouldn't be too bad. The main limitation would be the IPC path which I don't see an easy way to do this. Any mixed interface calls would have to go through SAR. Is this support you would still be interested in? @wzamazon
Hi, @aingerson
Yes. We are still interested in this enhancement.
We recently did some evaluation, and enable this feature will bring some performance enhancement.
We recently did some evaluation, and enable this feature will bring some performance enhancement.
^^ for small message sizes
@wzamazon I opened a PR to add this support (#8480) but I don't have a great way to test it. Would appreciate your feedback on it!
@aingerson Thank you! I will run a couple tests on it!
We just noticed the multiple iov with mixed host memory and cuda memory will cause shm provider to crash.
Specifically, we want to send 2 iov. The first one is on host memory, the second one is on cuda memory. The memory type is specified correctly in the descriptors we passed to
fi_sendmsg
.We found such a send will cause shm provider to crash.
From reading the shm source code, I can see the code was written under the assumption that multiple IOV will have the same memory type. For example in function
smr_get_mr_hmem_iface
, the memory type was obtained from the first descriptor. Also insmr_tx_entry
, memory type is specified by an integeriface
.This feature is important for EFA because EFA provider uses shm provider to support communication on same host, and it will use mixed memory to support cuda memory. For small cuda messages (<4k), EFA will send 2 iov via shm, the first one is on host memory, which contains header. The second one would be application's buffer on gpu memory.