Closed by wzamazon 1 year ago
Peer to peer is meant to describe PCI peer to peer transfers, or device to device transfers that do not require bouncing data through host buffers. This could also apply to other device buses, not just PCI.
I see.
I think that for the case of NCCL, FI_HMEM_P2P_REQUIRED is too strong. Basically, NCCL needs a way to know whether the provider is capable of P2P, not necessarily that all transfers must go through peer-to-peer.
I am reading the man page for FI_HMEM_P2P_ENABLED. It does not specify what a provider should do if it does not support peer-to-peer.
Would it be reasonable for a provider to return -FI_EOPNOTSUPP if the user sets FI_HMEM_P2P_ENABLED and the provider is incapable of peer-to-peer support?
Maybe the question is whether HMEM_P2P_REQUIRED is useful? Or is it only useful if it also allows gdrcopy?
Does gdrcopy behave the same as if p2p were used?
> Maybe the question is whether HMEM_P2P_REQUIRED is useful? Or is it only useful if it also allows gdrcopy?
I think P2P_REQUIRED is still useful, if we define P2P support as the NIC accessing HMEM memory directly.
I can think of at least one case where NCCL wants libfabric to use only the NIC to access HMEM memory (i.e., NOT use gdrcopy): when NCCL uses its LL128 protocol.
> Does gdrcopy behave the same as if p2p were used?
I do not think so. gdrcopy basically maps GPU memory into the host's memory address space and then does a memcpy, so the transfer is driven by the CPU.
So, it sounds like we need some other option that can be used to query/restrict the type of operations that a provider can undertake. Maybe this is a new HMEM option, or some sort of XPU option. Right now there's no way to convey that P2P is okay, but if you can't use P2P, then only this 'other' mechanism is usable.
That's hard to define generically, however. Maybe it's something like P2P_OR_CPU_ONLY?
From the ofiwg call:
- Keep the current FI_HMEM_P2P options restrictive in their definition.
- May need a CUDA-specific option: NCCL forbids any CUDA call from any lower layer.
- Proposal: FI_CUDA_API_ENABLED/ALLOWED/DISABLED/PERMITTED? A boolean option is sufficient.
https://github.com/ofiwg/libfabric/pull/8624 introduced FI_CUDA_API_PERMITTED
Has this issue been resolved with the introduction of FI_CUDA_API_PERMITTED?
Yes
This came from the discussion in https://github.com/ofiwg/libfabric/pull/8529
The background is that applications like NCCL need a way to specify that a libfabric endpoint cannot make calls to the CUDA API to support CUDA memory.
@shefty suggested using FI_OPT_HMEM_P2P_REQUIRED, which is currently documented as follows:
From https://ofiwg.github.io/libfabric/main/man/fi_endpoint.3.html
However, to use this option for the purpose I described, we need a definition of "peer to peer support", which is lacking in the fi_endpoint document. So I opened this issue to ask whether the libfabric community can agree on a definition of "peer to peer" support.
One thing I want to mention is that NCCL does allow libfabric to use GDRcopy; see this comment from @jdinan. The EFA provider does use GDRcopy when used by NCCL, and we found it to be efficient for small messages.
I understand that other providers, like RxM, also want to use GDRcopy to support NCCL.
Therefore, I think it would be ideal if we could define "peer to peer support" in a way that mechanisms like GDRcopy count as "peer to peer" support.