Is your feature request related to a problem? Please describe.
libfabric users need a stronger commitment from libfabric providers about implementing FI_OPT_CUDA_API_PERMITTED.
Describe the solution you'd like
The FI_OPT_CUDA_API_PERMITTED flag was introduced so that middleware like NCCL can disable CUDA API calls.
This issue proposes that libfabric make a stronger commitment for the flag.
Specifically, I propose to add
"Any libfabric provider that claims support of FI_HMEM is guaranteed to implement this option"
to the documentation of this flag.
This is because the information from this flag is critical for NCCL, and NCCL absolutely needs it. If a provider supports FI_HMEM but does not implement this option, NCCL does not know how to proceed.
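
To illustrate the ambiguity, here is a minimal toy model in C. It does not call libfabric. `mock_prohibit_cuda_api`, the `provider_kind` values and `decide` are hypothetical names made up for this sketch. The mock stands in for a real call along the lines of `fi_setopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_CUDA_API_PERMITTED, &permitted, sizeof(permitted))` with `permitted = false`. The return codes are assumptions: `-EOPNOTSUPP` for a provider that implements the option but depends on CUDA API calls, and `-ENOPROTOOPT` for a provider that does not implement the option at all (a real provider may return something else).

```c
#include <errno.h>

/* Hypothetical provider behaviors, for illustration only. */
enum provider_kind {
    PROVIDER_NO_CUDA_CALLS,    /* implements the option, works without CUDA API calls */
    PROVIDER_NEEDS_CUDA_CALLS, /* implements the option, but depends on CUDA API calls */
    PROVIDER_UNIMPLEMENTED     /* claims FI_HMEM, does not implement the option */
};

/* What the middleware can conclude from the return code. */
enum middleware_decision {
    DECISION_PROCEED,        /* provider confirmed it will not call the CUDA API */
    DECISION_AVOID_PROVIDER, /* provider confirmed it cannot honor the request */
    DECISION_UNKNOWN         /* provider gave no usable answer */
};

/*
 * Stand-in for setting FI_OPT_CUDA_API_PERMITTED to false on an endpoint.
 * The return codes below are assumptions made for this sketch.
 */
static int mock_prohibit_cuda_api(enum provider_kind kind)
{
    switch (kind) {
    case PROVIDER_NO_CUDA_CALLS:
        return 0;
    case PROVIDER_NEEDS_CUDA_CALLS:
        return -EOPNOTSUPP;
    default:
        return -ENOPROTOOPT; /* assumed code for "option not implemented" */
    }
}

/* Map the return code to a decision on the middleware side. */
static enum middleware_decision decide(int ret)
{
    if (ret == 0)
        return DECISION_PROCEED;
    if (ret == -EOPNOTSUPP)
        return DECISION_AVOID_PROVIDER;
    /* Any other error: the middleware cannot tell whether the provider
     * will call the CUDA API or not. */
    return DECISION_UNKNOWN;
}
```

In this model, only the third provider kind leaves the caller in `DECISION_UNKNOWN`. The proposed documentation guarantee would rule that case out for any provider that claims FI_HMEM.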
Additional context
I looked into the code. It seems that there are 4 providers that support FI_HMEM: shm, efa, verbs and rxm.
I have a PR that implements this option for EFA.
SHM should be straightforward. I can put up a PR for that too.
I can look into RxM and Verbs too (though help is appreciated).