leofang opened this issue 4 years ago
UCX provides memory-type support information through the ucp_context_query() API; see https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h#L911
@yosefe Thanks for your quick reply. Can this query be done at program start time, before MPI_Init() is called, before any CUDA context is established (via the driver or runtime API), and before any GPU memory is allocated? I'm trying to understand the limitations here, as I'm not familiar with UCX. Thanks again.
This has to be done after MPI_Init(). I'll let @Akshay-Venkatesh and @bureddy comment on whether a CUDA context has to be created.
It does not depend on the CUDA context; it just has to be done after MPI_Init() (i.e., after ucp_init()). For example, Open MPI's CUDA support is initialized only if UCX supports CUDA, using this API (https://github.com/open-mpi/ompi/pull/7898).
Thank you, @yosefe @bureddy!
It does not depend on the CUDA context; it just has to be done after MPI_Init() (ucp_init()).
This is very useful to know. @jakirkham I think this is all you need for ucx-py?
One last question: in UCX, am I understanding correctly that once CUDA support is built in, there is no way to disable it at launch time (as one can in Open MPI with OMPI_MCA_opal_cuda_support=0, --mca opal_cuda_support 0, or --mca btl ^smcuda)? So calling ucp_context_query() alone is guaranteed to be correct?
Once this question is answered, @bureddy @yosefe @Akshay-Venkatesh I think we can move (Open) MPI-specific discussions to open-mpi/ompi#7963.
You can disable CUDA support at runtime by explicitly selecting non-CUDA transports with -x UCX_TLS. ucp_context_query() covers this case as well.
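For illustration, here is what disabling CUDA at launch time via UCX_TLS might look like (the transport names and `./my_app` are assumptions; `ucx_info -d` lists the transports actually available on a given system):

```shell
# Illustrative only: restrict UCX to non-CUDA transports at launch time.
# Transport names (rc, sm, self) vary per system; check `ucx_info -d`.
mpirun -np 2 -x UCX_TLS=rc,sm,self ./my_app

# UCX_TLS also accepts a negation list ("^" prefix) to exclude transports:
mpirun -np 2 -x UCX_TLS=^cuda_copy,cuda_ipc,gdr_copy ./my_app
```

With the CUDA transports excluded this way, ucp_context_query() should no longer report CUDA memory-type support, even in a CUDA-enabled UCX build.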
Thank you, @bureddy! I think it is clear that the ball is in MPI vendors' court now. I appreciate your quick responses. I will follow up in the Open MPI issue linked above (I'll also open an issue in MPICH).
@leofang You might as well re-open this, because the people you just talked to are the same people (literally 😄) who are responsible for MPIX_Query_cuda_support() in Open MPI. Meaning: you have the right people; keep the conversation going and have them improve their MPIX_Query_cuda_support() function. 😎
Sure thing Jeff!
@leofang Can you give a little more detail on the usage? Do you want to query CUDA support (MPIX_Query_cuda_support()) before MPI_Init() or after? I think it is not possible to query runtime support before opening the components, which happens in MPI_Init().
Hi @bureddy During our discussion @dalcinl imagined that a new API for querying CUDA support would work like MPI_Query_thread(), which cannot be called before MPI_Init() either, so I feel applying the same requirement could make your life simpler. By "the same" I mean:
- The query cannot be called before MPI_Init() is called
- The query can be called any time after MPI_Init() is called (and before MPI_Finalize() is called?)

Even better, make it part of the MPI Standard v4.0! 😁 (Ignore me)
@leofang, please see if https://github.com/open-mpi/ompi/pull/7970 is sufficient.
Hi, we are looking for a runtime way to check whether CUDA awareness is turned on in an MPI application; see the original discussion in the mpi4py repo. It turns out that if the MPI library is built on top of UCX, neither the MPI library nor its users can know whether CUDA support has kicked in, and only a hard segfault or other opaque errors would indirectly reveal this information.
It would be great if UCX could set up a mechanism (hopefully a public API) for us to query, at runtime:
- whether CUDA support is built in (i.e., --with-cuda is set either implicitly or explicitly)

(AFAIK this mechanism does not exist yet, so please correct me if I am mistaken.) Then, we can take this information and propagate it to Open MPI/MPICH/etc., and have them return the information when UCX is in use.
cc: @dalcinl @jsquyres @jakirkham