openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

How to query CUDA support at runtime? #5471

leofang opened this issue 4 years ago

leofang commented 4 years ago

Hi, we are looking for a way to check at runtime whether CUDA awareness is turned on in an MPI application; see the original discussion in the mpi4py repo. It turns out that if the MPI library is built on top of UCX, neither the MPI library nor its users can tell whether CUDA support has actually kicked in, and only a hard segfault or other opaque errors would indirectly reveal this information.

It would be great if UCX could set up a mechanism (hopefully a public API) for us to query, at runtime:

  1. Whether UCX is built with CUDA support (i.e., whether --with-cuda is set, implicitly or explicitly)
  2. Whether UCX's CUDA support is actually in effect

(AFAIK this mechanism does not exist yet, so please correct me if I am mistaken.) Then, we can take this information and propagate it to Open MPI/MPICH/etc., and have them return it when UCX is in use.

cc: @dalcinl @jsquyres @jakirkham

yosefe commented 4 years ago

UCX provides memory-type support information through the ucp_context_query() API; see https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h#L911
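
For illustration, here is a minimal standalone sketch of that query, assuming a UCX release whose ucp_context_attr_t exposes the memory_types bitmap (UCP_ATTR_FIELD_MEMORY_TYPES); a UCX-based MPI library would run the equivalent check on its own context after ucp_init():

```c
#include <stdio.h>
#include <ucp/api/ucp.h>

int main(void)
{
    ucp_params_t       params = {0};
    ucp_context_attr_t attr   = {0};
    ucp_context_h      context;
    ucs_status_t       status;

    /* Create a UCP context; this is roughly what a UCX-based MPI does
     * inside MPI_Init(). */
    params.field_mask = UCP_PARAM_FIELD_FEATURES;
    params.features   = UCP_FEATURE_TAG;
    status = ucp_init(&params, NULL, &context);
    if (status != UCS_OK) {
        fprintf(stderr, "ucp_init failed: %s\n", ucs_status_string(status));
        return 1;
    }

    /* Ask the context which memory types it can handle. */
    attr.field_mask = UCP_ATTR_FIELD_MEMORY_TYPES;
    status = ucp_context_query(context, &attr);
    if (status == UCS_OK) {
        int cuda = !!(attr.memory_types & UCS_BIT(UCS_MEMORY_TYPE_CUDA));
        printf("CUDA memory support: %s\n", cuda ? "yes" : "no");
    }

    ucp_cleanup(context);
    return 0;
}
```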

leofang commented 4 years ago

@yosefe Thanks for your quick reply. Can this query be done at program start time, before MPI_Init() is called or any CUDA context is established (via driver or runtime) or any GPU memory is allocated? I'm trying to see the limitations here as I'm not familiar with UCX. Thanks again.

yosefe commented 4 years ago

This has to be done after MPI_Init(). I'll let @Akshay-Venkatesh and @bureddy comment on whether a CUDA context has to be created.

bureddy commented 4 years ago

It does not depend on the CUDA context; it just has to be done after MPI_Init() (i.e., after ucp_init()). For example, Open MPI's CUDA support is initialized only if UCX supports CUDA, using this API (https://github.com/open-mpi/ompi/pull/7898)

leofang commented 4 years ago

Thank you, @yosefe @bureddy!

> It does not depend on the CUDA context; it just has to be done after MPI_Init() (i.e., after ucp_init()).

This is very useful to know. @jakirkham I think this is all you need for ucx-py?

One last question: am I understanding correctly that in UCX, once CUDA support is built in, there is no way to disable it at launch time (the way Open MPI allows OMPI_MCA_opal_cuda_support=0, --mca opal_cuda_support 0, or --mca btl ^smcuda)? In other words, is calling ucp_context_query() alone guaranteed to give the correct answer?

Once this question is answered, @bureddy @yosefe @Akshay-Venkatesh I think we can move (Open) MPI-specific discussions to open-mpi/ompi#7963.

bureddy commented 4 years ago

You can explicitly disable CUDA support at runtime by selecting non-CUDA transports with -x UCX_TLS; ucp_context_query() covers this case as well.
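
A hedged sketch of that interaction (the transport list below is just one example of a non-CUDA selection and is machine-dependent; the query itself is the same as in the earlier example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <ucp/api/ucp.h>

int main(void)
{
    /* Same effect as launching with `-x UCX_TLS=tcp,sm,self`; the variable
     * must be set before ucp_init() reads the UCX configuration.  With the
     * CUDA transports excluded, UCS_MEMORY_TYPE_CUDA should no longer be
     * reported in attr.memory_types. */
    setenv("UCX_TLS", "tcp,sm,self", 1);

    ucp_params_t       params = {0};
    ucp_context_attr_t attr   = {0};
    ucp_context_h      context;

    params.field_mask = UCP_PARAM_FIELD_FEATURES;
    params.features   = UCP_FEATURE_TAG;
    if (ucp_init(&params, NULL, &context) != UCS_OK) {
        return 1;
    }

    attr.field_mask = UCP_ATTR_FIELD_MEMORY_TYPES;
    if (ucp_context_query(context, &attr) == UCS_OK) {
        printf("CUDA memory support: %s\n",
               (attr.memory_types & UCS_BIT(UCS_MEMORY_TYPE_CUDA)) ? "yes" : "no");
    }

    ucp_cleanup(context);
    return 0;
}
```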

leofang commented 4 years ago

Thank you, @bureddy! I think it is clear that the ball is in MPI vendors' court now. I appreciate your quick responses. I will follow up in the Open MPI issue linked above (I'll also open an issue in MPICH).

jsquyres commented 4 years ago

@leofang You might as well re-open this, because the people you just talked to are the same people (literally 😄) who are responsible for MPIX_Query_cuda_support() in Open MPI. Meaning: you have the right people -- keep the conversation going and have them improve their MPIX_Query_cuda_support() function. 😎
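
For reference, a sketch of the usual application-side check against this Open MPI extension; MPIX_CUDA_AWARE_SUPPORT and MPIX_Query_cuda_support() come from Open MPI's mpi-ext.h and are not part of the MPI standard, hence the guards:

```c
#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>   /* Open MPI extensions, including the CUDA query */
#endif

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Compile-time answer: was the MPI library built with CUDA support? */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("Compile time: CUDA-aware support is present.\n");
#elif defined(MPIX_CUDA_AWARE_SUPPORT)
    printf("Compile time: CUDA-aware support is absent.\n");
#else
    printf("Compile time: CUDA-aware support cannot be determined.\n");
#endif

    /* Runtime answer: is CUDA support actually enabled in this run? */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    printf("Run time: %s\n",
           MPIX_Query_cuda_support() ? "CUDA-aware" : "not CUDA-aware");
#endif

    MPI_Finalize();
    return 0;
}
```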

leofang commented 4 years ago

Sure thing Jeff!

bureddy commented 4 years ago

@leofang can you give a little more detail on the intended usage? Do you want to query CUDA support (MPIX_Query_cuda_support()) before MPI_Init() or after? I think it's not possible to query runtime support before the components are opened, which happens in MPI_Init()

leofang commented 4 years ago

Hi @bureddy, during our discussion @dalcinl imagined that a new API for querying CUDA support would work like MPI_Query_thread(), which cannot be called before MPI_Init() either, so I feel applying the same requirement (that it can only be called after MPI_Init()) could make your life simpler.

Even better, make it part of the MPI Standard V4.0! 😁 (Ignore me)

bureddy commented 4 years ago

@leofang, please see if https://github.com/open-mpi/ompi/pull/7970 is sufficient.