open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

accelerator/cuda: Dereference pointer to stream #12635

devreal closed this pull request 3 months ago

devreal commented 3 months ago

cuMemcpyAsync and cuStreamSynchronize take a CUstream, not a pointer to CUstream.

Artifact of #12617
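To illustrate the mismatch, here is a minimal, self-contained C sketch. It does not use the real CUDA headers. mock_stream_st, mock_stream_t, mock_stream_synchronize and sync_via_pointer are hypothetical stand-ins, and the only assumption carried over from CUDA is that CUstream is an opaque handle (a pointer to a driver-owned struct) passed by value, as in cuStreamSynchronize(CUstream hStream). Passing a pointer to the handle instead is an incompatible pointer type in C. If it compiles anyway, the callee treats the address of the handle as if it were the handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins shaped like the CUDA driver API, where CUstream
 * is an opaque handle: a typedef for a pointer to a driver-owned struct. */
struct mock_stream_st {
    int id;
};
typedef struct mock_stream_st *mock_stream_t;

/* Like cuStreamSynchronize(CUstream), this takes the handle by value. */
static int mock_stream_synchronize(mock_stream_t stream)
{
    /* The callee reads internal state through the handle it was given. */
    return stream->id;
}

/* Hypothetical caller that holds a pointer to the stream handle rather
 * than the handle itself, in the spirit of the change in this PR. */
static int sync_via_pointer(mock_stream_t *stream_ptr)
{
    assert(stream_ptr != NULL);
    /* Buggy form: mock_stream_synchronize(stream_ptr);
     * That passes a mock_stream_t * where a mock_stream_t is expected,
     * which the compiler flags as an incompatible pointer type. If built
     * anyway, the callee reads the address of the handle as the handle. */
    return mock_stream_synchronize(*stream_ptr); /* fixed: dereference first */
}
```

For a handle whose id is 42, sync_via_pointer(&stream) returns 42. The commented-out call shows the shape of the original bug.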

wenduwan commented 3 months ago

@devreal Thanks. Curious if this was caught by the compiler or something else?

Meanwhile, let me run our CI to double-check.

devreal commented 3 months ago

It was a compiler warning I saw when compiling against CUDA. I'm not sure why it doesn't crash; maybe we never hit this code path.

devreal commented 3 months ago

@janjust, it looks like the NVIDIA CI ran out of disk space.

bosilca commented 3 months ago

It crashes all over the place. I was just starting to investigate, but it looks legit.

==== backtrace (tid:2025366) ====
 0 0x000000000044fc34 cudbgMain()  ???:0
 1 0x000000000021ee0c cuEGLApiInit()  ???:0
 2 0x0000000000431e68 cudbgMain()  ???:0
 3 0x0000000000133d64 cuMemGetAttribute_v2()  ???:0
 4 0x000000000029c3c8 cuMemsetD2D8Async()  ???:0
 5 0x0000000000003f30 accelerator_cuda_memcpy()
 6 0x000000000026fdf0 mca_coll_accelerator_memcpy()  
 7 0x000000000026ff98 mca_coll_accelerator_allreduce()  
 8 0x00000000000cf920 PMPI_Allreduce()  
 9 0x0000000000403068 main()