devreal closed this 3 months ago
@devreal Thanks. Curious if this was caught by the compiler or something else?
Meanwhile let me run our CI to double check.
That was a compiler warning I saw when compiling against CUDA. Not sure how that doesn't crash; maybe we never hit this code path.
@janjust looks like the NVIDIA CI ran out of disk space
It crashes all over the place. I was just starting to investigate, but it looks legit.
==== backtrace (tid:2025366) ====
0 0x000000000044fc34 cudbgMain() ???:0
1 0x000000000021ee0c cuEGLApiInit() ???:0
2 0x0000000000431e68 cudbgMain() ???:0
3 0x0000000000133d64 cuMemGetAttribute_v2() ???:0
4 0x000000000029c3c8 cuMemsetD2D8Async() ???:0
5 0x0000000000003f30 accelerator_cuda_memcpy()
6 0x000000000026fdf0 mca_coll_accelerator_memcpy()
7 0x000000000026ff98 mca_coll_accelerator_allreduce()
8 0x00000000000cf920 PMPI_Allreduce()
9 0x0000000000403068 main()
cuMemcpyAsync and cuStreamSynchronize take a CUstream, not a pointer to CUstream.
Artifact of #12617
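
For anyone reading along, here is a minimal sketch of the kind of mismatch described above. It does not use the real CUDA headers: `mock_stream_synchronize` and `sync_stored_stream` are hypothetical stand-ins, and the assumption that the component keeps a pointer to the stream handle is mine, not taken from the actual patch. The only thing it is meant to show is that a function taking a `CUstream` by value needs the dereferenced handle, not the address where the handle is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without cuda.h. As in the driver API,
 * CUstream is an opaque pointer typedef; the struct body is made up here. */
typedef struct CUstream_st *CUstream;
struct CUstream_st { int synchronized; };

/* Hypothetical mock with the same parameter shape as cuStreamSynchronize:
 * the stream handle is passed by value. Returns 0 on success. */
int mock_stream_synchronize(CUstream stream)
{
    if (stream == NULL) {
        return 1;
    }
    stream->synchronized = 1;
    return 0;
}

/* Hypothetical wrapper, assuming the caller stores a pointer to the handle.
 * Passing stream_ptr itself (cast to CUstream) would hand the callee the
 * address of the handle instead of the handle; a compiler typically warns
 * about the incompatible pointer type when the cast is absent. */
int sync_stored_stream(CUstream *stream_ptr)
{
    assert(stream_ptr != NULL);
    return mock_stream_synchronize(*stream_ptr); /* dereference first */
}
```

With the real driver API the same rule applies to any call that takes a `CUstream` argument: pass the handle value.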