Closed by sree314 3 years ago
Just to clarify: I think these functions should already be supported using cuMemcpy.
Trying to replay the trace for the ld_const_s16.cu PTX testcase results in an AssertionError.
What value does dstDevice contain?
The assert is triggered because the address that cuMemcpyHtoD is trying to copy to is not one that the library has seen in the trace (e.g. as the result of a cuMemAlloc).
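To make the failure mode concrete, here is a minimal sketch of the kind of address check that would trip in this situation. The class and method names (AddressTracker, record_alloc, check_copy) are hypothetical and are not taken from the library; the sketch only assumes that the replayer remembers the device ranges it has seen and asserts when a copy targets anything else.

```python
class AddressTracker:
    """Hypothetical tracker for device ranges seen while replaying a trace."""

    def __init__(self):
        self._ranges = []  # list of (base, size) tuples

    def record_alloc(self, base, size):
        # Called when the trace contains e.g. a cuMemAlloc that returned `base`.
        self._ranges.append((base, size))

    def is_known(self, addr, nbytes=1):
        # True only if [addr, addr + nbytes) lies inside one recorded range.
        return any(b <= addr and addr + nbytes <= b + s
                   for b, s in self._ranges)

    def check_copy(self, dst_device, nbytes):
        # Raises AssertionError for a destination the trace never produced.
        assert self.is_known(dst_device, nbytes), (
            "unknown device address 0x%x" % dst_device)


tracker = AddressTracker()
tracker.record_alloc(0x70000000, 256)
tracker.check_copy(0x70000010, 16)  # inside a recorded allocation: passes
```

An address that was handed out by some call the replayer does not record (a module global, for instance) would fail check_copy even though the original program was correct.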
Turns out we may need to implement cuModuleGetGlobal_v2.
I've pushed an implementation of cuModuleGetGlobal_v2 to the getsymbol branch.
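For readers following along, a rough sketch of why handling cuModuleGetGlobal_v2 resolves the assertion: the driver call returns the device pointer and size of a module-level symbol, so recording that pair makes the later cuMemcpyHtoD destination a known address. This is not the code on the branch; ReplayState, its method names, the handle value, and the symbol name const_data are all hypothetical and only model the idea.

```python
class ReplayState:
    """Hypothetical replay state; models the idea, not the branch's code."""

    def __init__(self):
        self.known = {}    # base device address -> size in bytes
        self.symbols = {}  # (module handle, symbol name) -> (base, size)

    def on_mem_alloc(self, dptr, size):
        # The trace recorded a cuMemAlloc that returned dptr.
        self.known[dptr] = size

    def on_module_get_global(self, hmod, name, dptr, nbytes):
        # The trace recorded cuModuleGetGlobal(&dptr, &bytes, hmod, name).
        # Registering the result makes the symbol's storage a known range.
        self.symbols[(hmod, name)] = (dptr, nbytes)
        self.known[dptr] = nbytes

    def on_memcpy_htod(self, dst_device, nbytes):
        # Accept the copy only if it falls entirely inside a known range.
        ok = any(base <= dst_device and dst_device + nbytes <= base + size
                 for base, size in self.known.items())
        assert ok, "unknown device address 0x%x" % dst_device
        return True


state = ReplayState()
# Made-up module handle and address; 2 bytes for a single s16 constant.
state.on_module_get_global(hmod=1, name="const_data", dptr=0x5000, nbytes=2)
state.on_memcpy_htod(0x5000, 2)  # passes now that the symbol is registered
```

Without the on_module_get_global step, the same on_memcpy_htod call would raise, which matches the AssertionError reported for the testcase.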
These are CUDA Runtime API functions, not CUDA Driver API functions, so they'll never be supported in libcuda.
Turns out, neither of these has an equivalent in the CUDA Driver API. So I suspect that the CUDA runtime converts these functions to cuMemcpy under the hood, along with some sort of ELF symbol lookup and linker magic.
[This is needed for the ld_const_s16.cu PTX testcase (among others).]