torrance closed 1 year ago
I'm not sure what's causing your segfaults, but PTX is text, not binary, so it is null terminated and cannot contain null characters. nvrtcGetPTX[Size] includes the null terminator.
Righto, then it's only a problem if you're hipifying the code, as hiprtcGetCode() will return binary when run on a ROCm backend.
I will leave the patch as part of the full HIP patch-set then.
There are some issues around copying the device code used in JIT: the copies assume the null character will never appear in the PTX (or ROCm) code.
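As a rough illustration of the hazard (not the Bifrost code itself), here is a minimal sketch of how a null byte inside the device code changes what gets copied. The function names and the six-byte buffer are hypothetical stand-ins for real device code; only the std::string and std::vector behaviour is standard.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for binary device code: 6 bytes, with a null
// byte in the middle (index 2) and another at the end.
std::vector<char> make_fake_device_code() {
    return {'a', 'b', '\0', 'c', 'd', '\0'};
}

// Truncating copy: std::string(const char*) stops at the first null byte
// it finds, so everything after index 2 is silently dropped. If the buffer
// held no null byte at all, this would read past the end of the vector.
std::string copy_as_c_string(const std::vector<char>& code) {
    return std::string(code.data());
}

// Length-preserving copy: passing the size explicitly keeps every byte,
// including embedded nulls, and never reads outside the buffer.
std::string copy_with_size(const std::vector<char>& code) {
    return std::string(code.data(), code.size());
}

// Round trip through c_str(): even a correctly sized std::string is
// truncated again if it is rebuilt from its C-string pointer.
std::string roundtrip_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}
```

With this buffer, copy_as_c_string returns 2 bytes while copy_with_size returns all 6, and passing the 6-byte string through roundtrip_via_c_str brings it back down to 2.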
Case 1:
https://github.com/ledatelescope/bifrost/blob/30fd2682ec22bc26d9dd8cd9bc424aefc4e7075e/src/map.cpp#L386-L398
Here the vector vptx is initialised and loaded with the device code, then a pointer to its first element is taken and assigned to std::string ptx_string, implicitly calling the std::string(char*) constructor.
This is wrong in 3 ways:
1. it requires reading past the end of vptx to find a null terminating character;
2. it assumes the uninitialised memory past that address is null; and
3. it assumes that the null character can never be part of the device code.
Case 2:
https://github.com/ledatelescope/bifrost/blob/30fd2682ec22bc26d9dd8cd9bc424aefc4e7075e/src/cuda.hpp#L137-L148
The calling function calls std::string::c_str(), and the result is then copied into a string via the implicit std::string(char*) constructor. This will fail if the null character forms part of the device code.
The following patch remedies the segfaults I've been seeing: