hfinkel / llvm-project-cxxjit

Clang with JIT extensions
https://github.com/hfinkel/llvm-project-cxxjit/wiki

CUDA JIT doesn't succeed, please help me. #16

Closed lxwithgod closed 4 years ago

lxwithgod commented 4 years ago

Hi, I tried this project, but I can't get the CUDA JIT to work. When I run nvprof, I see cudaLaunch/cudaSetupArgument/..., but my kernel is not listed. @hfinkel

hfinkel commented 4 years ago

Does it work for you if you apply the change from https://github.com/hfinkel/llvm-project-cxxjit/issues/13? What version of CUDA, etc. are you using?

lxwithgod commented 4 years ago

I tried the change from #13. I used CUDA 9.2, as in your paper, and also CUDA 10.0. It doesn't work. This is my code:

    template<int size>
    __global__ void simpleKernel(float* out) {
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        if (idx < size) {
            out[idx] = 1.0f;
        }
    }

    template<int size>
    [[clang::jit]] void jit_kernel(float* out) {
        simpleKernel<size><<<1, 1024>>>(out);
    }

    int main() {
        float* out;
        cudaMalloc((void**)&out, sizeof(float) * 1024);
        jit_kernel<10>(out);
        return 0;
    }

lxwithgod commented 4 years ago

I tried it both with and without the change from #13, on master/clangjit-9.0. Compilation succeeds, but my jit_kernel doesn't run. Maybe I forgot some steps. Please help me check it.

hfinkel commented 4 years ago

> I used CUDA 9.2, as in your paper

In the paper, I was testing on a POWER8+NVIDIA system. Are you using an x86_64 host? Maybe there's some difference.

lxwithgod commented 4 years ago

Yes, I don't have a POWER8 system; I'm using an x86_64 host. It fails on both Red Hat and Ubuntu. The C++ JIT works for me, but the CUDA JIT does not.

lxwithgod commented 4 years ago

Hi @hfinkel, can you reproduce the problem? Thanks.

hfinkel commented 4 years ago

> Hi @hfinkel, can you reproduce the problem? Thanks.

Yes. This should now be fixed. There was a bug where the PTX generation would not occur for the first device configuration for which you were compiling.
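
Besides looking for the kernel in nvprof, one way to check that the JIT-compiled kernel actually runs is to check the launch for errors and read the output buffer back on the host. The following is only a minimal sketch: it reuses simpleKernel and jit_kernel from the report above, while the CUDA_CHECK macro, the zero-fill of the buffer, and the host-side comparison are additions for illustration. It assumes the file is built with this project's JIT-enabled Clang with CUDA support.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the original report): abort with a
// readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Kernel and JIT wrapper as posted in the report.
template <int size>
__global__ void simpleKernel(float* out) {
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  if (idx < size) {
    out[idx] = 1.0f;
  }
}

template <int size>
[[clang::jit]] void jit_kernel(float* out) {
  simpleKernel<size><<<1, 1024>>>(out);
}

int main() {
  constexpr int n = 1024;    // number of floats allocated, as in the report
  constexpr int size = 10;   // template argument used in the report

  float* out = nullptr;
  CUDA_CHECK(cudaMalloc((void**)&out, sizeof(float) * n));
  // Zero the buffer so that any 1.0f found later must come from the kernel.
  CUDA_CHECK(cudaMemset(out, 0, sizeof(float) * n));

  jit_kernel<size>(out);
  CUDA_CHECK(cudaGetLastError());       // errors reported at launch time
  CUDA_CHECK(cudaDeviceSynchronize());  // errors reported during execution

  static float host[n];
  CUDA_CHECK(cudaMemcpy(host, out, sizeof(float) * n,
                        cudaMemcpyDeviceToHost));

  // The kernel should have written 1.0f to the first `size` elements only.
  int mismatches = 0;
  for (int i = 0; i < n; ++i) {
    float expected = (i < size) ? 1.0f : 0.0f;
    if (host[i] != expected) {
      ++mismatches;
    }
  }
  std::printf("%s (%d mismatches)\n",
              mismatches == 0 ? "kernel ran" : "kernel did not run as expected",
              mismatches);

  CUDA_CHECK(cudaFree(out));
  return mismatches == 0 ? 0 : 1;
}
```

If the kernel is never launched, the buffer stays all zeros and the first 10 elements are reported as mismatches, which separates a missing launch from a profiler display issue.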

hfinkel commented 4 years ago

Please reopen if this still doesn't work for you.

lxwithgod commented 4 years ago

Thanks, it works for me now.