Cool. Let's try going via cocl to build the kernel:
https://gist.github.com/hughperkins/93d82ab1dd380334c911f1defc898e0f
this test (https://gist.github.com/hughperkins/93d82ab1dd380334c911f1defc898e0f) is OK.
That's odd. I just realized you're missing the , local int *scratch parameter in your kernel. It should look like:
kernel void _Z8setValuePfif(global float* data, long data_offset, int idx, float value, local int *scratch) {
Can you recheck the contents of your cuda_sample-device.cl file? The change to add scratch was a week or two ago, so it should be in there, I reckon.
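A quick way to run that check, as a rough sketch: the helper below (kernels_missing_scratch is a made-up name, not part of cuda-on-cl) lists every kernel signature in a generated .cl file that lacks the local int *scratch parameter. It assumes each kernel signature sits on a single line, as in the example above.

```shell
# Hypothetical helper, not part of cuda-on-cl.
# Prints each `kernel void ...` line in the given OpenCL source file that
# does not mention `local int *scratch`. Prints nothing if all kernels have it.
kernels_missing_scratch() {
    grep -E '^[[:space:]]*kernel[[:space:]]+void' "$1" \
        | grep -v 'local int \*scratch' \
        || true
}
```

For example, kernels_missing_scratch cuda_sample-device.cl should print nothing if the file was generated by a build that includes the scratch change.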
(By the way, this file does all of the kernel launching bit via cocl methods: https://gist.github.com/hughperkins/ef17c1c5bd39fa7806425009ffbf8bda )
1) I used the latest code to recompile cuda-on-cl and dumped cuda_sample-device.cl here (https://gist.github.com/alephman/9d3b7f2d63ccd7f47a0b1e874ee1576f)
2) The file (https://gist.github.com/hughperkins/ef17c1c5bd39fa7806425009ffbf8bda ) is still OK too...
Ok. And just to check, if you do make run-cuda_sample, that fails in the same way as before, right?
I can't fix the issue of "/usr/include/features.h|374|fatal error: sys/cdefs.h: No such file or directory|" as my platform is ARM64 (https://askubuntu.com/questions/470796/fatal-error-sys-cdefs-h-no-such-file-or-directory doesn't work on this platform).
So I tried another test:
Two-step compilation
If you want, you can compile in two steps:
cocl -c teststream.cu
g++ -o teststream teststream.o -lcocl -lclblast -leasycl -lclew -lpthread
It got an error like this: https://gist.github.com/alephman/5b417541907b87704aaa6402cbd3fddf
Yes, the kernels are kind of all over the place at the moment, as I try different ways of convincing beignet to accept them...
Can you try branch "branches_as_switch"? The kernels still don't work on beignet, but at least they correspond to what I'm working on at the moment.
Sure, I just checked out branches_as_switch and did the same things as on the master branch; it's the same result.
You might need to do a clean first. The kernels should look radically different. Can you do:
rm teststream*
... and try again?
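One caveat with a bare rm teststream*: if it is run in the directory that holds teststream.cu, the glob matches the source file too. A minimal sketch of a safer clean step, assuming the generated files all share the source file's stem (clean_generated is a made-up helper name, not a cocl command):

```shell
# Hypothetical helper, not part of cuda-on-cl.
# Removes every file in the current directory whose name starts with the
# given stem, except the .cu source itself.
# Usage: clean_generated teststream
clean_generated() {
    stem=$1
    for f in "$stem"*; do
        case "$f" in
            *.cu)
                ;;  # keep the CUDA source
            *)
                if [ -e "$f" ]; then
                    rm -f -- "$f"
                fi
                ;;
        esac
    done
}
```

After clean_generated teststream, only teststream.cu should remain among the teststream* files, so the next cocl run has to regenerate everything.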
What I did was this: cleaned /usr/local/include and /usr/local/lib, then:
git pull
git checkout branches_as_switch
cd build
rm -rf *
cmake ..
make -j4
sudo make install
cd test/cocl
cocl -fPIC -c teststream.cu
g++ -o teststream teststream.o -lcocl -lclblast -leasycl -lclew -lpthread
./teststream test1
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
got stream
building kernel _z10longkernelpfif
teststream: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
Stack dump:
Cool :-) Ok. Let's add memory copy to use cocl too:
https://gist.github.com/hughperkins/7e2c50d211b4f501bba0c21caacef25c
Wow! Well, that covers pretty much most stuff :-P
Let's try linking to cuda_sample.o, and using stuff from that...
You'll need to copy cuda_sample.o into the directory containing these scripts.
Hmmm... I need to modify something in libcocl first. Give me a few minutes. I want to use the opencl sourcecode from cuda_sample.o, but currently it's stored in a string with a very non-C name:
$ nm cuda_sample.o | grep -i opencl
0000000000000000 D __opencl_sourcecode/home/ubuntu/git/cuda-on-cl/build/cuda_sample-device.cl
Hmmm, seems not obvious to link to this symbol, even after renaming it to something more sensible.
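For illustration only, the kind of renaming meant here could look like the sketch below: it maps a path-based name onto characters that are legal in a C identifier. sanitize_symbol is a made-up helper, and nothing in this thread shows how libcocl actually names the symbol.

```shell
# Hypothetical helper, not part of cuda-on-cl.
# Replaces every character that is not a letter, digit or underscore
# with an underscore, so the result is usable as a C identifier.
sanitize_symbol() {
    printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9_' '_'
}
```

With the symbol from the nm output above, sanitize_symbol turns the slashes, hyphens and the dot into underscores. A usable name alone does not solve the linking question, as noted.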
Whilst I ponder what we can try next, in the meantime, can we just double check that what we think you're running really is what you're running? Can you build/run https://gist.github.com/hughperkins/575e80bd1441e1be021ddbb99475daa7 , and provide the full output please?
The output is here: http://pastebin.ubuntu.com/23442178/ , but it's for ARM... Oooo, I just realized that the "branches_as_switch" branch is for beignet (Intel GPU), but I am still testing it on my ARM board... I sent you the wrong message, sorry for that!
It's a generic branch, not specific to hd5500, except in the sense that that's what I am using to test it on, and that hd5500 is the device that is motivating me to write it, since it seems not to work without it.
I shall be merging this branch onto master in the next 1-2 days. It is the future :-) I'll make the transforms optional though. You can already see in the cocl -h for this branch that I've started planning how that will work.
Ok, so, seems everything is all working, in theory. Weird that it crashes in cuda_sample. Can you just re-re-check that cuda_sample still doesn't work? (Make sure to do cd build; rm test* before re-trying.)
(Actually, cd build; rm cuda* )
I am sure. I moved the cuda-sample.cu file into a new folder outside cuda-on-cl, and then tried the cocl -fPIC cuda-sample.cu command. I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.
@hughperkins I want to try clang 3.9. After installing clang 3.9 and modifying cocl.Makefile, I ran cocl -fPIC cuda_sample.cu and got:
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes
I found there is a __clang_cuda_runtime_wrapper.h file in /usr/local/cocl, but I don't know how I should modify it.
I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.
I'm not sure I follow. Are you saying you're running the tests on two different machines?
Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes
Maybe use -nocudainc? You can add it at line 19 of cocl.Makefile
Yeah, I tried the test on both ARM and x86. On my x86 laptop with an Nvidia 4200M, the cuda-sample test runs OK. My original idea was that the ll file is IR, independent of the CPU architecture, so I could use arm-g++ to compile an ll file that works normally on x86. But that doesn't seem to work, as the hostpatched ll depends on x86.
clang++-3.9 -fPIC -DUSE_CLEW -noduainc -x cuda -std=c++11 --cuda-gpu-arch=sm_30 -D__CUDA_ARCH__=300 -I/usr/local/include/EasyCL -I/usr/local/include/cocl -I/usr/local/src/EasyCL -I/usr/local/src/EasyCL/thirdparty/clew/include -include /usr/local/include/cocl/cocl.h -include /usr/local/include/cocl/fake_funcs.h -include /usr/local/include/cocl/cocl_deviceside.h -I/usr/local/include cuda_sample.cu --cuda-device-only -emit-llvm -I/usr/include/aarch64-linux-gnu --target=aarch64-linux-gnu -O0 -S -o cuda_sample-device-noopt.ll
clang: error: unknown argument: '-noduainc'
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
Hmmm, ok, I'm confused. Can you write out in detail which tests you ran on what please? Here is my understanding till now :-)
ARM PC:
- build and run g.cpp => works ok
ARM PC:
- build and run cuda_sample.cu => error message at opencl launch time
clang: error: unknown argument: '-noduainc'
Shouldn't it be -nocudainc?
sure! ARM board: (Clang 3.8 , gcc6.2, debian OS(jessie))
X86 ( ubuntu 16.04 , gcc 5.4 and clang 3.8, gpu nvidia 4200M) -use ir-to-opencl and patch-hostside to convert cu file to opencl OK.
clang++-3.9 -fPIC -DUSE_CLEW noduainc -x cuda -std=c++11 --cuda-gpu-arch=sm_30 -D__CUDA_ARCH__=300 -I/usr/local/include/EasyCL -I/usr/local/include/cocl -I/usr/local/src/EasyCL -I/usr/local/src/EasyCL/thirdparty/clew/include -include /usr/local/include/cocl/cocl.h -include /usr/local/include/cocl/fake_funcs.h -include /usr/local/include/cocl/cocl_deviceside.h -I/usr/local/include cuda_sample.cu --cuda-device-only -emit-llvm -I/usr/include/aarch64-linux-gnu -O0 -S -o cuda_sample-device-noopt.ll
clang: error: no such file or directory: 'noduainc'
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
Aren't you missing a hyphen? Like -nocudainc? And I think it is the word 'no' then 'cuda' then 'inc' joined together?
Sorry, careless copy-paste caused this stupid problem, haha.
clang++-3.9 -fPIC -nocudainc -x cuda -DUSE_CLEW -std=c++11 -I/usr/include/aarch64-linux-gnu --target=aarch64-linux-gnu -I/usr/local/include -I/usr/local/include/EasyCL -I/usr/local/include/cocl -I/usr/local/src/EasyCL/thirdparty/clew/include -I/usr/local/src/EasyCL -include /usr/local/include/cocl/cocl.h -include /usr/local/include/cocl/fake_funcs.h -include /usr/local/include/cocl/cocl_hostside.h cuda_sample.cu --cuda-host-only -emit-llvm -O3 -S -o cuda_sample-hostraw.ll
opt-3.9 -O2 -inline -mem2reg -instcombine -S -o cuda_sample-device.ll cuda_sample-device-noopt.ll
/usr/local/bin/ir-to-opencl --inputfile cuda_sample-device.ll --outputfile cuda_sample-device.cl
/usr/local/bin/ir-to-opencl: cuda_sample-device.ll:2:1: error: expected top-level entity
source_filename = "cuda_sample.cu"
cuda_sample-device.ll file:
; ModuleID = 'cuda_sample-device-noopt.ll'
source_filename = "cuda_sample.cu"
target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
Using ir-to-opencl and patch-hostside to convert the cu file to opencl failed!
What do you mean by 'failed'? Can you provide the output from running ir-to-opencl on ARM, using clang-3.8?
ARM board: (Clang 3.8 , gcc6.2, debian OS(jessie)), list the testing result:
-pyopencl is OK.
-OpenCL api is ok.
-easycl api is ok.
-cocl api is OK.
-easy/ cocl api using clang/llvm 3.8 compiling/building is ok.
-use ir-to-opencl and patch-hostside tools to compile cu file is ok. When the generated file runs, it may cause an error like: http://pastebin.ubuntu.com/23434196/ .
_Z name as a c++ mangled name, and it somehow demangles it, or something. Fix this issue like this:
The opencl code function name can probably be modified (https://github.com/hughperkins/cuda-on-cl/blob/master/src/ir-to-opencl.cpp#L1401)
The hostside name can probably be modified (https://github.com/hughperkins/cuda-on-cl/blob/master/src/patch-hostside.cpp#L134)
After re-compiling the cu file with the ir-to-opencl and patch-hostside tools and running the generated file, it may cause another error (this issue is still pending):
building kernel _z8setvaluepfif
cuda_sample: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
Stack dump:
X86 ( ubuntu 16.04 , gcc 5.4 and clang 3.8, gpu nvidia 4200M) -use ir-to-opencl and patch-hostside to convert cu file to opencl OK.
Ok. But on the ARM board, you can run g.cpp, including compiling, building, everything, all on the ARM board, using clang/llvm 3.8, and it works perfectly?
Yeah, g.cpp is OK with clang/llvm 3.8.
Ok. So on ARM, using clang/llvm 3.8, g.cpp works, but for some weird reason cuda_sample doesn't? So, going back to "I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.", I'm not sure I follow, since we're using the exact same llvm/clang for compiling both g.cpp and cuda_sample.cu?
Ok. So on ARM, using clang/llvm3.8, g.cpp works, but for some weird reason cuda_sample doesnt?
Yes. The .cu file doesn't work.
So, going back to "I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.", I'm not sure I follow? Since we're using the exact same llvm/clang for compiling both g.cpp and cuda_sample.cu ?
I am using your docker on my laptop, cuda_sample is ok. So I guess it's llvm/clang's issue
Here is how cuda-on-cl works. You'll note that llvm/clang are only used during compilation, not at runtime. The llvm error you are seeing is probably from the GPU driver, not connected to the llvm used by cuda-on-cl. So I think that changing llvm version will make no difference.
^^^ all the above happens at compile time, and you confirm that cuda_sample compiles ok, right? Therefore all the llvm/clang bits are running ok.
Finally, at runtime the following happens:
^^ none of this uses llvm/clang. It only uses opencl, libcocl.
I'm not quite sure why g.cpp runs and cuda_sample doesn't, but I'm not convinced that changing clang/llvm will change much (though it could...). There must be something embedded inside the -hostpatched.ll or -hostpatched.o file that somehow modifies the behavior of the GPU driver. But I'm not sure what... I suppose it could be clang/llvm. But... you know, we are compiling g.cpp with clang too, right?
Random question: what happens if you build g.cpp using cocl?
mkdir /tmp/foo
cp g.cpp /tmp/foo
cd /tmp/foo
cocl g.cpp
./g
Oh wait, when you build cuda_sample.cu, are you building with -fPIC?
cocl -fPIC cuda_sample.cu
./cuda_sample
1) Testing g.cpp with cocl: it is OK!
linaro@linaro-alip:~/test/hello/cpp5-cocl$ ./g
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
building kernel _z8setValuePfif ... built
clfinish version g finished ok
a[0]=555 a[1]=555 a[2]=123 a[3]=555 a[4]=555
2) I always use -fPIC.
3) "There must be something embedded inside the -hostpatched-ll or -hostpatched.o file that somehow modifies the behavior of the GPU driver." I think so!
4) I have another question: how does patch-hostside convert the cuda-sample.cu file's cuda-host code to libcocl's code? It's magic.
The -fPIC parameter changes the ll file's structure? Without -fPIC, g++ fails with an error at the linking stage:
g++ -Wl,-rpath,/usr/local/lib -Wl,-rpath,$ORIGIN -o g g.o -L/usr/local/lib -lcocl -lclblast -lOpenCL -leasycl -lclew -lpthread
/usr/bin/ld: g.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `_ZSt4cout@@GLIBCXX_3.4' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: g.o(.text+0x108): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `_ZSt4cout@@GLIBCXX_3.4'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
/usr/local/bin/../share/cocl/cocl.Makefile:41: recipe for target 'g' failed
make: *** [g] Error 1
how patch-hostside convert cuda-sample.cu file's cuda-host code to libcocl's code? It's magic.
Yes! :-)
But more seriously:
libcocl.so instead
The -fPIC parameter changes the ll file's structure?
It makes the symbols relocatable. It won't affect the ll files. It affects the .o file. This step:
- llvm converts -hostpatched.ll => .o (compile time)
how patch-hostside convert cuda-sample.cu file's cuda-host code to libcocl's code? It's magic.
Another way of answering this: it loads it using the llvm C++ API, hacks around with it, then saves it. It's a C++ program that uses the llvm C++ API to hack the ll file. The code is in patch-hostside.cpp
That's really cool!
In the meantime, I want to test your eigen-on-cl on my ARM board. Any suggestions or a readme for that?
One thing you could try, whilst I think of something cleverer. First dump the kernel:
COCL_DUMP_KERNEL=1 ./cuda_sample
... it should create a file /tmp/out.cl. Zip that, and put it somewhere that I can take a look at it. Then run, using that file:
COCL_LOAD_KERNEL=1 ./cuda_sample
... it should display a message like 'loading kernel', and means it's using whatever is in /tmp/out.cl. Can you paste the full output of both commands to gist please?
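To capture the full output of both runs in one go, something like the sketch below might help. dump_and_reload is a made-up helper name; only the COCL_DUMP_KERNEL and COCL_LOAD_KERNEL variables come from the commands above. It saves stdout and stderr of each run to a log file and notes a non-zero exit status, so a crash on the first run does not stop the second.

```shell
# Hypothetical helper, not part of cuda-on-cl.
# Runs a command twice: once with COCL_DUMP_KERNEL=1, once with
# COCL_LOAD_KERNEL=1, writing each run's stdout+stderr to a log file.
# Usage: dump_and_reload LOGDIR COMMAND [ARGS...]
dump_and_reload() {
    logdir=$1
    shift
    COCL_DUMP_KERNEL=1 "$@" > "$logdir/dump.log" 2>&1 \
        || echo "dump run exited with status $?" >> "$logdir/dump.log"
    COCL_LOAD_KERNEL=1 "$@" > "$logdir/load.log" 2>&1 \
        || echo "load run exited with status $?" >> "$logdir/load.log"
}
```

For example, dump_and_reload . ./cuda_sample would leave dump.log and load.log in the current directory, ready to paste into a gist.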
My board info: arm64 debian(jessie) , GCC 6.2, LLVM 3.8.
g++ -o build/test-cocl-cuda_sample build/test-cocl-cuda_sample.o -g -lcocl -lOpenCL
/usr/bin/ld: build/test-cocl-cuda_sample.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external C
/usr/bin/ld: build/test-cocl-cuda_sample.o(.text+0xe8): unresolvable R_AARCH64_ADR_PREL_PG_HI21 rel'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:128: recipe for target 'build/test-cocl-cuda_sample' failed
make: *** [build/test-cocl-cuda_sample] Error 1