hughperkins / coriander

Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices
Apache License 2.0

porting on aarch64 #3

Closed alephman closed 7 years ago

alephman commented 8 years ago

My board info: arm64, Debian (jessie), GCC 6.2, LLVM 3.8.

  1. This step succeeds:

     git clone --recursive https://github.com/hughperkins/cuda-on-cl
     cd cuda-on-cl
     make
     sudo make install

  2. make run-test-cocl-cuda_sample

g++ -o build/test-cocl-cuda_sample build/test-cocl-cuda_sample.o -g -lcocl -lOpenCL
/usr/bin/ld: build/test-cocl-cuda_sample.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external C
/usr/bin/ld: build/test-cocl-cuda_sample.o(.text+0xe8): unresolvable R_AARCH64_ADR_PREL_PG_HI21 rel'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
Makefile:128: recipe for target 'build/test-cocl-cuda_sample' failed
make: *** [build/test-cocl-cuda_sample] Error 1

hughperkins commented 8 years ago

This one: https://gist.github.com/hughperkins/a14cc48a95c551ddf718d4c807c06f0f

alephman commented 8 years ago

This one (https://gist.github.com/hughperkins/a14cc48a95c551ddf718d4c807c06f0f) is OK.

hughperkins commented 8 years ago

Cool. Let's try going via cocl to build the kernel:

https://gist.github.com/hughperkins/93d82ab1dd380334c911f1defc898e0f

alephman commented 8 years ago

this test (https://gist.github.com/hughperkins/93d82ab1dd380334c911f1defc898e0f) is OK.

hughperkins commented 8 years ago

That's odd. I just realized you're missing the local int *scratch parameter in your kernel. It should look like:

kernel void _Z8setValuePfif(global float* data, long data_offset, int idx, float value, local int *scratch) {

Can you recheck the contents of your cuda_sample-device.cl file? The change to add scratch was like a week or two ago; and it should be in there I reckon?
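For reference, the kernel name _Z8setValuePfif is an Itanium C++ ABI mangled name. The following is a minimal sketch of decoding it, assuming a GCC or Clang toolchain that ships <cxxabi.h>. The demangle helper is a hypothetical name used for illustration and is not part of cuda-on-cl.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (available with GCC and Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged when demangling fails.
std::string demangle(const std::string &mangled) {
    int status = 0;
    char *raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;  // nothing was allocated on failure
    }
    std::string result(raw);
    std::free(raw);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Calling demangle("_Z8setValuePfif") gives setValue(float*, int, float). Those three arguments correspond to data, idx and value in the kernel signature above; data_offset and scratch do not appear in the mangled name, so they seem to be added during translation.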

hughperkins commented 8 years ago

(By the way, this file does all of the kernel launching bit via cocl methods: https://gist.github.com/hughperkins/ef17c1c5bd39fa7806425009ffbf8bda )

alephman commented 8 years ago

1) I used the latest code to recompile cuda-on-cl and dumped the cuda_sample-device.cl here (https://gist.github.com/alephman/9d3b7f2d63ccd7f47a0b1e874ee1576f)

2) The file (https://gist.github.com/hughperkins/ef17c1c5bd39fa7806425009ffbf8bda ) is still OK too...

hughperkins commented 8 years ago

Ok. And just to check, if you do make run-cuda_sample, that fails in the same way as before right?

alephman commented 8 years ago

I can't fix the issue of "/usr/include/features.h|374|fatal error: sys/cdefs.h: No such file or directory|" as my platform is ARM64 (https://askubuntu.com/questions/470796/fatal-error-sys-cdefs-h-no-such-file-or-directory doesn't work on this platform).

So I tried another test:

Two-step compilation
If you want, you can compile in two steps:

cocl -c teststream.cu
g++ -o teststream teststream.o -lcocl -lclblast -leasycl -lclew -lpthread

It gives an error like this: https://gist.github.com/alephman/5b417541907b87704aaa6402cbd3fddf

hughperkins commented 8 years ago

Yes, the kernels are kind of all over the place at the moment, as I try different ways of convincing beignet to accept them...

Can you try branch "branches_as_switch"? The kernels still don't work on beignet, but at least they correspond to what I'm working on at the moment.

alephman commented 8 years ago

Sure, I just checked out branches_as_switch and did the same things as on the master branch; the result is the same.

hughperkins commented 8 years ago

You might need to do a clean first. The kernels should look radically different. Can you do:

rm teststream*

... and try again?

alephman commented 8 years ago

Here is what I did. First I cleaned /usr/local/include and /usr/local/lib, then:

git pull
git checkout branches_as_switch

cd build
rm -rf *
cmake ..

make -j4
sudo make install

cd test/cocl
cocl -fPIC -c teststream.cu
g++ -o teststream teststream.o -lcocl -lclblast -leasycl -lclew -lpthread

./teststream test1
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
got stream
building kernel _z10longkernelpfif
teststream: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
Stack dump:
  1. Running pass 'UniFlex generator' on module 'BuildGroup_1'.
Aborted

hughperkins commented 8 years ago

llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.

cool :-) Ok. Let's add memory copy to use cocl too:

https://gist.github.com/hughperkins/7e2c50d211b4f501bba0c21caacef25c

alephman commented 8 years ago

(https://gist.github.com/hughperkins/7e2c50d211b4f501bba0c21caacef25c) run ok!

hughperkins commented 8 years ago

Wow! Well, that covers pretty much most stuff :-P

Let's try linking to cuda_sample.o, and using stuff from that...

You'll need to copy cuda_sample.o into the directory containing these scripts.

Hmmm... I need to modify something in libcocl first. Give me a few minutes. I want to use the OpenCL source code from cuda_sample.o, but currently it's stored in a string with a very non-C name:

$ nm cuda_sample.o | grep -i opencl
0000000000000000 D __opencl_sourcecode/home/ubuntu/git/cuda-on-cl/build/cuda_sample-device.cl

hughperkins commented 8 years ago

Hmmm, seems not obvious to link to this symbol, even after renaming it to something more sensible.
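To illustrate why a symbol that embeds a file path is hard to reference from C or C++ source: an identifier may contain only letters, digits and underscores, and may not start with a digit. The following is a minimal sketch of checking a name and deriving a C-safe one. isCIdentifier and toCIdentifier are hypothetical helpers for illustration, not the renaming that libcocl actually uses.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if `name` could be written as an identifier in C.
bool isCIdentifier(const std::string &name) {
    if (name.empty() || std::isdigit(static_cast<unsigned char>(name[0]))) {
        return false;
    }
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_') {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: replaces every character that is not a letter, digit
// or underscore with '_', and prefixes '_' if the result would start with a
// digit (or be empty).
std::string toCIdentifier(const std::string &name) {
    std::string out;
    for (unsigned char c : name) {
        out += (std::isalnum(c) || c == '_') ? static_cast<char>(c) : '_';
    }
    if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0]))) {
        out.insert(out.begin(), '_');
    }
    return out;
}
```

With these helpers, the path-based symbol printed by nm is rejected by isCIdentifier, while toCIdentifier("build/cuda_sample-device.cl") yields build_cuda_sample_device_cl. Note that two different paths can map to the same sanitized name, so a real scheme would need to handle collisions.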

Whilst I ponder what we can try next, in the meantime, can we just double check that what we think you're running really is what you're running? Can you build/run https://gist.github.com/hughperkins/575e80bd1441e1be021ddbb99475daa7 , and provide the full output please?

alephman commented 8 years ago

The output is here: http://pastebin.ubuntu.com/23442178/ , but it's for ARM... :-p Oh, I just realized that the "branches_as_switch" branch is for beignet (Intel GPU), but I have been testing it on my ARM board... I gave you the wrong information, sorry for that!

hughperkins commented 8 years ago

It's a generic branch, not specific to hd5500, except in the sense that that's what I am using to test it on, and that hd5500 is the device that is motivating me to write it, since it seems not to work without it.

I shall be merging this branch onto master in the next 1-2 days. It is the future :-) I'll make the transforms optional though. You can see already in the cocl -h for this branch that I started planning how that will work.

Ok, so, seems everything is all working, in theory. Weird that it crashes in cuda_sample. Can you just re-re-check that cuda_sample still doesn't work? (make sure to do cd build; rm test* before re-trying)

hughperkins commented 8 years ago

(actually, cd build; rm cuda*)

alephman commented 8 years ago

I am sure. I moved the cuda_sample.cu file into a new folder outside cuda-on-cl, and then tried the cocl -fPIC cuda_sample.cu command. I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.

alephman commented 8 years ago

@hughperkins I want to try clang 3.9. After installing clang 3.9 and modifying cocl.Makefile, I ran cocl -fPIC cuda_sample.cu and got:

clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes

I found there is a __clang_cuda_runtime_wrapper.h file in /usr/local/cocl, but I don't know how to modify it.

hughperkins commented 8 years ago

I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.

I'm not sure I follow. Are you saying you're running the tests on two different machines?

hughperkins commented 8 years ago

Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes

Maybe use -nocudainc? You can add it at line 19 of cocl.Makefile

alephman commented 8 years ago

Yeah, I ran the test on both ARM and x86. On my x86 laptop with an NVIDIA 4200M, the cuda_sample test runs OK. My original idea was that the ll file is IR, independent of CPU architecture, so I could use arm-g++ to compile the ll file that works normally on x86. But that doesn't seem to work, as the hostpatched ll depends on x86.

alephman commented 8 years ago

clang++-3.9 -fPIC -DUSE_CLEW -noduainc -x cuda -std=c++11 --cuda-gpu-arch=sm_30 -D__CUDA_ARCH__=300 -I/usr/local/include/EasyCL -I/usr/local/include/cocl -I/usr/local/src/EasyCL -I/usr/local/src/EasyCL/thirdparty/clew/include -include /usr/local/include/cocl/cocl.h -include /usr/local/include/cocl/fake_funcs.h -include /usr/local/include/cocl/cocl_deviceside.h -I/usr/local/include cuda_sample.cu --cuda-device-only -emit-llvm -I/usr/include/aarch64-linux-gnu --target=aarch64-linux-gnu -O0 -S -o cuda_sample-device-noopt.ll
clang: error: unknown argument: '-noduainc'
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.

hughperkins commented 8 years ago

Hmmm, ok, I'm confused. Can you write out in detail which tests you ran on what please? Here is my understanding till now :-)

ARM PC:
- build and run g.cpp => works ok

ARM PC:
- build and run cuda_sample.cu => error message at opencl launch time

hughperkins commented 8 years ago

clang: error: unknown argument: '-noduainc'

Shouldn't it be -nocudainc?

alephman commented 8 years ago

Sure!

ARM board (Clang 3.8, gcc 6.2, Debian (jessie)):

X86 (Ubuntu 16.04, gcc 5.4 and clang 3.8, GPU NVIDIA 4200M):
- use ir-to-opencl and patch-hostside to convert cu file to opencl: OK.

alephman commented 8 years ago

clang++-3.9 -fPIC -DUSE_CLEW noduainc -x cuda -std=c++11 --cuda-gpu-arch=sm_30 -D__CUDA_ARCH__=300 -I/usr/local/include/EasyCL -I/usr/local/include/cocl -I/usr/local/src/EasyCL -I/usr/local/src/EasyCL/thirdparty/clew/include -include /usr/local/include/cocl/cocl.h -include /usr/local/include/cocl/fake_funcs.h -include /usr/local/include/cocl/cocl_deviceside.h -I/usr/local/include cuda_sample.cu --cuda-device-only -emit-llvm -I/usr/include/aarch64-linux-gnu -O0 -S -o cuda_sample-device-noopt.ll
clang: error: no such file or directory: 'noduainc'
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.

hughperkins commented 8 years ago

Aren't you missing a hyphen? Like -nocudainc? And I think it is the word 'no' then 'cuda' then 'inc' joined together.

alephman commented 8 years ago

Sorry, careless copy-paste caused this silly problem, haha.

clang++-3.9 -fPIC -nocudainc -x cuda -DUSE_CLEW -std=c++11 -I/usr/include/aarch64-linux-gnu --target=aarch64-linux-gnu -I/usr/local/include -I/usr/local/include/EasyCL -I/usr/local/include/cocl -I/usr/local/src/EasyCL/thirdparty/clew/include -I/usr/local/src/EasyCL -include /usr/local/include/cocl/cocl.h -include /usr/local/include/cocl/fake_funcs.h -include /usr/local/include/cocl/cocl_hostside.h cuda_sample.cu --cuda-host-only -emit-llvm -O3 -S -o cuda_sample-hostraw.ll

opt-3.8 -O2 -inline -mem2reg -instcombine -S -o cuda_sample-device.ll cuda_sample-device-noopt.ll

opt-3.9 -O2 -inline -mem2reg -instcombine -S -o cuda_sample-device.ll cuda_sample-device-noopt.ll
/usr/local/bin/ir-to-opencl --inputfile cuda_sample-device.ll --outputfile cuda_sample-device.cl
/usr/local/bin/ir-to-opencl: cuda_sample-device.ll:2:1: error: expected top-level entity
source_filename = "cuda_sample.cu"

alephman commented 8 years ago

cuda_sample-device.ll file:

; ModuleID = 'cuda_sample-device-noopt.ll'
source_filename = "cuda_sample.cu"
target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
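A note on this error: source_filename is a module-level entry that LLVM began emitting in 3.9, and an LLVM 3.8 parser does not recognise it. That would be consistent with "expected top-level entity" if ir-to-opencl was built against 3.8 while the .ll file came from opt-3.9. Using one LLVM version for every step avoids the mismatch. As a rough sketch only, and assuming that line is the sole incompatibility (which is not guaranteed), a hypothetical stripSourceFilename helper could drop it before the older tool reads the file:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the IR text with any line that begins with
// "source_filename" removed. Every kept line is terminated with '\n'.
std::string stripSourceFilename(const std::string &ir) {
    std::istringstream in(ir);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        if (line.rfind("source_filename", 0) == 0) {
            continue;  // skip the entry the older parser rejects
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

This only edits text and does not validate the IR, so other differences between the 3.8 and 3.9 formats could still make the older parser fail.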

hughperkins commented 8 years ago

use ir-to-opencl and patch-hostside to convert cu file to opencl is failed!

What do you mean by 'failed'? Can you provide the output from running ir-to-opencl on ARM, using clang-3.8?

alephman commented 8 years ago

ARM board (Clang 3.8, gcc 6.2, Debian (jessie)), test results:

- pyopencl is OK.
- OpenCL API is OK.
- easycl API is OK.
- cocl API is OK.
- easycl/cocl API, compiling and building with clang/llvm 3.8, is OK.
- Using the ir-to-opencl and patch-hostside tools to compile the cu file is OK. When the generated file runs, it may cause an error like: http://pastebin.ubuntu.com/23434196/ .

_Z name as a c++ mangled name, and it somehow demangles it, or something. Fix for this issue: the OpenCL function name can probably be modified (https://github.com/hughperkins/cuda-on-cl/blob/master/src/ir-to-opencl.cpp#L1401), and the hostside name can probably be modified (https://github.com/hughperkins/cuda-on-cl/blob/master/src/patch-hostside.cpp#L134).

After recompiling the cu file with the ir-to-opencl and patch-hostside tools and running the generated file, it may cause another error (this issue is still pending):

building kernel _z8setvaluepfif
cuda_sample: tools/intern/llvmufgen/USCInstVisitors.cpp:2179: virtual void llvm::UFWriter::visitGetElementPtrInst(llvm::GetElementPtrInst&): Assertion `(sDest.ePtrType == sBase.ePtrType) || bUseConst0Base' failed.
Stack dump:
  1. Running pass 'UniFlex generator' on module 'BuildGroup_1'.

X86 (Ubuntu 16.04, gcc 5.4 and clang 3.8, GPU NVIDIA 4200M):
- use ir-to-opencl and patch-hostside to convert cu file to opencl: OK.

hughperkins commented 8 years ago

Ok. But on the ARM board, you can run g.cpp, including compiling, building, everything, all on the ARM board, using clang/llvm 3.8, and it works perfectly?

alephman commented 8 years ago

yeah, g.cpp is OK with clang/llvm 3.8 .

hughperkins commented 8 years ago

Ok. So on ARM, using clang/llvm3.8, g.cpp works, but for some weird reason cuda_sample doesn't? So, going back to "I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.", I'm not sure I follow? Since we're using the exact same llvm/clang for compiling both g.cpp and cuda_sample.cu ?

alephman commented 8 years ago

Ok. So on ARM, using clang/llvm3.8, g.cpp works, but for some weird reason cuda_sample doesn't?

Yes, the .cu file doesn't work.

So, going back to "I guess the LLVM causes the problem, different CPU architectures LLVM's backend is different.", I'm not sure I follow? Since we're using the exact same llvm/clang for compiling both g.cpp and cuda_sample.cu ?

I am using your docker on my laptop, and cuda_sample is OK there. So I guess it's an llvm/clang issue.

hughperkins commented 8 years ago

Here is how cuda-on-cl works. You'll note that llvm/clang are only used during compilation, not at runtime. The llvm error you are seeing is probably from the GPU driver, not connected to the llvm used by cuda-on-cl. So I think that changing llvm version will make no difference.

^^^ all the above happens at compile time, and you confirm that cuda_sample compiles ok, right? Therefore all the llvm/clang bits are running ok.

Finally at runtime the following happens:

^^ none of this uses llvm/clang. It only uses opencl, libcocl.

I'm not quite sure why g.cpp runs and cuda_sample doesn't, but I'm not convinced that changing clang/llvm will change much (though it could...). There must be something embedded inside the -hostpatched-ll or -hostpatched.o file that somehow modifies the behavior of the GPU driver. But I'm not sure what... I suppose it could be clang/llvm. But... you know, we are compiling g.cpp with clang too, right?

hughperkins commented 8 years ago

Random question: what happens if you build g.cpp using cocl?

mkdir /tmp/foo
cp g.cpp /tmp/foo
cd /tmp/foo
cocl g.cpp
./g

hughperkins commented 8 years ago

Oh wait, when you build cuda_sample.cu, are you building with -fPIC?

cocl -fPIC cuda_sample.cu
./cuda_sample

alephman commented 8 years ago

1) Testing g.cpp with cocl: it is OK!

linaro@linaro-alip:~/test/hello/cpp5-cocl$ ./g
Using Imagination Technologies , OpenCL platform: PowerVR Rogue
Using OpenCL device: PowerVR Rogue G6230
building kernel _z8setValuePfif ... built clfinish version g finished ok
a[0]=555 a[1]=555 a[2]=123 a[3]=555 a[4]=555

2) I always use -fPIC.

3) "There must be something embedded inside the -hostpatched-ll or -hostpatched.o file that somehow modifies the behavior of the GPU driver." I think so!

4) I have another question: how does patch-hostside convert cuda_sample.cu's CUDA host code into libcocl code? It's magic.

alephman commented 8 years ago

The -fPIC parameter changes the ll file's structure? Without -fPIC, g++ fails at the linking stage:

g++ -Wl,-rpath,/usr/local/lib -Wl,-rpath,$ORIGIN -o g g.o -L/usr/local/lib -lcocl -lclblast -lOpenCL -leasycl -lclew -lpthread
/usr/bin/ld: g.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `_ZSt4cout@@GLIBCXX_3.4' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: g.o(.text+0x108): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `_ZSt4cout@@GLIBCXX_3.4'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
/usr/local/bin/../share/cocl/cocl.Makefile:41: recipe for target 'g' failed
make: *** [g] Error 1

hughperkins commented 8 years ago

how does patch-hostside convert cuda_sample.cu's CUDA host code into libcocl code? It's magic.

Yes! :-)

But more seriously:

hughperkins commented 8 years ago

The -fPIC parameter changes the ll file's structure?

It makes the generated code position-independent, so the symbols are relocatable. It won't affect the ll files. It affects the .o file. This step:

- llvm converts -hostpatched.ll => .o (compile time)

hughperkins commented 8 years ago

how does patch-hostside convert cuda_sample.cu's CUDA host code into libcocl code? It's magic.

Another way of answering this: it loads the ll file using the llvm C++ API, hacks around with it, then saves it. It's a C++ program that uses the llvm C++ API to hack the ll file. The code is in patch-hostside.cpp

alephman commented 8 years ago

That's really cool!

alephman commented 8 years ago

In the meantime, I want to test your eigen-on-cl on my ARM board. Any suggestions or a readme for that?

hughperkins commented 8 years ago

One thing you could try, whilst I think of something cleverer. First dump the kernel:

COCL_DUMP_KERNEL=1 ./cuda_sample

... it should create a file /tmp/out.cl. Zip that, and put it somewhere that I can take a look at it. Then run, using that file:

COCL_LOAD_KERNEL=1 ./cuda_sample

... it should display a message like 'loading kernel', and means it's using whatever is in /tmp/out.cl. Can you paste the full output of both commands to gist please?