Another failed pytest with keras

inferrna commented 8 years ago

/usr/lib/python3/dist-packages/logilab/common/decorators.py:40: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if len(getargspec(callableobj).args) == 1 or self.keyarg == 0:
going into keras/tests
=======================  test_loss_masking.py  =======================
Using TensorFlow backend.

======================  test_loss_weighting.py  ======================
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Pitcairn
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Pitcairn
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 1.97GiB
Free memory: 1.31GiB
W tensorflow/stream_executor/cl/cl_driver.cc:587] creating context when one is currently active; existing: 0�p
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Pitcairn
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 1 with properties: 
name: Pitcairn
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 1.97GiB
Free memory: 1.31GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 1:   N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Pitcairn, pci bus id: 0000.0000)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Pitcairn, pci bus id: 0000.0000)
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
cl_driver DeviceAllocate 1192542208
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
cl_driver DeviceAllocate 1192542208
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_18TensorCwiseUnaryOpINS0_12scalar_rightIffNS0_17scalar_product_opIffEEEEKNS4_INS5_IKfLi1ELi1EiEELi16ES7_EEEEEENS_9GpuDeviceEEEiEEvT_T0_
Segmentation fault (core dumped)

You need to add support for some debug parameters or env vars (if it doesn't have them already). For example, if I had access to the failed kernel's source, I would try to compile it with CodeXL and compose a more reliable bug report.

hughperkins commented 8 years ago

You need to add support for some debug parameters or env vars (if it doesn't have them already). For example, if I had access to the failed kernel's source, I would try to compile it with CodeXL and compose a more reliable bug report.

Yes, alright, good idea. What is your preferred interface for this? (eg, create an env var? Preferred var name etc?)

hughperkins commented 8 years ago

(Thinking about this though, the segfault is probably not happening during the OpenCL build. Bugs in the kernel build will cause the sourcecode to be dumped into /tmp/failed-kernel.cl. This is already the case.)

hughperkins commented 8 years ago

(Though, I don't know, so let's try getting the OpenCL sourcecode dumped, controlled by an env var, and we can see.)

hughperkins commented 8 years ago

For example we could have:

(no vars) => current behavior
COCL_DUMP_KERNEL=/tmp/foo.cl => dumps cl sourcecode to /tmp/foo.cl
COCL_LOAD_KERNEL=/tmp/foo.cl => loads/compiles source code from /tmp/foo.cl, rather than using the current sourcecode

Note that in the general case there might be a bunch of different kernels/sourcecodes being dumped. So we might want to include the kernel name, or the sourcefile name, in that somehow/somewhere.
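
A minimal sketch of how the two proposed variables could be honoured, assuming a helper called just before the kernel is compiled. The function name resolveKernelSource and the "// kernel:" header line are illustrative assumptions, not existing cuda-on-cl code; only the variable names COCL_DUMP_KERNEL and COCL_LOAD_KERNEL come from the proposal above.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of cuda-on-cl): returns the OpenCL source
// that should actually be compiled for the kernel called kernelName.
std::string resolveKernelSource(const std::string &kernelName,
                                const std::string &generatedSource) {
    assert(!kernelName.empty());

    // COCL_LOAD_KERNEL set: use the file's contents instead of the generated source.
    if (const char *loadPath = std::getenv("COCL_LOAD_KERNEL")) {
        std::ifstream in(loadPath);
        if (in) {
            std::ostringstream buffer;
            buffer << in.rdbuf();
            return buffer.str();
        }
        // File could not be opened: fall through and use the generated source.
    }

    // COCL_DUMP_KERNEL set: write the generated source, tagged with the kernel name.
    if (const char *dumpPath = std::getenv("COCL_DUMP_KERNEL")) {
        std::ofstream out(dumpPath);
        out << "// kernel: " << kernelName << "\n" << generatedSource;
    }
    return generatedSource;
}
```

With a single dump path, each newly built kernel overwrites the previous dump, so the file ends up holding the last kernel built. Writing the kernel name as a leading comment is one simple way to tell which kernel that was.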

inferrna commented 8 years ago

COCL_DUMP_KERNEL=/tmp/foo.cl is a good variant; at least we can get the source of the last failed kernel. I'm thinking about passing a directory for loading fixed kernels, like this: COCL_LOAD_KERNEL_FROM=/tmp/kernels/, where /tmp/kernels/ can contain kernels and other functions in separate files. For example, if the code finds /tmp/kernels/function_or_kernel_name.cl, it would override its own variant of that function/kernel with the file's content.
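
The directory lookup could look roughly like the following sketch. The helper name loadOverride is hypothetical, and the file naming rule (directory, then the kernel or function name, then ".cl") is taken from the example path above; nothing here is existing cuda-on-cl code.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: looks for <dir>/<name>.cl, where <dir> is the value of
// the proposed COCL_LOAD_KERNEL_FROM variable. Returns true and replaces
// `source` if such a file exists; otherwise leaves `source` untouched.
bool loadOverride(const std::string &name, std::string &source) {
    assert(!name.empty());
    const char *dir = std::getenv("COCL_LOAD_KERNEL_FROM");
    if (dir == nullptr) {
        return false;  // variable not set: keep the generated source
    }
    std::string path(dir);
    if (!path.empty() && path.back() != '/') {
        path += '/';
    }
    path += name + ".cl";

    std::ifstream in(path);
    if (!in) {
        return false;  // no override file for this kernel
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    source = buffer.str();
    return true;
}
```

Because the lookup is keyed on the name, only the kernels that have a matching file are replaced; everything else keeps its generated source.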

hughperkins commented 8 years ago

k, sounds good. What I'm going to do:

I have a wheel that loads libcocl.so as a shared object, which means I can hack around with libcocl.so without having to rebuild the entire tensorflow wheel...
I'm going to modify libcocl.so to add the debugging options above
you'll need to download the .so wheel (actually, it's here: http://52.205.112.80/tf/shared/tensorflow-0.11.0rc0-py3-none-any.whl , so you can already download it if you want)
and then you'll need to replace the libcocl.so inside it with a new one
- libcocl.so is at: (your virtualenv)/lib/python3.5/site-packages/tensorflow/third_party/cuda-on-cl/libcocl.so (I mean, on your machine; I haven't actually created the modified version for download yet) (Note that if you decide to try modifying libcocl.so in the meantime, you'll need the latest version, from the latest cuda-on-cl master branch, and make sure you rebuild everything in cuda-on-cl, such that the following commands all show some output:

objdump -x -a build/libcocl.so | grep RPATH
objdump -x -a build/libeasycl.so | grep RPATH
objdump -x -a build/clblast/libclblast.so | grep RPATH

)

inferrna commented 8 years ago

I'd prefer to build it from source. Maybe it is a good point to start to use cmake? There are at least 2 parameters that need to be configurable: DEBUG=ON/OFF and USE_INTERNAL_CLBLAST=ON/OFF (I use clblast as a separate package for caffe).

hughperkins commented 8 years ago

I'd prefer to build it from source.

Ok. If you're up for building from source, you could perhaps hack the file directly. Basically, in src/hostside_opencl_funcs.cpp, scroll down to the method getKernelForName, which does:

checks if we already built the kernel, if so returns it, from the kernelByname map
otherwise, builds the kernel
just before the line CLKernel *kernel = cl->buildKernelFromString(sourcecode, name, "", "__internal__");, you can paste the following code block:

        // needs <cstdlib>, <fstream>, <iostream>, <string>, and using namespace std
        // define LOAD (to any value) to load the kernel source from the file instead of saving it
        cout << "load env " << (void *)getenv("LOAD") << endl;
        bool load = getenv("LOAD") != nullptr;
        string filename = "/tmp/out.cl";
        if(load) {
            cout << "loading kernel" << endl;
            ifstream f;
            f.open(filename, ios_base::in);
            // read the whole file, line by line, into sourcecode
            sourcecode = "";
            string line = "";
            while(getline(f, line)) {
                sourcecode += line + "\n";
            }
            f.close();
        } else {
            cout << "saving kernel" << endl;
            ofstream f;
            f.open(filename, ios_base::out);
            // the file is truncated on each open, so it holds only the most recently built kernel
            f << sourcecode << endl;
            f.close();
        }

I coded this before our discussion above; I was already using this code, just setting the bool load variable by hand. It sort of works, and then we can bash it into doing what you want. Currently:

by default it will write every kernel's sourcecode into /tmp/out.cl,
unless you define eg LOAD=1, in which case it will load the sourcecode from this file instead, then compile it and run it
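
The build-once-then-reuse behaviour described for getKernelForName can be sketched as below. This is an illustration only: FakeKernel stands in for EasyCL's CLKernel, and KernelCache and buildCount are hypothetical names, so the real method in src/hostside_opencl_funcs.cpp may differ in its details.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for CLKernel; the real class wraps a compiled OpenCL kernel.
struct FakeKernel {
    std::string name;
    std::string source;
};

class KernelCache {
public:
    // Returns the cached kernel for `name`, building it on first use only.
    FakeKernel *getKernelForName(const std::string &name, const std::string &source) {
        assert(!name.empty());
        auto it = kernelByname.find(name);
        if (it != kernelByname.end()) {
            return it->second.get();  // already built: reuse it
        }
        // A dump/load hook pasted here would run once per kernel name.
        ++buildCount;
        std::unique_ptr<FakeKernel> kernel(new FakeKernel{name, source});
        FakeKernel *raw = kernel.get();
        kernelByname[name] = std::move(kernel);
        return raw;
    }

    int buildCount = 0;  // number of builds performed, for illustration

private:
    std::map<std::string, std::unique_ptr<FakeKernel>> kernelByname;
};
```

If the real method caches in this way, a snippet pasted just before the build call is reached only on a cache miss, so each kernel would be saved or loaded at most once per process.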

hughperkins commented 8 years ago

Maybe it is a good point to start to use cmake?

I'm up for using cmake. I like cmake. The only gentle concern I have with cmake is that it makes it hard to build using anything other than gcc, on linux. And currently everything except linking is using clang. But thinking it through:

for compiling libcocl.so, ir-to-opencl and patch-hostside, it probably doesn't matter if they're built with gcc or clang. I'm not quite sure :-) We could try
the bits inside cocl.Makefile definitely must be built using clang... but that's a different makefile from the Makefile in the root of the cuda-on-cl source repository.

So... I think that we probably can try migrating the top-level Makefile to cmake, and see what happens.

hughperkins commented 8 years ago

What are your thoughts on who will migrate to CMakeLists.txt? I can take a look if you want?

hughperkins commented 8 years ago

Well... I couldn't put up with manky Makefile dependencies any more, and added a CMakeLists.txt in https://github.com/hughperkins/cuda-on-cl/commit/22c16dac81bde2f74e22f80c7679c708c5a3833b

hughperkins commented 8 years ago

(CMakeLists.txt Looking pretty clean in latest version; though latest version breaks tensorflow build, so ... :-P )

inferrna commented 8 years ago

A file is missing, or a declaration is wrong:

CMake Error at CMakeLists.txt:115 (add_library):
  Cannot find source file:

    src/CLBlast/src/database/database.cpp

  Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
  .hxx .in .txx

CMake Error: CMake can not determine linker language for target: clblast
CMake Error: Cannot determine link language for target "clblast".
CMake Error in CMakeLists.txt:
  Exporting the target "clblast" is not allowed since its linker language
  cannot be determined

I have manually pulled CLBlast and other submodules before doing cmake.

hughperkins commented 8 years ago

Hi inferrna,

Can you provide the output of git submodule please?

$ git submodule
 c09e59fae751bfa912886aea6000f41b301665a0 src/CLBlast (c09e59f)
 5dee35e0853d6cfc57f921586dd96968a9e602c3 src/EasyCL (v1.3-246-g5dee35e)

Also, can you provide the output of git status please?

git status

inferrna commented 8 years ago

$ git submodule
+d190becd89d4747d2cfd5d77e821d4e84ad01941 ../src/CLBlast (0.6.0)
+2fbae1f9e12ca4d85079bfba7a3a0fffed03ec7d ../src/EasyCL (v1.3-249-g2fbae1f)
$ git status 
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   ../CMakeLists.txt

difference (with my cmake changes - just added add_subdirectory(src/CLBlast) and removed some stuff)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 10f1d0f..cc80550 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -89,30 +89,31 @@ set(CLBLAST_ROUTINES ${CLBLAST_LEVEL1_ROUTINES} ${CLBLAST_LEVEL2_ROUTINES} ${CLB
 set(CLBLAST_PRECISIONS 32 64 3232 6464 16)

 # Gathers all source-files
-set(CLBLAST_SOURCES
-  src/CLBlast/src/database/database.cpp
-  src/CLBlast/src/routines/common.cpp
-  src/CLBlast/src/cache.cpp
-  src/CLBlast/src/clblast.cpp
-  src/CLBlast/src/clblast_c.cpp
-  src/CLBlast/src/routine.cpp
-  src/CLBlast/src/utilities.cpp
-)
-foreach(ROUTINE ${CLBLAST_LEVEL1_ROUTINES})
-  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/level1/${ROUTINE}.cpp)
-endforeach()
-foreach(ROUTINE ${CLBLAST_LEVEL2_ROUTINES})
-  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/level2/${ROUTINE}.cpp)
-endforeach()
-foreach(ROUTINE ${CLBLAST_LEVEL3_ROUTINES})
-  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/level3/${ROUTINE}.cpp)
-endforeach()
-foreach(ROUTINE ${CLBLAST_LEVELX_ROUTINES})
-  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/levelx/${ROUTINE}.cpp)
-endforeach()
-
-include_directories(src/CLBlast/src)
-add_library(clblast SHARED ${CLBLAST_SOURCES})
+add_subdirectory(src/CLBlast)
+#set(CLBLAST_SOURCES
+#  src/CLBlast/src/database.cc
+#  src/CLBlast/src/routines/common.cpp
+#  src/CLBlast/src/cache.cpp
+#  src/CLBlast/src/clblast.cpp
+#  src/CLBlast/src/clblast_c.cpp
+#  src/CLBlast/src/routine.cpp
+#  src/CLBlast/src/utilities.cpp
+#)
+#foreach(ROUTINE ${CLBLAST_LEVEL1_ROUTINES})
+#  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/level1/${ROUTINE}.cpp)
+#endforeach()
+#foreach(ROUTINE ${CLBLAST_LEVEL2_ROUTINES})
+#  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/level2/${ROUTINE}.cpp)
+#endforeach()
+#foreach(ROUTINE ${CLBLAST_LEVEL3_ROUTINES})
+#  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/level3/${ROUTINE}.cpp)
+#endforeach()
+#foreach(ROUTINE ${CLBLAST_LEVELX_ROUTINES})
+#  set(CLBLAST_SOURCES ${CLBLAST_SOURCES} src/CLBlast/src/routines/levelx/${ROUTINE}.cpp)
+#endforeach()
+#
+#include_directories(src/CLBlast/src)
+#add_library(clblast SHARED ${CLBLAST_SOURCES})

 target_include_directories(clblast PRIVATE src/CLBlast/include)

@@ -156,7 +157,7 @@ INSTALL(FILES ${CLEW_HEADERS} DESTINATION include)
 INSTALL(FILES ${EASYCL_HEADERS_ROOT} DESTINATION include/EasyCL)
 # INSTALL(FILES ${CMAKE_SOURCE_DIR}/cmake/cocl.cmake DESTINATION share/cocl)
 INSTALL(FILES ${CMAKE_BINARY_DIR}/cmake/cocl.cmake DESTINATION share/cocl)
-install(TARGETS easycl clew clblast cocl ir-to-opencl patch-hostside EXPORT cocl-targets
+install(TARGETS easycl clew cocl ir-to-opencl patch-hostside EXPORT cocl-targets
     LIBRARY DESTINATION lib
     ARCHIVE DESTINATION lib
     RUNTIME DESTINATION bin
diff --git a/src/CLBlast b/src/CLBlast
index c09e59f..d190bec 160000
--- a/src/CLBlast
+++ b/src/CLBlast
@@ -1 +1 @@
-Subproject commit c09e59fae751bfa912886aea6000f41b301665a0
+Subproject commit d190becd89d4747d2cfd5d77e821d4e84ad01941
diff --git a/src/EasyCL b/src/EasyCL
index 5dee35e..2fbae1f 160000
--- a/src/EasyCL
+++ b/src/EasyCL
@@ -1 +1 @@
-Subproject commit 5dee35e0853d6cfc57f921586dd96968a9e602c3
+Subproject commit 2fbae1f9e12ca4d85079bfba7a3a0fffed03ec7d

Also cmake has own technique to use submodules - just for example. https://coderwall.com/p/y3zzbq/use-cmake-enabled-libraries-in-your-cmake-project

hughperkins commented 8 years ago

Also cmake has own technique to use submodules - just for example. https://coderwall.com/p/y3zzbq/use-cmake-enabled-libraries-in-your-cmake-project

it does. And I use it for example in DeepCL https://github.com/hughperkins/DeepCL/blob/master/CMakeLists.txt#L66-L74

Kind of fiddly to get working though. Much easier to just build it, without going the whole cmake submodules route... as you can see :-)

Question: what is your use-case that you are trying to solve by replacing the existing clblast build in the cuda-on-cl cmakelists.txt makefile?

inferrna commented 8 years ago

After executing tensorflow-cl$ git submodule update --init --recursive, I tried to update cuda-on-cl:

tensorflow-cl/third_party/cuda-on-cl$ git pull
You are not currently on a branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

tensorflow-cl/third_party/cuda-on-cl$ git status
HEAD detached at d6c2c7a

It seems that `git submodule update --init --recursive' switches all submodules to a specific commit each time. To fix this I executed

git submodule foreach --recursive git checkout master
git submodule foreach --recursive git pull

in tensorflow-cl/third_party/cuda-on-cl and tensorflow-cl. After that I just fixed CMakeLists.txt to make it work with the updated CLBlast's own CMakeLists.txt, because its sources had changed a lot.

hughperkins commented 8 years ago

Ok. Because you want the latest version of CLBlast? I don't support that for now though. You will need to figure out how to get that working on your own :-)

hughperkins commented 8 years ago

Closing since much of this is done, and much of the rest is out of date. Let's create some new issues for the latest v0.13.0 code.

hughperkins / tf-coriander

Another failed pytest with keras #3