naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/
Other
85 stars 20 forks source link

Calls to create zero-sized buffers #59

Closed psyhtest closed 7 years ago

psyhtest commented 7 years ago

I'm investigating some failures apparently caused by OpenCL API calls to create zero-sized buffers when benchmarking with caffe time. I've seen such failures for a while, but it's only now I am beginning to get some sense of what's going on with help of dividiti's OpenCL profiler.

For example, on an Ubuntu 16.04 workstation with the NVIDIA GTX 1080 GPU, the benchmark completes with:

I0207 11:49:33.122478 14051 caffe.cpp:476] *** Benchmark ends ***
*** Aborted at 1486468173 (unix time) try "date -d @1486468173" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0xf00000000) received by PID 14051 (TID 0x7fd257582ac0) from PID 0; stack trace: ***
    @     0x7fd2557094b0 (unknown)
    @     0x7fd25603f105 clFinish
    @     0x7fd256a0ff7a caffe::SyncedMemory::~SyncedMemory()
    @     0x7fd256a00b02 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7fd2569ae91a boost::detail::sp_counted_base::release()
    @     0x7fd2569bc542 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7fd25699a4b9 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7fd2569976a8 std::vector<>::~vector()
    @     0x7fd25570e36a __cxa_finalize
    @     0x7fd256991573 (unknown)

On a development platform with the ARM Mali-T628 GPU, the benchmark may not even complete with:

     *** Aborted at 1486392479 (unix time) try "date -d @1486392479" if you are using GNU date ***
      PC: @ 0xb52c21a4 mcl_entrypoints_valid_event_list_common
      *** SIGSEGV (@0x64321765) received by PID 30077 (TID 0xb6f7e000) from PID 1681004389; stack trace: ***
          @ 0xb4a2a3e0 (unknown)

along the way.

Such calls seem to happen across BLAS libraries and models, but to get the ball rolling let's consider ViennaCL with AlexNet.

The easiest way to reproduce is via CK-Caffe:

  1. Install CK:

    $ sudo pip install ck
    $ ck version
  2. Install CK-Caffe:

    $ ck pull repo:ck-caffe --url=https://github.com/dividiti/ck-caffe
  3. Install Caffe with ViennaCL:

    $ ck install package:lib-caffe-bvlc-opencl-viennacl-universal
    • Select GCC if prompted for C/C++ compiler.
  4. Run Caffe benchmarking under dividiti's OpenCL profiler:

    $ ck pipeline program:caffe --env.CK_CAFFE_BATCH_SIZE=1 --dvdt_prof
    • Select command: “time_gpu”
    • Select library (if prompted): “BVLC Caffe framework (opencl,viennacl)
    • Select model (if prompted): “caffemodel-bvlc-alexnet-fast-mirror”
    • Select profiler: “tool-dvdt-prof-cjson”
  5. Inspect the profiler's output:

    $ cd `ck find program:caffe`/tmp
    $ less -r tmp-dvdt-prof.json

    The output is a list of dictionaries describing traced OpenCL API calls. At 5-6% of the total trace, there's a call to clCreateBuffer() with size=0, which returns errcode=-61 (CL_INVALID_BUFFER_SIZE) as per the specification.

    { 
    "buffer": "0",
    "timestamp": {
      "start": "2017-02-07T11:49:32.855578",
      "end": "2017-02-07T11:49:32.855581"
    }, 
    "errcode": -61,
    "flags": 1, 
    "context": "0x77c5f0",
    "host_ptr": "0", 
    "call": "clCreateBuffer", 
    "errcode_ret": "0x7ffc1705f8f4",
    "size": 0
    },

    On the Mali-powered platform things go downhill from there causing failures in clEnqueueNDRangeKernel() and even segfaults in clEnqueueWriteBuffer()...

Please find attached a sample CK-Caffe session and profiler trace from the GTX 1080 powered workstation:

naibaf7 commented 7 years ago

Oh, interesting. Yes I noted such errors before. Do you already know what piece of code tries to allocate size 0 buffers?

psyhtest commented 7 years ago

Do you already know what piece of code tries to allocate size 0 buffers?

No, I was hoping you would have a better idea :).

Please take a look at the sample trace when you have time.

naibaf7 commented 7 years ago

Hm alright, I suspect it might be a stupid code added in the early days, possibly greentea.cpp. That'll have to be cleaned up anyways. https://github.com/naibaf7/caffe/blob/master/src/caffe/greentea/greentea.cpp

naibaf7 commented 7 years ago

So my current hypothesis is that this "workaround" is there for places where it is OK to pass nullptr into the CUDA kernel, but we can't pass a nullptr-cl_mem object into the kernel (CL_INVALID_MEM_OBJECT), so the workaround was to allocate a temporary buffer of size 0. But apparently some OpenCL SDKs are also not OK with that and will fail either early (on allocation) or later on. As I see it the problem exists only with the pooling layers, and I'll look for another workaround now.

naibaf7 commented 7 years ago

Should hopefully be resolved with the latest commits (+ a few other bugfixes). Tested runtests on AMD and nVidia Linux OpenCL implementation.

psyhtest commented 7 years ago

@naibaf7

On the GTX 1080, the same benchmark now runs with errcode=0 for all the OpenCL API calls. It still segfaults though, but I believe it's unrelated:

I0209 11:41:03.430191 47885 caffe.cpp:475] Total Time: 144.459 ms.
I0209 11:41:03.430194 47885 caffe.cpp:476] *** Benchmark ends ***
*** Aborted at 1486640463 (unix time) try "date -d @1486640463" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x1100) received by PID 47885 (TID 0x7f45d4895ac0) from PID 4352; stack trace: ***
    @     0x7f45d2a1c4b0 (unknown)
    @     0x7f45d3d22f72 caffe::SyncedMemory::~SyncedMemory()
    @     0x7f45d3d13b02 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7f45d3cc191a boost::detail::sp_counted_base::release()
    @     0x7f45d3ccf542 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7f45d3cad4b9 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7f45d3caa6a8 std::vector<>::~vector()
    @     0x7f45d2a2136a __cxa_finalize
    @     0x7f45d3ca4573 (unknown)

I'll test on the Mali-T628 next... Just need to remember to build using:

$  ck install package:lib-caffe-bvlc-opencl-viennacl-universal \
  --env.DISABLE_DOUBLE_SUPPORT=ON \
  --env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON \
  --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=2
psyhtest commented 7 years ago

I've checked on the Odroid XU3 platform, and this problem has gone away. I'm struggling with another issue now, but I'll open a new ticket for that. Thanks.