naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

_prod_TT fails with CL_OUT_OF_RESOURCES on Mali-T628 #60

Open psyhtest opened 7 years ago

psyhtest commented 7 years ago

After checking that Caffe no longer requests zero-sized OpenCL buffers (#59), I still encounter the same result on SqueezeNet 1.1:

      *** Aborted at 1486746813 (unix time) try "date -d @1486746813" if you are using GNU date ***
      PC: @ 0xb5c8798a mcl_entrypoints_valid_event_list
      *** SIGSEGV (@0x45ccc412) received by PID 29568 (TID 0xb01c6000) from PID 1171047442; stack trace: ***
          @ 0xb5774270 (unknown)

(GoogleNet behaves similarly to what I report for SqueezeNet, except that it doesn't segfault. AlexNet works fine.)

The first call that goes bad is:

 {
  "call": "clEnqueueNDRangeKernel",
  "name": "_prod_TT",
  "queue": "0xd70c0",
  "kernel": "0x70f100",
  "gwo": [0, 0, 0],
  "gws": [16, 3200, 1],
  "lws": [8, 8, 1],
  "event_wait_list": [],
  "event": "0",
  "timestamp": {
   "start": "2017-02-10T16:55:05.482380",
   "end": "2017-02-10T16:55:05.523968"
  },
  "output profiling_error": -5,
  "profiling": {
   "queued": 532575944823,
   "submit": 532575944823,
   "start": 13079693137503322112,
   "end": 34359738377
  },
  "errcode": 0
 }
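
For readers of the record above: the enqueue call itself reports `"errcode": 0`, while the profiling error of -5 matches `CL_OUT_OF_RESOURCES` in the OpenCL headers, and the profiling timestamps are not in increasing order. The following is a minimal Python sketch that checks this using only values copied from the trace. The helper name `check_record` and the dictionary layout are hypothetical and are not part of the dividiti profiler.

```python
# Subset of OpenCL status codes, as defined in CL/cl.h.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -7: "CL_PROFILING_INFO_NOT_AVAILABLE",
}

# Values copied from the trace record above.
record = {
    "errcode": 0,
    "profiling_error": -5,
    "profiling": {
        "queued": 532575944823,
        "submit": 532575944823,
        "start": 13079693137503322112,
        "end": 34359738377,
    },
}


def check_record(rec):
    """Hypothetical helper: summarise status codes and timestamp order."""
    p = rec["profiling"]
    return {
        "enqueue_status": CL_ERRORS.get(rec["errcode"], "unknown"),
        "profiling_status": CL_ERRORS.get(rec["profiling_error"], "unknown"),
        # Valid profiling data satisfies queued <= submit <= start <= end.
        "timestamps_ordered": p["queued"] <= p["submit"] <= p["start"] <= p["end"],
    }


summary = check_record(record)
print(summary)
```

Here the `start` value is far larger than `end`, so the profiling numbers for this call are probably not meaningful and should not be used for timing.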

The local work size (8x8, i.e. 64 work-items per work-group) should work for any kernel, so I suspect the problem is in the way the kernel arguments are set, but I don't have any other clue at the moment.
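
The work-size reasoning can be checked against the values in the trace. This is a minimal sketch, assuming the OpenCL 1.x rule that each global size must be a multiple of the corresponding local size. The limit of 256 work-items is an assumption standing in for the real per-kernel limit, which would have to be queried with `clGetKernelWorkGroupInfo` and `CL_KERNEL_WORK_GROUP_SIZE` and can be lower than the device maximum.

```python
from math import prod

gws = [16, 3200, 1]  # global work size from the trace
lws = [8, 8, 1]      # local work size from the trace

# ASSUMPTION: 256 is a placeholder for the value that
# clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...) would return.
assumed_max_work_group_size = 256

# Each global dimension must be a multiple of the matching local dimension.
divides_evenly = all(g % l == 0 for g, l in zip(gws, lws))

# Work-items per work-group and total number of work-groups launched.
work_group_items = prod(lws)
num_work_groups = prod(g // l for g, l in zip(gws, lws))

fits_limit = work_group_items <= assumed_max_work_group_size

print(divides_evenly, work_group_items, num_work_groups, fits_limit)
# prints: True 64 800 True
```

If the same checks pass with the limit actually reported for `_prod_TT` on the device, the launch geometry is unlikely to be the cause, which would be consistent with the suspicion about how the arguments are set.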

Please find attached a trace from dividiti's OpenCL profiler up to the failing call: naibaf7-caffe-60.dvdt-prof.txt. Please see #59 for how to reproduce.

tequilaguru commented 6 years ago

I’m seeing the same issue on a Vivante GC2000. Did you ever find a possible cause or solution?