naibaf7 / libdnn

Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL

Work with style-transfer #19

Closed · janchk closed this issue 7 years ago

janchk commented 7 years ago

Hello again. I ran into a problem when using LibDNN instead of the Caffe engine while trying to run https://github.com/fzliu/style-transfer.

With the Caffe engine I got this (as planned):

[image: sanfrancisco-starry_night-vgg19-content-1e4-512 1]

make runtest log: log_runtest_caffeengn.txt

With LibDNN I got this (strange):

[image: sanfrancisco-starry_night-vgg19-content-1e4-512 2]

make runtest log: log_runtest_libDNN.txt

I have no idea what the reason for this glitch is.

naibaf7 commented 7 years ago

@janchk Oh sorry, this issue should have been reported at https://github.com/naibaf7/caffe instead, since the standalone LibDNN is not the same code and is not affected. But OK, now that it's here, let's leave it that way; just so you know for next time.

I see that you use the VGG-19 model, which contains average pooling:

layer {
  bottom: "conv4_4"
  top: "pool4"
  name: "pool4"
  type: "Pooling"
  pooling_param {
    pool: AVE
    kernel_size: 2
    stride: 2
  }
}

I fixed average pooling in the OpenCL Caffe LibDNN just yesterday, so if you update, recompile and try again today, it should work fine :) You can confirm this by running "runtest" again; the average-pooling error should disappear. LibDNN is quite new, especially the pooling feature, so some bugs can occur. However, convolutions as well as average and max pooling are now strictly unit-tested against Caffe's engine (100 random configuration samples per runtest).

Related-Commit: https://github.com/naibaf7/caffe/commit/6ab003a2ae7fb53d1db777d7cafd6ff8fd3e9c93
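To make that cross-check concrete, here is a minimal sketch (my illustration, not the actual Caffe/LibDNN test code; libdnn_avg_pool would be the LibDNN pooling call and is only a hypothetical stand-in here) of how a randomized 2x2 average-pooling comparison against a naive reference could look:

// Sketch of a randomized average-pooling cross-check against a naive reference.
// Not the real test code; libdnn_avg_pool() is a hypothetical placeholder.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Naive reference: 2x2 average pooling with stride 2, no padding.
std::vector<float> reference_avg_pool(const std::vector<float>& in,
                                      int h, int w) {
  int oh = h / 2, ow = w / 2;
  std::vector<float> out(oh * ow);
  for (int y = 0; y < oh; ++y)
    for (int x = 0; x < ow; ++x) {
      float sum = 0.0f;
      for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx)
          sum += in[(2 * y + dy) * w + (2 * x + dx)];
      out[y * ow + x] = sum / 4.0f;
    }
  return out;
}

int main() {
  // 100 random configurations, mirroring the "100 samples per runtest" idea.
  for (int t = 0; t < 100; ++t) {
    int h = 2 * (1 + std::rand() % 32);  // even sizes for a 2x2/stride-2 pool
    int w = 2 * (1 + std::rand() % 32);
    std::vector<float> in(h * w);
    for (float& v : in) v = std::rand() / static_cast<float>(RAND_MAX);

    std::vector<float> ref = reference_avg_pool(in, h, w);
    // std::vector<float> got = libdnn_avg_pool(in, h, w);  // hypothetical call
    std::vector<float> got = ref;  // placeholder so the sketch compiles

    for (std::size_t i = 0; i < ref.size(); ++i)
      assert(std::fabs(ref[i] - got[i]) < 1e-5f);
  }
  return 0;
}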

naibaf7 commented 7 years ago

This may also be related: https://github.com/naibaf7/caffe/commit/2f4c6b6bc426107dd2458fe1c16daa2329c012d7

naibaf7 commented 7 years ago

@bhack See this is why those features are not yet in the standalone LibDNN, you will only get them after they're thoroughly tested against bugs in the Caffe environment :) Should be ready soon!

bhack commented 7 years ago

@naibaf7 Having out-of-sync sources for objects with the same name (libdnn) doesn't help much, IMHO. But I can't find any quick solution that would let Caffe use the standalone version as upstream, because the current separation of responsibilities requires code duplication for device setup and program handling.

naibaf7 commented 7 years ago

@janchk OK, there seems to be one more issue that the unit tests haven't captured so far. I'll have to take another look.

naibaf7 commented 7 years ago

@janchk OK, it seems the style transfer needs to reshape the network dimensions. This wasn't accounted for in the LibDNN layers yet, so I had to add functionality that recompiles the LibDNN kernels if the shapes really change during a call to Reshape: https://github.com/naibaf7/caffe/commit/c3a9f277fe526bc3bdb43c921946086049e72a35 This should most certainly fix it, but I haven't tested it myself yet. It would be good if you could report back the performance numbers (how long it took) and whether the output looks correct now.
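For illustration only, here is a minimal sketch of the "recompile on reshape" idea under my own assumptions; this is not the actual LibDNN code, and Kernel and compile_kernel_for are hypothetical placeholders:

// Sketch: cache the compiled kernel keyed by the blob shape and only
// recompile when Reshape sees a genuinely different shape.
#include <array>
#include <iostream>

struct Kernel { std::array<int, 4> shape; /* compiled program handle, etc. */ };

Kernel compile_kernel_for(const std::array<int, 4>& shape) {
  std::cout << "compiling kernel for " << shape[2] << "x" << shape[3] << "\n";
  return Kernel{shape};
}

class PoolingLayer {
 public:
  void Reshape(const std::array<int, 4>& bottom_shape) {
    // Only pay the compilation cost when the shape actually changes,
    // e.g. when style transfer reshapes VGG-19 to the input image size.
    if (!compiled_ || bottom_shape != kernel_.shape) {
      kernel_ = compile_kernel_for(bottom_shape);
      compiled_ = true;
    }
  }
 private:
  bool compiled_ = false;
  Kernel kernel_{};
};

int main() {
  PoolingLayer layer;
  layer.Reshape({1, 512, 28, 28});  // original VGG-19 size: compiles
  layer.Reshape({1, 512, 28, 28});  // same shape: no recompilation
  layer.Reshape({1, 512, 64, 64});  // reshaped by style transfer: recompiles
  return 0;
}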

janchk commented 7 years ago

@naibaf7 [screenshot from 2016-12-03 16-58-12] It is taking two times longer than the previous version (it hasn't finished yet, so I can't check whether the result is correct). This is my make all log, in case it matters: caffe_libdnn_mkall.txt

naibaf7 commented 7 years ago

@janchk It makes sense that it takes longer, because the kernels are now actually compiled for the reshaped sizes; before, they computed the original VGG-19 size, which was smaller than the image. I tested it myself now and got 22 minutes on a W9100/R9 290X (LibDNN) and 12 minutes on a GTX 1080 (cuDNN). I see your GPU is only clocked at 400 MHz. Is it possible to set the clock to the full 825 MHz, or is this fluctuating? Otherwise it seems right, given that this GPU has about 0.6 TFLOPs (W9100: about 5 TFLOPs, GTX 1080: about 10 TFLOPs).
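As a rough sanity check on those numbers (my back-of-the-envelope assumption, not something stated in the thread), runtime can be crudely estimated as scaling inversely with peak throughput:

// Crude scaling estimate: ignores memory bandwidth, clocks and
// implementation differences; the 0.6 TFLOP figure assumes full clock.
#include <iostream>

int main() {
  const double w9100_tflops = 5.0, w9100_minutes = 22.0;
  const double laptop_tflops = 0.6;  // 8750M-class GPU at full clock
  double estimate = w9100_minutes * (w9100_tflops / laptop_tflops);
  std::cout << "rough estimate on a 0.6 TFLOP GPU: "
            << estimate << " minutes\n";  // ~183 minutes, i.e. about 3 hours
  return 0;
}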

janchk commented 7 years ago

@naibaf7 No fluctuation; it is stuck at 400 MHz under load. I have no experience with overclocking mobile GPUs (especially on Linux), and it seems unstable with laptop cooling. By the way, it's an 8750M. Anyway, after your commit it works as expected, and it is still better than the original one. Thank you very much! Good luck with the upcoming development!

naibaf7 commented 7 years ago

@janchk OK, cool, so I'll close this case. Yes, the 8750M has 0.6 TFLOPs of performance; it's pretty decent for a notebook card, but obviously not the fastest if you are interested in deep learning. However, it's probably still 3-5 times faster than the laptop CPU...