mathmanu / caffe-jacinto

This repository has moved. The new link can be obtained from https://github.com/TexasInstruments/jacinto-ai-devkit

Quantization failed when testing SSD #24

Open gasburner opened 5 years ago

gasburner commented 5 years ago

First, I tried the quantization function on MNIST, and it works well. Then I tried it on the SSD provided in caffe-jacinto-models and on the original SSD. Both failed with the error: Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch. I also noticed that I could get the right result only if I set iterations=1 (no matter what batch size I set or how many GPUs I used). And in the log provided in caffe-jacinto-models, I can't find the test_quantize part, which makes me quite confused. So please tell me how to use the quantization function with SSD correctly. Thank you very much.

Here is my log.

I0910 14:36:28.498530 13312 common.cpp:475] GPU 0 'TITAN Xp' has compute capability 6.1
I0910 14:36:29.036025 13312 caffe.cpp:902] This is NVCaffe 0.17.0 started at Mon Sep 10 14:36:28 2018
I0910 14:36:29.036056 13312 caffe.cpp:904] CuDNN version: 7104
I0910 14:36:29.036072 13312 caffe.cpp:905] CuBLAS version: 9000
I0910 14:36:29.036077 13312 caffe.cpp:906] CUDA version: 9000
I0910 14:36:29.036082 13312 caffe.cpp:907] CUDA driver version: 9010
I0910 14:36:29.036089 13312 caffe.cpp:908] Arguments:

…………………………………………

I0910 14:36:48.912307 13312 net.cpp:2195] Enabling quantization at output of: Concat mbox_loc
I0910 14:36:48.912477 13312 net.cpp:2195] Enabling quantization at output of: Concat mbox_conf
I0910 14:36:48.912649 13312 net.cpp:2195] Enabling quantization at output of: Concat mbox_priorbox
I0910 14:36:48.917215 13350 common.cpp:192] New stream 0x7fa3ac006960, device 0, thread 13350
F0910 14:36:48.941680 13312 permute_layer.cu:70] Check failed: error == cudaSuccess (7 vs. 0) too many resources requested for launch
Check failure stack trace:
    @ 0x7fa4660295cd google::LogMessage::Fail()
    @ 0x7fa46602b433 google::LogMessage::SendToLog()
    @ 0x7fa46602915b google::LogMessage::Flush()
    @ 0x7fa46602be1e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fa467a7ce48 caffe::PermuteLayer<>::Forward_gpu()
    @ 0x7fa4673e8a7f caffe::Layer<>::Forward()
    @ 0x7fa4672561fe caffe::Net::ForwardFromTo()
    @ 0x7fa46725633d caffe::Net::Forward()
    @ 0x44cc4b test_detection()
    @ 0x4521f2 main
    @ 0x7fa4647ab830 __libc_start_main
    @ 0x449699 _start
    @ (nil) (unknown)

Thanks a lot!

mathmanu commented 5 years ago

Hi, I'll get back to you in a day.

mathmanu commented 5 years ago

The test_quantize folder was not pushed because quantization for SSD was still being tested.

Note that there is a bugfix for quantization in the caffe-jacinto repository - this re-enables weight quantization for certain models (it got disabled accidentally). Please pull the latest changes.

However, I do not know the reason for the crash that you observed. I'll run the script and see if I get a crash. Can you tell me exactly which script you ran?

gasburner commented 5 years ago

Thanks! I will pull the latest changes and try again. I ran train_image_object_detection.sh in caffe-jacinto-models and got the crash in the test_quantize phase. For the original SSD, here are the prototxt, run.sh, and run.log for quantization (VOC07 test). The weight file was trained by myself; I think you can use the weights from http://www.cs.unc.edu/%7Ewliu/projects/SSD/models_VGGNet_VOC0712_SSD_300x300.tar.gz to see if you get a crash. test_quantize.zip

mathmanu commented 5 years ago

Can you run train_image_object_detection.sh without any change (using the network it uses by default) and let me know if you get a crash? You can reduce the max_iter values in the script to perform a quick training.

mathmanu commented 5 years ago

I tried a quick training and it worked for me with the default network. Can you pinpoint exactly which change causes the crash? You seem to have used a different network from the one the script uses by default.

gasburner commented 5 years ago

Sorry for my late reply. In fact I think I used the default train_image_object_detection.sh. What I said above means that I tried two settings, including the default one, and both got the same crash. To be careful, I re-cloned caffe-jacinto and caffe-jacinto-models to make sure I had made no changes, and it works. Then I tried test_quantize on the original SSD with the re-cloned caffe-jacinto, and it works too. I checked the Makefile to make sure there is no difference. So maybe the latest changes solved my problem? All in all, thank you very much!

mathmanu commented 5 years ago

I forgot to mention one thing. By default, when you set the quantize flag, the priorbox concat layer is quantized just like any other concat layer. This causes a significant drop in accuracy. The prior box parameters should not be quantized. You can avoid that by adding the layer name to the ignored layer names:
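As a rough illustration of why this matters (a minimal sketch, not the actual caffe-jacinto code): prior-box tensors mix values of very different magnitudes, so simulating a single per-tensor 8-bit scale shows large relative error on the small coordinates while the large ones survive. The values below are made up for illustration.

```python
import numpy as np

def quantize_dequantize(x, bits=8):
    """Simulate symmetric linear quantization to `bits` and back to float,
    using one scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

# Hypothetical prior-box-like values: small normalized coordinates
# alongside larger ones, sharing one quantization scale.
priors = np.array([0.01, 0.02, 0.05, 0.1, 0.5, 0.9])
q = quantize_dequantize(priors, bits=8)
rel_err = np.abs(q - priors) / priors
print(rel_err)  # large relative error on the smallest entries
```

The small entries land on only one or two quantization steps, which is consistent with the accuracy drop described above when the priorbox concat is quantized.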

quantize: true
net_quantization_param {
  ignored_layer_names: ["mbox_priorbox"]
}

You can also specify / try more parameters, but these really do not need to be specified:

quantize: true
net_quantization_param {
  quantization_start: 1
  power2_scale_weights: false
  power2_scale_activations: false
  bitwidth_activations: 8
  bitwidth_weights: 8 #12
  bitwidth_bias: 16
  apply_offset_activations: false
  apply_offset_weights: true
  range_update_factor: 0.10
  range_expansion_factor: 1.1
  ignored_layer_names: ["mbox_priorbox"]
}
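A sketch of how two of these parameters could plausibly behave (my interpretation, not taken from the caffe-jacinto source): range_update_factor as the weight of an exponential moving average over observed activation ranges, and power2_scale_* as rounding the quantization scale up to a power of two so fixed-point hardware can use bit-shifts.

```python
import math

def update_range(running_max, batch_max, factor=0.10):
    """EMA of the observed activation range; `factor` plays the role
    that range_update_factor is assumed to play."""
    return (1 - factor) * running_max + factor * batch_max

def power2_scale(max_val, bits=8):
    """Round the quantization scale up to a power of two, as the
    power2_scale_* flags are assumed to request."""
    qmax = 2 ** (bits - 1) - 1
    return 2.0 ** math.ceil(math.log2(max_val / qmax))

r = 0.0
for batch_max in [4.0, 6.0, 5.0]:   # hypothetical per-batch maxima
    r = update_range(r, batch_max)
print(r, power2_scale(r))
```

With factor 0.10 the tracked range reacts slowly to outlier batches, which is why a small value is a reasonable default.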