mathmanu / caffe-jacinto

This repository has moved. The new link can be obtained from https://github.com/TexasInstruments/jacinto-ai-devkit

Mobile SSD #22

Open markvex opened 6 years ago

markvex commented 6 years ago

Hi, I am trying to train MobileNet-SSD from https://github.com/chuanqi305/MobileNet-SSD. When training with quantization only (without sparsity), I don't see convergence: the loss reaches 5.6 and stagnates. The same thing happens whether I train from scratch or fine-tune. I enabled global quantization and also enabled it specifically on the first layer, conv0. After running only one iteration of training, I see that the weights of conv0 in the pre-trained network have become 0. A few questions:

  1. Did you try MobileNet-SSD?
  2. Do you have any guidelines on how to train this network?
  3. Why do the weights become 0 after the first iteration?

thank you

mathmanu commented 6 years ago

We have an example script. Please read the documentation (very minimal) and try it out:

https://github.com/tidsp/caffe-jacinto-models https://github.com/tidsp/caffe-jacinto-models/blob/caffe-0.17/docs/MobileNet_ObjectDetect_README.md

Best regards,

markvex commented 6 years ago

A few comments:

  1. Quantization is disabled by default.
  2. The prototxt that you are using is different from https://github.com/chuanqi305/MobileNet-SSD.

In any case (both with your prototxt and with the one from https://github.com/chuanqi305/MobileNet-SSD), when enabling quantization for the first layer the loss stagnates at ~5.5. BTW, one iteration takes about 19 seconds on a 1080 GPU.

mathmanu commented 6 years ago

I don't use quantization during training, but only during test/inference. I call this on-the-fly quantization (vs trained quantization). The accuracy with on-the-fly quantization is seen to be good in most cases. There is some drop for MobileNet (not very large), and we have some ideas to fix it.

Please see this thread for more details (a bit outdated): https://github.com/tidsp/caffe-jacinto/issues/1

markvex commented 6 years ago

OK, so just to make sure I understood correctly: the code you uploaded is not meant for training with quantization, but rather for using quantization during inference?

mathmanu commented 6 years ago

The code for training with quantization may be broken, as I have not used it for a long time. Please try inference-time quantization and see if it fits your purpose.

If you use bitwidth_weights = 10 and bitwidth_activations = 8, then inference-time quantization works perfectly for all networks, including MobileNets.

But if you use bitwidth_weights = 8 and bitwidth_activations = 8, then there may be a slight accuracy drop for MobileNet; depending on your application, this may be fine.
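
For intuition, here is a small NumPy sketch of simple min-max uniform quantization at 8 and 10 bits. It only illustrates why the extra two weight bits reduce the quantization error; it is not necessarily the exact scheme implemented in this repository.

    # Illustration only: min-max uniform quantization of a weight tensor at
    # different bitwidths. The real scheme in caffe-jacinto may differ.
    import numpy as np

    def quantize_minmax(x, num_bits):
        # Map [x.min(), x.max()] onto the integer grid [0, 2^bits - 1] and back.
        qmax = 2 ** num_bits - 1
        lo, hi = float(x.min()), float(x.max())
        scale = (hi - lo) / qmax if hi > lo else 1.0
        return np.round((x - lo) / scale) * scale + lo

    rng = np.random.RandomState(0)
    w = rng.randn(1024).astype(np.float32) * 0.1   # stand-in for conv weights
    for bits in (8, 10):
        err = np.abs(w - quantize_minmax(w, bits)).max()
        print("bits=%d  max abs quantization error=%.6f" % (bits, err))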

Best regards, Manu.

markvex commented 6 years ago

Ok, thanks a lot for your quick response. I was trying to fine-tune MobileNet-SSD with quantization and with only 2 classes, and the loss stagnated at about ~5.5. I tried different learning rates and optimization settings but got no results. Can I somehow compare the weights before quantization with the weights after?

markvex commented 6 years ago

Another question: if I set quantize to true in the training prototxt, the flag for each layer is false by default, so theoretically I have to set the flag for each layer too, no?

mathmanu commented 6 years ago

For dumping out the weights, you may have to write some code. The quantize flag is propagated to each layer, so there is no need to set it for each layer.
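
For reference, a minimal pycaffe sketch along those lines; the prototxt and caffemodel file names are placeholders, and conv0 is assumed to be the layer name from the discussion above:

    # Sketch: dump and compare a layer's weights from two snapshots with pycaffe.
    # File names are placeholders; 'conv0' is the layer discussed in this thread.
    import numpy as np
    import caffe

    net_a = caffe.Net('deploy.prototxt', 'before_quant.caffemodel', caffe.TEST)
    net_b = caffe.Net('deploy.prototxt', 'after_quant.caffemodel', caffe.TEST)

    w_a = net_a.params['conv0'][0].data   # [0] = weights, [1] = bias (if present)
    w_b = net_b.params['conv0'][0].data

    print('min/max before:', w_a.min(), w_a.max())
    print('min/max after :', w_b.min(), w_b.max())
    print('max abs diff  :', np.abs(w_a - w_b).max())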

I think most people who do training with quantization do the backpropagation update on full-precision weights, and quantize only for the purpose of the forward pass. If there is batch normalization in the network, there is an additional set of complications, because the weights used for inference are a combination of the weights in the convolution layer and the batch norm parameters. I have not taken care of these aspects for training-time quantization in this repository. So, in summary, doing training-time quantization right needs quite some work.
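
As a toy, Caffe-independent illustration of that idea (keep a full-precision master copy of the weights, quantize it only for the forward pass, and apply the gradient update to the master copy):

    # Toy illustration of "full-precision update, quantized forward" on a
    # linear regression problem; not tied to Caffe or this repository.
    import numpy as np

    def quantize(w, num_bits=8):
        qmax = 2 ** num_bits - 1
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / qmax if hi > lo else 1.0
        return np.round((w - lo) / scale) * scale + lo

    rng = np.random.RandomState(0)
    w_full = rng.randn(4).astype(np.float32)               # full-precision master weights
    x = rng.randn(64, 4).astype(np.float32)
    y = x @ np.array([0.5, -1.0, 2.0, 0.0], np.float32)    # toy regression target

    lr = 0.1
    for step in range(200):
        w_q = quantize(w_full)              # forward pass uses quantized weights
        pred = x @ w_q
        grad = x.T @ (pred - y) / len(x)    # gradient passed straight through to the master copy
        w_full -= lr * grad                 # update the full-precision weights
    print('learned (float):', w_full)
    print('learned (quantized):', quantize(w_full))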

So, using this repository, your best bet is to do inference-time quantization. Fold the batch norm layers into the convolution layers before doing inference with quantization, using the optimize option provided for caffe. Alternately, set the option --optimize_net 1 during test.
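
For clarity, this is roughly what folding a batch norm layer into the preceding convolution means numerically (a sketch only; the repository's optimize option is meant to do this for you):

    # Sketch of batch-norm folding: scale the conv weights per output channel
    # and adjust the bias so that conv + BN collapses into a single conv.
    import numpy as np

    def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
        # w: (out_ch, in_ch, kh, kw) conv weights, b: (out_ch,) conv bias
        # gamma, beta, mean, var: (out_ch,) batch norm parameters/statistics
        s = gamma / np.sqrt(var + eps)          # per-output-channel scale
        w_folded = w * s[:, None, None, None]
        b_folded = (b - mean) * s + beta
        return w_folded, b_folded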

On how to do training with quantization, you can refer to the following paper: https://arxiv.org/abs/1712.05877

On designing a quantization-friendly MobileNet, you can refer to this: https://arxiv.org/abs/1803.08607

markvex commented 6 years ago

Thanks for the articles. One last question: when I ran fine-tuning with my network and set the general quantize flag in the train.prototxt, it looks like training continues as usual and the loss is still good. When I also add the quantize flag to the first conv layer (conv0), the loss increases immediately, then goes down and stagnates at a high value. Since you said that setting the general flag is enough, and I get a good loss value, how can I be sure that quantization really happened? For example, I see a value of min -1.38, max 2.03, but the scale is above 1000 and so is the offset, while I would expect the scale (step) to be something very small.

mathmanu commented 6 years ago

That flag is only propagated from the iteration quantization_start onwards. The default value of this is 2000, so you will see the effect of quantization from iteration 2000 onwards. You can change quantization_start to 0 if you want it to take effect from the beginning. See the function StartQuantization() in net.cpp for more details.

I am surprised you mentioned that your loss is around 5.5. Whenever I tried training-time quantization, I always applied it while fine-tuning a fully trained network, so the loss should be much lower at that stage.

markvex commented 6 years ago

Actually, from caffe.proto:

    // frame/iter at which quantization is introduced
    optional int32 quantization_start = 1 [default = 1];

My loss was around 1.24 before quantization; then I started training with quantization. If only the global flag was set, I got a loss of 1.24 and even less; when I also set the flag for the first layer, I got a loss of 10 that eventually decreased to 5.5 and stagnated. Since the default value of the per-layer quantize flag is false, and since the loss changed only when I added quantization for conv0, I was not sure that quantization really happens when only the general flag is set.