tensorflow / models

Models and examples built with TensorFlow

Too slow when quantization-aware training SSD MobileNet v2 #6909

Open roadcode opened 5 years ago

roadcode commented 5 years ago

I use ssdlite_mobilenet_v2_coco.config and modify it by adding a graph_rewriter block:

graph_rewriter {
  quantization {
    delay: 0
    weight_bits: 8
    activation_bits: 8
  }
}

With this block, each step is about 10x slower than with the config without graph_rewriter.
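For context, a trimmed sketch of where this block sits: graph_rewriter is a top-level field in the Object Detection API pipeline config, alongside model and train_config, and the shipped quantized configs use a non-zero delay (e.g. 48000 steps) so the float model converges before fake quantization starts. The elided settings below are placeholders:

```
model {
  ssd {
    # ... unchanged ssdlite_mobilenet_v2 settings ...
  }
}
train_config {
  # ... unchanged ...
}
graph_rewriter {
  quantization {
    delay: 48000  # start fake quantization after 48k float-precision steps
    weight_bits: 8
    activation_bits: 8
  }
}
```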

The log without graph_rewriter:

INFO:tensorflow:global step 199709: loss = 1.4051 (0.745 sec/step)
INFO:tensorflow:global step 199710: loss = 1.5033 (0.564 sec/step)
INFO:tensorflow:global step 199711: loss = 1.7374 (1.093 sec/step)
INFO:tensorflow:global step 199712: loss = 1.6265 (0.812 sec/step)

The log with graph_rewriter:

INFO:tensorflow:global step 4554: loss = 9.3010 (4.084 sec/step)
INFO:tensorflow:global step 4555: loss = 8.2835 (4.055 sec/step)
INFO:tensorflow:global step 4556: loss = 8.0293 (4.060 sec/step)

The tensorflow-gpu version is 1.12. Is this speed normal? Any ideas?

tensorflowbutler commented 5 years ago

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

roadcode commented 5 years ago

During multi-GPU quantization-aware training, I checked the saved ckpt model and found that when using create_training_graph, the node becomes

clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased

whereas when using create_eval_graph, the node is actually

FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased

So I have to rewrite the node names to save the frozen quantized pb. Is there something wrong?
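A minimal sketch of the kind of renaming workaround described above, assuming a TF 1.x checkpoint whose quant accumulator variables carry a duplicated clone_1/... scope; the paths, and the rule of keeping only the part after the last clone_N/ marker, are assumptions for illustration, not an official fix:

```python
import re
import tensorflow as tf  # TF 1.x API

ckpt_in = "/tmp/model.ckpt-10000"     # hypothetical training checkpoint
ckpt_out = "/tmp/model_renamed.ckpt"  # hypothetical output path

reader = tf.train.NewCheckpointReader(ckpt_in)

with tf.Graph().as_default(), tf.Session() as sess:
    renamed = []
    for name in reader.get_variable_to_shape_map():
        value = reader.get_tensor(name)
        # Keep only what follows the last "clone_N/" marker, so
        # "clone_1/Scope/clone_1/Scope/max/biased" -> "Scope/max/biased",
        # matching the names create_eval_graph expects. Variables without
        # a clone prefix are left unchanged.
        new_name = re.sub(r"^.*clone_\d+/", "", name)
        renamed.append(tf.Variable(value, name=new_name))
    sess.run(tf.global_variables_initializer())
    tf.train.Saver(renamed).save(sess, ckpt_out)
```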

roadcode commented 5 years ago

#4783

hvico commented 5 years ago

I'm seeing the same behavior: approx. 0.125 sec/step with ssdlite_mobilenet_v2_coco in float, and 0.8 sec/step training the quantized version of the same model.

Info:

What is the top-level directory of the model you are using: ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14

Have I written custom code: No

OS Platform and Distribution: Ubuntu 18.04 64-bit Intel

TensorFlow installed from: Sources

TensorFlow version: 1.12

Bazel version: 0.17.2

CUDA/cuDNN version: 10.0

GPU model and memory: GTX2080TI with 11 GB

Exact command to reproduce: python3 /content/models/research/object_detection/legacy/train.py --logtostderr --train_dir={model_dir} --pipeline_config_path={pipeline_fname}

roadcode commented 5 years ago

@hvico have you fixed this problem?

hvico commented 5 years ago

> @hvico have you fixed this problem?

Nope, I haven't. It trains much slower than the float model, but it works.

alanchiao commented 5 years ago

Quantization-aware training is much slower in general for vision models, since internally we have to do two Conv operations for every Conv in the float model. 10x slower seems a bit extreme though; looping in Suharsh to comment.
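A minimal sketch of the rewrite in question, assuming TF 1.x with tf.contrib available; the toy one-conv model is hypothetical, and the point is just to surface the extra FakeQuant ops the rewriter inserts around weights and activations, which is where the added per-step cost comes from:

```python
import tensorflow as tf  # TF 1.x (tf.contrib era)

g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, [1, 32, 32, 3])
    net = tf.layers.conv2d(x, 8, 3, activation=tf.nn.relu)
    # Rewrites the training graph in place: weights and activations are
    # wrapped in FakeQuantWithMinMaxVars ops (this is what the object
    # detection graph_rewriter config enables under the hood).
    tf.contrib.quantize.create_training_graph(input_graph=g, quant_delay=0)

fake_quant_ops = [op for op in g.get_operations()
                  if op.type.startswith("FakeQuant")]
print("FakeQuant ops inserted:", len(fake_quant_ops))
```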

git-hamza commented 4 years ago

I am also facing the same problem. Quantization-aware training for ssd_mobilenet_v1_coco is up to 10x slower than without quantization (i.e. without the graph_rewriter { quantization { delay: 0 weight_bits: 8 activation_bits: 8 } } block). Is it because I am doing something wrong, or is it not yet fixed by TensorFlow?

skulhare commented 4 years ago

I am facing the same problem, and I am more concerned about inference. For inference, I converted the model (SSD ResNet 50) to TFLite format. The TFLite-converted quantized model is 3x slower than the TFLite-converted non-quantized model. This is the exact opposite of what I was expecting from quantization. Thanks.
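For reference, a minimal sketch of the TF 1.x conversion path for a quantization-aware-trained detection model; the file paths, tensor names, and input stats below are assumptions for illustration. If the converter leaves ops in float (common for detection post-processing), the resulting "quantized" model can indeed run slower than the float one:

```python
import tensorflow as tf  # TF 1.13+ converter API

# Frozen graph exported for TFLite (e.g. via export_tflite_ssd_graph.py);
# the path and tensor names here are hypothetical.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=["TFLite_Detection_PostProcess"],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]})

# Request quantized (uint8) inference; without this the graph keeps float
# kernels and loses the quantization speedup.
converter.inference_type = tf.uint8
converter.quantized_input_stats = {
    "normalized_input_image_tensor": (128.0, 127.0)}  # (mean, std) assumption
converter.allow_custom_ops = True  # detection post-processing is a custom op

with open("/tmp/detect_quant.tflite", "wb") as f:
    f.write(converter.convert())
```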