Open roadcode opened 5 years ago
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

- What is the top-level directory of the model you are using
- Have I written custom code
- OS Platform and Distribution
- TensorFlow installed from
- TensorFlow version
- Bazel version
- CUDA/cuDNN version
- GPU model and memory
- Exact command to reproduce
During multi-GPU quantization-aware training, I checked the saved ckpt model and found that when using create_training_graph, the node becomes

clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased

while when using create_eval_graph, the node is actually

FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased

so I have to rewrite the node names in order to save the frozen quantized pb. Is something wrong here?
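One workaround is to rewrite each training-graph variable name into its eval-graph form before freezing: strip the `clone_N/` tower prefixes that the multi-GPU clones add, then collapse the scope path that ends up duplicated. The helper below is my own sketch of that renaming, not part of TensorFlow or an official fix:

```python
import re

def to_eval_name(train_name):
    """Map a multi-GPU training-graph quant node name to its
    eval-graph form: drop 'clone_N/' tower prefixes, then collapse
    the duplicated scope path left behind by create_training_graph."""
    # 1) strip every 'clone_N/' scope inserted by the multi-GPU towers
    name = re.sub(r'clone_\d+/', '', train_name)
    # 2) collapse a leading duplicated path: 'A/A/rest' -> 'A/rest'
    parts = name.split('/')
    for k in range(len(parts) // 2, 0, -1):
        if parts[:k] == parts[k:2 * k]:
            return '/'.join(parts[k:])
    return name

train = ('clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/'
         'clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/'
         'max/biased')
print(to_eval_name(train))
# prints FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased
```

Names that already match the eval graph pass through unchanged, so the same function can be applied to every variable in the checkpoint.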
I'm seeing the same behavior: approx. 0.125 secs/step with ssdlite_mobilenet_v2_coco in float, and 0.8 secs/step when training the quantized version of the same model.
Info:
What is the top-level directory of the model you are using ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14
Have I written custom code No
OS Platform and Distribution Ubuntu 18.04 64-bit Intel
TensorFlow installed from Sources
TensorFlow version 1.12
Bazel version 0.17.2
CUDA/cuDNN version 10.0
GPU model and memory RTX 2080 Ti with 11 GB
Exact command to reproduce python3 /content/models/research/object_detection/legacy/train.py --logtostderr --train_dir={model_dir} --pipeline_config_path={pipeline_fname}
@hvico have you fixed this problem?
Nope, I haven't. It trains much more slowly than the float model, but it works.
Quantization-aware training is much slower in general for vision models, since internally we have to do two Conv operations for every Conv in the float model. 10x slower seems a bit extreme, though; looping in Suharsh to comment.
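For intuition about where the extra work goes: QAT inserts fake-quantization nodes around each op, which round every tensor to one of 256 levels and map it back to float before the real Conv runs, so each Conv effectively pays for an extra pass over its inputs and weights. A minimal plain-Python sketch of that quantize-dequantize round trip (illustrative only, not the actual TensorFlow implementation):

```python
def fake_quant(x, x_min, x_max, num_bits=8):
    """Simulate fake quantization: clamp x into [x_min, x_max], round it
    to the nearest of 2**num_bits levels, then map it back to float.
    The output is still a float, but it can only take quantized values,
    so the network learns weights that survive 8-bit deployment."""
    levels = 2 ** num_bits - 1          # 255 steps for 8 bits
    scale = (x_max - x_min) / levels    # size of one quantization step
    x = min(max(x, x_min), x_max)       # clamp into the representable range
    q = round((x - x_min) / scale)      # integer level in [0, 255]
    return x_min + q * scale            # dequantize back to float

# Values snap to the nearest representable level in [0, 1]
print(fake_quant(0.5004, 0.0, 1.0))
```

In the real graph this runs on whole tensors for both weights and activations, on top of the float Conv itself, which is why each training step costs noticeably more than in the float model.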
I am also facing the same problem. Quantization-aware training for ssd_mobilenet_v1_coco is up to 10x slower than training without quantization (i.e. without adding the line "graph_rewriter { quantization { delay: 0 weight_bits: 8 activation_bits: 8 } }"). Am I doing something wrong, or is this not yet fixed in TensorFlow?
I am facing the same problem, and I am more concerned about inference. For inference, I converted the model (SSD ResNet 50) to TFLite format. The TFLite-converted quantized model is 3x slower than the TFLite-converted non-quantized model. This is the exact opposite of what I expected from quantization. Thanks.
I use ssdlite_mobilenet_v2_coco.config and modify it by adding a graph_rewriter block:
graph_rewriter { quantization { delay: 0 weight_bits: 8 activation_bits: 8 } }
but the time per step is 10x slower than with the config without graph_rewriter.
The log without graph_rewriter:
The log with graph_rewriter:
The tensorflow-gpu version is 1.12. Is this speed normal? Any ideas?
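One thing worth checking in that block is the delay field: it does not reduce the per-step cost once quantization is active, but the usual recommendation is to fine-tune from a float checkpoint and start fake quantization only after some number of float steps, so with delay > 0 the early steps run at float speed. A sketch of such a config (the step count here is just an illustrative value, not a recommendation for this model):

```
graph_rewriter {
  quantization {
    delay: 48000        # example: keep the first 48k steps in float
    weight_bits: 8
    activation_bits: 8
  }
}
```

This also gives a quick A/B comparison within a single run: step times before and after the delay threshold show exactly how much overhead the fake-quant nodes add.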