solivr / tf-crnn

TensorFlow convolutional recurrent neural network (CRNN) for text recognition
GNU General Public License v3.0

Speed difference between CPU and GPU #17

Closed WenmuZhou closed 6 years ago

WenmuZhou commented 6 years ago

When I predict an image with the exported model, I find that the CPU is faster than the GPU. The code is:

import tensorflow as tf
from src.loader import PredictionModel
from scipy.misc import imread
import time
import os

def predict(model_dir, image, gpu_id=0):
    # select the GPU before the session is created
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    config_sess = tf.ConfigProto()
    # cap this process at 20% of the GPU memory
    config_sess.gpu_options.per_process_gpu_memory_fraction = 0.2

    with tf.Session(config=config_sess) as sess:
        start = time.time()  # note: this timing includes model loading
        model = PredictionModel(model_dir, session=sess)
        predictions = model.predict(image)
        transcription = predictions['words']
        score = predictions['score']
        return [transcription[0].decode(), score, time.time() - start]

if __name__ == '__main__':
    model_dir = 'ckpt2model/export/1511402179/'
    image = imread('/data/zhoujun/data/print_char/val1/3_song.jpg', mode='L')[:, :, None]
    result = predict(model_dir, image, 0)
    print(result)

The GPU run costs 3.9 s and the CPU run costs 0.9 s.

WenmuZhou commented 6 years ago

I have tested tensorflow-gpu 1.4, tensorflow-gpu 1.3, tensorflow 1.4 and tensorflow 1.3. The code is:

# -*- coding: utf-8 -*-
# @Time    : 2017/10/26 14:09
# @Author  : zhoujun

import tensorflow as tf
from scipy.misc import imread
import time
import os

class PredictionModel:

    def __init__(self, model_dir, session=None):
        # use the given session, or fall back to the default one
        if session:
            self.session = session
        else:
            self.session = tf.get_default_session()
        # load the SavedModel and time the loading
        start = time.time()
        self.model = tf.saved_model.loader.load(self.session, ['serve'], model_dir)
        print('load_model_time:', time.time() - start)

        # resolve the tensors of the 'predictions' signature
        self._input_dict, self._output_dict = _signature_def_to_tensors(self.model.signature_def['predictions'])

    def predict(self, image):
        output = self._output_dict
        # run the predict op and time it
        start = time.time()
        result = self.session.run(output, feed_dict={self._input_dict['images']: image})
        print('predict_time:', time.time() - start)
        return result

def _signature_def_to_tensors(signature_def):  # from SeguinBe
    g = tf.get_default_graph()
    return {k: g.get_tensor_by_name(v.name) for k, v in signature_def.inputs.items()}, \
           {k: g.get_tensor_by_name(v.name) for k, v in signature_def.outputs.items()}

def predict(model_dir, image, gpu_id=0):
    # select the GPU before the session is created
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)

    with tf.Session() as sess:
        start = time.time()  # this timing includes model loading
        model = PredictionModel(model_dir, session=sess)
        predictions = model.predict(image)
        transcription = predictions['words']
        score = predictions['score']
        return [transcription[0].decode(), score, time.time() - start]

if __name__ == '__main__':
    model_dir = 'model/'
    image = imread('3_song.jpg', mode='L')[:, :, None]
    result = predict(model_dir, image, 0)
    print(tf.__version__)
    print(result)

tensorflow-gpu 1.4 log

2017-12-02 22:30:30.147490: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2017-12-02 22:30:31.083045: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.62GiB
2017-12-02 22:30:31.083203: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
load_model_time: 2.9134938716888428
predict_time: 4.135322570800781
1.4.0

tensorflow-gpu 1.3 log

2017-12-02 22:39:46.346092: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-02 22:39:46.346191: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-02 22:39:47.187559: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.493
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.62GiB
2017-12-02 22:39:47.187717: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-12-02 22:39:47.188099: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0:   Y
2017-12-02 22:39:47.188224: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
load_model_time: 33.31174421310425
predict_time: 1.6684315204620361
1.3.0

tensorflow 1.4 log

2017-12-02 22:47:21.368827: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
load_model_time: 2.340376853942871
predict_time: 3.538094997406006
1.4.0

tensorflow 1.3 log

2017-12-02 22:51:33.877932: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-12-02 22:51:33.878031: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
load_model_time: 2.2079925537109375
predict_time: 0.5485155582427979
1.3.0
cipri-tom commented 6 years ago

Wow, very different results among the different versions!

Do you have a big image? I imagine the time to transfer it to the GPU could be significant, though this is highly improbable.

WenmuZhou commented 6 years ago

I will run it 100-1000 times and take an average.
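
A minimal timing sketch for that (assuming the PredictionModel wrapper above; the first call is excluded as a warm-up, since the first session.run on a GPU typically includes one-time setup such as memory allocation and kernel selection):

import time
import tensorflow as tf

N_RUNS = 100  # raise to 1000 for a tighter estimate

with tf.Session() as sess:
    model = PredictionModel(model_dir, session=sess)
    model.predict(image)  # warm-up run, not timed

    start = time.time()
    for _ in range(N_RUNS):
        model.predict(image)
    print('mean predict time:', (time.time() - start) / N_RUNS)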

solivr commented 6 years ago

I get a prediction in 0.9 s with GPU and 0.16 s with CPU, but I didn't try batch prediction. Do you observe the same behaviour when predicting with image batches (instead of single images)? I guess that's where GPU processing will make the difference.

WenmuZhou commented 6 years ago

see this https://github.com/tensorflow/tensorflow/issues/15057#issuecomment-349701716

cipri-tom commented 6 years ago

@solivr in order to do batch prediction, we'd have to change the serving_input_fn (code) to have shape N x H x W x C, right? So we can't batch predict for models which were exported with the current serving_input_fn.
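
A rough sketch of what a batched serving_input_fn could look like (hypothetical: the placeholder name 'images' and the tf.estimator export API are assumptions based on TF 1.x conventions, not the repo's actual code; images within a batch would still need a common H x W, e.g. via padding):

import tensorflow as tf

def batched_serving_input_fn():
    # leading None gives the N x H x W x C shape, i.e. a variable batch size
    images = tf.placeholder(tf.float32, shape=[None, None, None, 1], name='images')
    return tf.estimator.export.ServingInputReceiver(
        features={'images': images},
        receiver_tensors={'images': images})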

solivr commented 6 years ago

Yes indeed!

WenmuZhou commented 6 years ago

When I predict an image using the GPU, I found that the GPU utilization is always 0.

cipri-tom commented 6 years ago

You'd need to batch images for better GPU utilisation.
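
For example (a sketch, assuming a model re-exported with a batched input as above, and images padded or resized to one common shape):

import numpy as np

# stack K same-sized H x W x C images into one K x H x W x C batch,
# so the GPU handles them in a single session.run
batch = np.stack(images_list, axis=0)
predictions = model.predict(batch)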