tensorflow / models

Models and examples built with TensorFlow

Loaded trained model allocates most of GPU memory #7224

Open Borda opened 5 years ago

Borda commented 5 years ago

I want to run multiple object detection instances on a single GPU, but each of them always allocates a large share of the remaining GPU memory. I was playing with small pre-trained models like ssd_mobilenet_v1_coco from the model zoo. I was following this tutorial.

I am wondering why the same script loading the same model takes 70% of 4 GB (GTX 1050) in one run and 66% of 11 GB (RTX 2080) in the other...


System information

Describe the problem

I have downloaded the pre-trained object detection model ssd_mobilenet_v1_coco and run prediction on a sample image. The loaded model takes 70% of 4 GB (GTX 1050) in one case and 66% of 11 GB (RTX 2080) in the other.
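For context, the standard TF1 knobs for limiting per-process GPU memory are gpu_options.allow_growth (allocate on demand) and gpu_options.per_process_gpu_memory_fraction (hard cap). A minimal sketch, where the 0.2 fraction is only an example value chosen to fit several instances on one card:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
# Allocate GPU memory on demand instead of reserving most of the card up front.
config.gpu_options.allow_growth = True
# Hard-cap this process at ~20% of the card so several instances can coexist.
config.gpu_options.per_process_gpu_memory_fraction = 0.2

with tf.Session(config=config) as sess:
    pass  # load the frozen graph and run inference as in the script below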

Source code / logs

import os
import sys
import tarfile
import six.moves.urllib as urllib

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

sys.path.append("/home/jb/Workspace/tfmodels/research")
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

MODEL_NAME = 'ssd_mobilenet_v1_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)

for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())

detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(graph_def, name='')

image_np = plt.imread('people.png')
# plt.imread returns floats in [0, 1] for PNG files; convert to uint8 as the model expects
if image_np.max() < 1.5:
    image_np = np.clip(np.round(image_np * 255), 0, 255).astype(np.uint8)

config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=False,
)
config.gpu_options.force_gpu_compatible = False
# Don't pre-allocate memory; allocate as-needed
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.7
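# (with only allow_growth the allocator still grows on demand up to nearly the
#  whole card; uncommenting the line above would hard-cap this process at ~70%)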

# with detection_graph.as_default():
with tf.Session(config=config, graph=detection_graph) as sess:

    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

    # Each score represents the level of confidence for each of the objects.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run([boxes, scores, classes, num_detections],
                                                        feed_dict={image_tensor: image_np_expanded})

I got the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8    N/A /  N/A |   2835MiB /  4042MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     32433      C   /usr/bin/python3.6                          2551MiB |
+-----------------------------------------------------------------------------+

and

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   1  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
| 35%   53C    P8    24W / 260W |   6973MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1     22749      C   /usr/bin/python3                            6963MiB |
+-----------------------------------------------------------------------------+
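For anyone hitting this on TF 2.x, the equivalent controls appear to be tf.config.experimental.set_memory_growth and set_virtual_device_configuration; a minimal sketch (the 1024 MB limit is an arbitrary example value):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Grow allocations on demand instead of reserving most of the card up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)
    # Alternatively, pin a hard per-process limit via a logical device:
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])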
tensorflowbutler commented 5 years ago

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

- What is the top-level directory of the model you are using
- Have I written custom code
- TensorFlow installed from
- Exact command to reproduce

Borda commented 5 years ago

- What is the top-level directory of the model you are using: there is no such directory; the model is downloaded on the fly into the same folder the code runs in
- Have I written custom code: kind of; I took inspiration from the referenced post and cut it down to a minimum
- TensorFlow installed from: pip
- Exact command to reproduce: run the attached code in the PyCharm debugger

@tensorflowbutler ^^

Borda commented 5 years ago

@tensorflowbutler @gowthamkpr any update here?