coco training isn't using the gpu despite saying it is

kittles commented 6 years ago

hello! thanks for the libraries, i'd be stuffed if i tried doing any of this without them! i am trying to train coco to recognize a single new class, so i made a dataset with polygon annotations and set up a script like the balloon.py example. after a bunch of twiddling, i can get it to actually start training, but it's really slow and i think it's only using the cpu. i know this question comes up a lot, so i want to assure people that it's not because i'm using tensorflow instead of tensorflow-gpu. when i run the balloon.py example, i can see the gpu working hard. when i run my own script, i see tensorflow grab all the gpu memory and log that it's using the gpu, but the gpu never ends up doing any work during training. i've included some of the relevant details below:

the training file:

import os
import sys
import json
import datetime
import numpy as np
import skimage.draw
import xml.etree.ElementTree as etree
import glob
import cv2
from PIL import Image, ImageDraw

# Root directory of the project
ROOT_DIR = r'blah'
EXPERIMENT_DIR = r'blah'

# Import Mask RCNN
sys.path.append(ROOT_DIR)
from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib

# Path to trained weights file
COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Directory to save logs and model checkpoints, if not provided
# through the command line argument --logs
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")

############################################################
#  Configurations
############################################################

class ExperimentConfig (Config):
    # Give the configuration a recognizable name
    NAME = "pins"

    # i'm using 512x512 rgb images
    # i'm also using a gtx 1080 ti, which has 11gb of memory, but it still doesn't seem to have enough for 2 images
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # Background + pins

    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100

    # Skip detections with < 90% confidence
    DETECTION_MIN_CONFIDENCE = 0.9

############################################################
#  Dataset
############################################################

class ExperimentDataset(utils.Dataset):

    def load_dataset (self, subset_dir):
        self.add_class("pins", 1, "pins")
        for img_path in glob.glob(EXPERIMENT_DIR + r'\raw\{0}\*.JPG'.format(subset_dir)):
            img_name = os.path.split(img_path)[-1].replace('.JPG', '')
            img_resized_path = img_path.replace('raw', 'resized').replace(r'\{0}'.format(subset_dir), '')
            img = cv2.imread(img_path)
            img_resized = cv2.imread(img_resized_path)
            mask = np.zeros(img.shape, np.uint8)
            annotations_fp = EXPERIMENT_DIR + r'\polygon_annotations\%s.json' % (img_name)
            with open(annotations_fp, 'r') as txt:
                annotations = json.loads(txt.read())
            height, width, channels = img.shape
            x_norm = 512 / width
            y_norm = 512 / height

            def resize_poly (poly):
                return np.array([
                    np.array([round(point[0] * x_norm), round(point[1] * y_norm)], dtype=np.int32)
                    for point in poly
                ], dtype=np.int32)

            '''
            i guess polygons are supposed to look like this?
            {
                'name': 'polygon', 
                'all_points_x': [1173, 1174, 1172, 1164, 1158, 1150, 1141, 1135, 1131, 1119, 1105, 1094, 1084, 1075, 1062, 1045, 1035, 1026, 1014, 1005, 1000, 995, 1000, 1004, 1017, 1026, 1044, 1061, 1076, 1100, 1116, 1133, 1151, 1161, 1169, 1173], 
                'all_points_y': [149, 163, 184, 204, 219, 228, 240, 248, 256, 266, 281, 287, 288, 284, 272, 253, 243, 230, 213, 193, 180, 155, 134, 120, 98, 86, 72, 65, 61, 63, 69, 78, 95, 111, 129, 149]
            }
            '''
            polygons = []
            for shape in annotations['shapes']:
                p_points = resize_poly(shape['points'])
                p_obj = {
                    'name': 'polygon',
                    'all_points_x': [p[0] for p in p_points],
                    'all_points_y': [p[1] for p in p_points],
                }
                polygons.append(p_obj)

            self.add_image(
                "pins",
                image_id=img_name,
                path=img_resized_path,
                width=width,
                height=height,
                polygons=polygons)

    def load_mask(self, image_id):
        """Generate instance masks for an image.
       Returns:
        masks: A bool array of shape [height, width, instance count] with
            one mask per instance.
        class_ids: a 1D array of class IDs of the instance masks.
        """
        # If not a pins dataset image, delegate to parent class.
        image_info = self.image_info[image_id]
        if image_info["source"] != "pins":
            return super().load_mask(image_id)

        # Convert polygons to a bitmap mask of shape
        # [height, width, instance_count]
        info = self.image_info[image_id]
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        for i, p in enumerate(info["polygons"]):
            # Get indexes of pixels inside the polygon and set them to 1
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
            mask[rr, cc, i] = 1

        # Return mask, and array of class IDs of each instance. Since we have
        # one class ID only, we return an array of 1s
        return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)

def train (model):
    """Train the model."""
    # Training dataset.
    dataset_train = ExperimentDataset()
    dataset_train.load_dataset('train')
    dataset_train.prepare()

    # Validation dataset
    dataset_val = ExperimentDataset()
    dataset_val.load_dataset('validate')
    dataset_val.prepare()

    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=1,
                layers='heads')

if __name__ == '__main__':
    config = ExperimentConfig()
    model = modellib.MaskRCNN(mode="training", 
            config=config, 
            model_dir=r'blah')
    weights_path = COCO_WEIGHTS_PATH
    model.load_weights(weights_path, by_name=True, exclude=[
        "mrcnn_class_logits", "mrcnn_bbox_fc",
        "mrcnn_bbox", "mrcnn_mask"
    ])
    train(model)

and some of the output that might be relevant? this is all the output before it starts actually training.

Using TensorFlow backend.
~~~FINISHED BUILDING~~~
2018-06-23 13:37:50.800601: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-06-23 13:37:50.805220: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-23 13:37:53.072980: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-23 13:37:53.077371: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-06-23 13:37:53.079371: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-06-23 13:37:53.082295: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8805 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
CLASSES: [{'source': '', 'id': 0, 'name': 'BG'}, {'source': 'pins', 'id': 1, 'name': 'pins'}]
CLASSES: [{'source': '', 'id': 0, 'name': 'BG'}, {'source': 'pins', 'id': 1, 'name': 'pins'}]

Starting at epoch 0. LR=0.001

Checkpoint Path: C:\Users\patrick\object-segmentation\Mask_RCNN\logs\pins20180623T1337\mask_rcnn_pins_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
C:\Users\patrick\AppData\Local\conda\conda\envs\MaskRCNN\lib\site-packages\tensorflow\python\ops\gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

finally, once it's training, it takes about a minute per step, and here is what task manager shows for hardware usage:

[screenshot: task manager hardware usage]

thank you for making it this far! i hope i've included all the relevant info- any advice or even shots in the dark are appreciated!
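One thing that can be read off the script above: `load_dataset` records `width` and `height` from the raw image, while `path` points at the resized copy and the polygons are scaled to 512x512. If the raw images are much larger than 512x512, `load_mask` would allocate raw-sized masks on the CPU for every sample. Nothing in this thread confirms that this is the bottleneck, so the following is only a minimal sketch for measuring it. It assumes the dataset exposes `image_ids`, `load_image` and `load_mask` the way a prepared `mrcnn.utils.Dataset` does; `profile_dataset` is a hypothetical helper name, not part of the library.

```python
import time


def profile_dataset(dataset, max_images=5):
    """Time CPU-side loading and compare image and mask shapes.

    `dataset` is assumed to expose `image_ids`, `load_image(image_id)` and
    `load_mask(image_id)`, as a prepared mrcnn `utils.Dataset` does.
    """
    report = []
    for image_id in list(dataset.image_ids)[:max_images]:
        start = time.perf_counter()
        image = dataset.load_image(image_id)
        mask, _class_ids = dataset.load_mask(image_id)
        elapsed = time.perf_counter() - start
        image_hw = tuple(image.shape[:2])
        mask_hw = tuple(mask.shape[:2])
        report.append({
            "image_id": image_id,
            "seconds": elapsed,          # wall-clock time to load image + mask
            "image_hw": image_hw,        # (height, width) of the loaded image
            "mask_hw": mask_hw,          # (height, width) of the generated mask
            "shapes_match": image_hw == mask_hw,
        })
    return report
```

Calling `profile_dataset(dataset_train)` after `dataset_train.prepare()` and printing the result shows whether the masks match the images in size and how long each sample takes to load before it ever reaches the GPU.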

zungam commented 6 years ago

So you don't see a line like this during training?

1/1000 [=>.................................................] - ETA: 12:45 - loss: 1.3515 - rpn_class_loss: 0.1585 - rpn_bbox_loss: 0.4085 - mrcnn_class_loss: 0.1654 - mrcnn_bbox_loss: 0.2975 - mrcnn_mask_loss: 0.3216

If not, it means training never starts.

kittles commented 6 years ago

no, i do see lines like that. it does train, just super slow


zungam commented 6 years ago

Download TechPowerUp GPU-Z, go to the update time in its settings, and set it to 0.1. Watch how the GPU is utilized over time, then take a screenshot after 3 iterations and post it here.
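For anyone who prefers a script to a GUI tool, the same measurement can be approximated by polling `nvidia-smi`. This is a rough sketch, assuming `nvidia-smi` is on the PATH (on Windows it may need to be added) and that only the first GPU is of interest; `parse_sample` and `sample_gpu` are hypothetical helper names.

```python
import subprocess
import time

# Ask nvidia-smi for GPU utilization (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Parse one CSV line such as '37, 8805' into (util_percent, memory_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def sample_gpu(samples=30, interval=0.5):
    """Poll nvidia-smi and return (util_percent, memory_mib) readings for GPU 0."""
    readings = []
    for _ in range(samples):
        out = subprocess.check_output(QUERY, universal_newlines=True)
        readings.append(parse_sample(out.splitlines()[0]))
        time.sleep(interval)
    return readings
```

Running `sample_gpu()` in a second terminal while training is in progress gives a utilization trace. High memory use combined with utilization near zero would match the symptom described in the original post.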

myBestLove commented 6 years ago

I ran into the same problem. My config sets gpu to 8, but during training the GPU wasn't used while the CPU was heavily loaded. How can I solve this?

s-bayer commented 6 years ago

Could you post the results of running pip list?

I would assume you have tensorflow installed instead of tensorflow-gpu, which is an error in the requirements.txt of this repo.
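The same check can be done from Python instead of reading through the `pip list` output by eye. A minimal sketch, assuming Python 3.8 or newer for `importlib.metadata`; both function names are hypothetical. Note that the original poster's log already shows a GPU device being created, so this check is mainly useful for the other reports in this thread.

```python
from importlib import metadata


def tensorflow_distributions(installed):
    """Return the installed distributions whose name starts with 'tensorflow'.

    `installed` maps distribution name -> version string.
    """
    return {
        name: version
        for name, version in sorted(installed.items())
        if name.lower().startswith("tensorflow")
    }


def installed_distributions():
    """Collect name -> version for every distribution in this environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[name] = dist.version
    return found
```

`tensorflow_distributions(installed_distributions())` lists every TensorFlow-related package in the active environment, which makes it easy to see whether `tensorflow`, `tensorflow-gpu`, or both are installed.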

junweima commented 6 years ago

you need to install tensorflow-gpu instead of tensorflow

DeepNeuralBot commented 5 years ago

I know it is a bit late, but if someone is struggling with this problem, you can check it this way:

from keras import backend as K
K.tensorflow_backend._get_available_gpus()

or

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

if no gpu is detected, then you have to pip3 install tensorflow-gpu

then run the lines above again and you should see:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13043076236058885011
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15906157833526132886
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 9907085518476589959
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 15702143796
locality {
  bus_id: 1
  links {
  }
}
incarnation: 15516049900956998810
physical_device_desc: "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:03:00.0, compute capability: 6.0"
]
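When reading a device list like this, the entry to look for is the one with `device_type: "GPU"`. As far as I know, an `XLA_GPU` entry on its own does not mean regular ops will be placed on the GPU. A small sketch that makes the check explicit: it assumes each device object has `name` and `device_type` attributes, as the entries returned by `device_lib.list_local_devices()` do, and the helper names are hypothetical.

```python
from collections import defaultdict


def group_by_device_type(devices):
    """Group device names by `device_type` ('CPU', 'GPU', 'XLA_GPU', ...)."""
    grouped = defaultdict(list)
    for dev in devices:
        grouped[dev.device_type].append(dev.name)
    return dict(grouped)


def has_regular_gpu(devices):
    """True only if a plain 'GPU' device is listed (not just 'XLA_GPU')."""
    return "GPU" in group_by_device_type(devices)


def local_devices():
    """Query TensorFlow for its local devices (requires TensorFlow installed)."""
    # Imported lazily so the helpers above work without TensorFlow.
    from tensorflow.python.client import device_lib
    return device_lib.list_local_devices()
```

With TensorFlow installed, `has_regular_gpu(local_devices())` gives a yes/no answer. A `True` result only shows that the GPU is visible to TensorFlow, not that training keeps it busy, as the later replies in this thread point out.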

atlurip commented 5 years ago

Hi, below are the details:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 3810402909033317137
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 9121682555
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7365749146294383826
physical_device_desc: "device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"
]

It is still running slowly.

QuintinKruger commented 5 years ago

Are there still no solutions? I'm having the same problem.

dvlshah commented 5 years ago

I think there is an issue if you run training on an RTX 2080 Ti with CUDA 9. I am facing the same problem.

Tensorflow = 1.9.0
Cuda = 9.0
Cudnn = 7.0.5
python = 3.5.2
gpu = rtx 2080 ti

I think TensorFlow does not make use of the GPU when running CUDA 9 on RTX GPUs. Can someone please help?

QuintinKruger commented 5 years ago

I ended up changing to Yolo as my application did not require me to use Mask R-CNN.


matterport / Mask_RCNN

coco training isn't using the gpu despite saying it is #712