tensorflow / models

Models and examples built with TensorFlow

Why is running model_main.py so slow? (using faster-rcnn resnet101.coco) #6227

Open sed0724963 opened 5 years ago

sed0724963 commented 5 years ago

I'm using the Object Detection API and model_main.py to train a detection model. I use faster-rcnn resnet101.coco with batch size = 1, but training is very slow.

What can I do to make it faster?

System information

What is the top-level directory of the model you are using: model/research
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): win10
TensorFlow installed from (source or binary): pip
TensorFlow version (use command below): 1.9.0
Bazel version (if compiling from source): don't know
CUDA/cuDNN version: cuda:9 , cuDNN:7
GPU model and memory: ASUS ROG Strix GeForce® GTX 1080
Exact command to reproduce: don't

Cavan09 commented 5 years ago

By default model_main doesn't have any logging.

After all the imports you can add: tf.logging.set_verbosity(tf.logging.INFO)

This will log everything that is happening.

You should also know that it logs every 100 steps, not every step as train.py does. That is configurable, but you will need to dig into the code for that.
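To check whether training itself is slow, and not just quiet, you can work out the time per step from those periodic log lines. Here is a minimal Python sketch, assuming the lines contain a fragment like "step = N (T sec)". The seconds_per_step helper and the sample lines are made up for illustration and are not output from this run:

```python
import re

# Assumed shape of the periodic log line: "... step = 100 (125.0 sec)".
# The very first line usually has no "(... sec)" part, so it is optional here.
STEP_RE = re.compile(r"step = (\d+)(?: \(([0-9.]+) sec\))?")


def seconds_per_step(log_lines):
    """Return (step, seconds per step) for each interval between log lines."""
    rates = []
    prev_step = None
    for line in log_lines:
        match = STEP_RE.search(line)
        if match is None:
            continue  # not a loss/step line
        step = int(match.group(1))
        elapsed = match.group(2)
        if elapsed is not None and prev_step is not None and step > prev_step:
            rates.append((step, float(elapsed) / (step - prev_step)))
        prev_step = step
    return rates


# Hypothetical sample lines, for illustration only.
sample = [
    "INFO:tensorflow:loss = 2.5, step = 0",
    "INFO:tensorflow:global_step/sec: 0.8",
    "INFO:tensorflow:loss = 1.9, step = 100 (125.0 sec)",
    "INFO:tensorflow:loss = 1.4, step = 200 (150.0 sec)",
]

for step, rate in seconds_per_step(sample):
    print(f"step {step}: {rate:.2f} s/step")
```

If the time per step is stable and reasonable for your GPU, the run is probably fine and only looks slow because output appears once per 100 steps.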

The other reason it may seem slow is that by default you will be doing evaluations while training (well, kind of). After x number of steps, training pauses to run an evaluation of the current checkpoint. Once that is complete, training resumes. This is also configurable if you only want to train.
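As a rough illustration of how much those evaluation pauses can cost, here is a back-of-the-envelope estimate in Python. The function name and all the numbers are hypothetical; the real step time, evaluation time, and interval depend on your model, dataset, and settings:

```python
def effective_steps_per_hour(step_secs, eval_secs, eval_interval_secs):
    """Estimate training steps per hour when training pauses for evaluation.

    step_secs: seconds one training step takes
    eval_secs: seconds one evaluation pass takes (training is paused meanwhile)
    eval_interval_secs: seconds of training between two evaluation passes
    """
    cycle_secs = eval_interval_secs + eval_secs       # one train + eval cycle
    steps_per_cycle = eval_interval_secs / step_secs  # steps done while training
    return 3600.0 * steps_per_cycle / cycle_secs


# Made-up numbers: 1 s per step, a 5 minute evaluation every 10 minutes of training.
print(effective_steps_per_hour(1.0, 300.0, 600.0))  # 2400.0
# Same step time with no evaluation pause at all.
print(effective_steps_per_hour(1.0, 0.0, 600.0))    # 3600.0
```

With these made-up numbers, a third of the wall-clock time goes to evaluation, so evaluating less often, or on fewer examples, can matter as much as speeding up the training step.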

sed0724963 commented 5 years ago

@Cavan09 If I only want the training and validation losses (loss_1, loss_2) and don't need mAP or anything else, can I remove the rest?

Cavan09 commented 5 years ago

@sed0724963 Do you mean deleting tf.logging.set_verbosity(tf.logging.INFO)?

All that will do is remove the logging to the console; all the different parameters will still be captured.

sed0724963 commented 5 years ago

@Cavan09 I mean that TensorBoard shows many scalars: loss_1, loss_2, learning_rate, DetectionBoxes_Precision/mAP, and so on. I only want to plot the training and validation loss (loss_1, loss_2). Can I turn off the others, such as learning_rate and DetectionBoxes_Precision/mAP? I assume that running only part of the evaluation (training and validation loss) would be faster than running all of it, and I don't need mAP or the other metrics.

Cavan09 commented 5 years ago

@sed0724963 I'm not sure you will get a huge benefit from removing the others, but if that is something you would like to try, you will have to dig into the COCO tools and evaluation metrics.

CoderGenJ commented 4 years ago

> By default model_main doesn't have any logging.
>
> After all the imports you can add: tf.logging.set_verbosity(tf.logging.INFO)
>
> This will log everything that is happening.
>
> You should also know that it logs every 100 steps, not every step as train.py does. That is configurable, but you will need to dig into the code for that.
>
> The other reason it may seem slow is that by default you will be doing evaluations while training (well, kind of). After x number of steps, training pauses to run an evaluation of the current checkpoint. Once that is complete, training resumes. This is also configurable if you only want to train.

Do you know how to change the config files or command lines in order to only train?

leedo-hub commented 4 years ago

I trained using model_main.py, but loss_1 looks the same as loss_2 in TensorBoard. They are supposed to be different, since one of them is for training and the other is for validation.