tensorflow / models

Models and examples built with TensorFlow

How to turn off the other features? (TensorFlow Object Detection API) #6387

Open: sed0724963 opened this issue 5 years ago

sed0724963 commented 5 years ago

I'm using the new API, which trains with model_main.py, but training is very slow. Looking at TensorBoard, I found that the API reports many features, such as "DetectionBoxes_Precision", "DetectionBoxes_Recall", "learning_rate", and so on. I suspect these features are why the API is so slow. So my question is: if I want to speed up training, I assume I should remove the other features and keep only loss, loss_1 and loss_2, deleting the evaluation-related ones. How can I do that, and what should I modify?


System information
What is the top-level directory of the model you are using: models/research
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): pip
TensorFlow version (use command below): 1.9.0
Bazel version (if compiling from source): don't know
CUDA/cuDNN version: CUDA 9, cuDNN 7
GPU model and memory: ASUS ROG Strix GeForce® GTX 1080
Exact command to reproduce: don't

rootkitchao commented 5 years ago

These are not additional features; they are evaluation metrics computed during training. I don't remember whether this can be set in the configuration file. If not, try setting --sample_1_of_n_eval_examples=0, which may help. Alternatively, use legacy/train.py:
python legacy/train.py --logtostderr --num_clones=2 --ps_tasks=1 --train_dir=D:\tf_project\mscoco\model\train --pipeline_config_path=D:\tf_project\mscoco\model\ssd_mnasnet_b1_fpn_shared_box_predictor_640x640_coco17.config
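As a rough sketch, both options can be launched from Python. The model_main.py flag names (`--pipeline_config_path`, `--model_dir`, `--num_train_steps`, `--sample_1_of_n_eval_examples`, `--alsologtostderr`) follow the Object Detection API's local-training instructions, and the legacy flags come from the command above with the multi-GPU flags (`--num_clones`, `--ps_tasks`) left out. The helper names, the paths, and the assumption that it runs from models/research are all hypothetical. A larger `--sample_1_of_n_eval_examples` value should make each evaluation pass use fewer examples, but model_main.py still evaluates periodically.

```python
import subprocess
import sys

# Hypothetical helpers; assumes the working directory is models/research.
# All paths passed in are placeholders.


def model_main_argv(pipeline_config, model_dir, num_train_steps, sample_1_of_n=10):
    """argv for model_main.py: training with periodic evaluation.

    --sample_1_of_n_eval_examples=N evaluates one out of every N eval
    examples, so a larger N means cheaper (but noisier) evaluation passes.
    """
    return [
        sys.executable, "object_detection/model_main.py",
        f"--pipeline_config_path={pipeline_config}",
        f"--model_dir={model_dir}",
        f"--num_train_steps={num_train_steps}",
        f"--sample_1_of_n_eval_examples={sample_1_of_n}",
        "--alsologtostderr",
    ]


def legacy_train_argv(pipeline_config, train_dir):
    """argv for legacy/train.py: training only, no evaluation loop."""
    return [
        sys.executable, "object_detection/legacy/train.py",
        "--logtostderr",
        f"--train_dir={train_dir}",
        f"--pipeline_config_path={pipeline_config}",
    ]


def launch(argv):
    # Runs the command; raises CalledProcessError if it exits non-zero.
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    print(" ".join(model_main_argv("pipeline.config", "model_dir", 50000)))
```

Building an argv list instead of a single shell string avoids quoting problems with Windows paths such as D:\tf_project\... .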

sed0724963 commented 5 years ago

@rootkitchao Setting --sample_1_of_n_eval_examples=0 doesn't work: although evaluation is turned off, training is still very slow, and only the training loss is plotted. I used to run legacy/train.py, but it doesn't plot the validation loss, so I switched to model_main.py because I want to plot both the training and validation loss. Do you have any ideas about this? Or can train.py plot both the training and validation loss?

rootkitchao commented 5 years ago

If you want the validation loss, evaluation has to run, and that slows down training. I don't know which model you are using, its input resolution, or your actual speed, but training on a single GTX 1080 will not be very fast in any case.
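A back-of-the-envelope model of that slowdown, under stated assumptions: training and evaluation take turns on the single GPU (train, save a checkpoint, evaluate, resume), and one evaluation pass costs time roughly proportional to the number of eval examples it uses. The function name and all numbers are hypothetical, not measurements from this issue.

```python
def eval_overhead(train_secs_between_evals, full_eval_secs, sample_1_of_n=1):
    """Estimate the fraction of wall-clock time spent on evaluation.

    Simple model (an assumption, not measured behaviour): training and
    evaluation alternate on one GPU, and an evaluation pass that uses one
    out of every `sample_1_of_n` eval examples takes about
    full_eval_secs / sample_1_of_n seconds. Values below 1 are treated
    as 1 (no sampling) in this sketch.
    """
    eval_secs = full_eval_secs / max(sample_1_of_n, 1)
    return eval_secs / (train_secs_between_evals + eval_secs)


if __name__ == "__main__":
    # Hypothetical: 10 minutes of training between evals, 5-minute full eval.
    print(f"{eval_overhead(600, 300):.1%}")                    # 33.3%
    print(f"{eval_overhead(600, 300, sample_1_of_n=10):.1%}")  # 4.8%
```

Under this model the overhead only shrinks by evaluating less often or on fewer examples; getting a validation curve always costs some training time.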

sed0724963 commented 5 years ago

@rootkitchao I'm using the faster_rcnn_resnet101_coco model on a single GTX 1080. So is there no solution to my problem?

sed0724963 commented 5 years ago

@rootkitchao I have an idea: train the model with train.py first, and when training is complete, switch to model_main.py to see the validation results. But I don't know whether this is correct, because the validation loss for the same trained model differs: using model_main.py only gives a loss of 0.12, while using train.py followed by model_main.py gives 0.08.
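A sketch of that workflow, assuming a version of model_main.py that defines `--checkpoint_dir` and `--run_once` (recent copies do; check the flags in yours): train with legacy/train.py into one directory, then point model_main.py at that directory in eval-only mode so it writes evaluation metrics and summaries to a separate `--model_dir`. The helper name and paths are hypothetical. Dropping `--run_once` should make it keep polling for new checkpoints instead, which lets it follow a running train.py job, though on a single GTX 1080 the two processes may not both fit in GPU memory.

```python
import sys

# Hypothetical helper; assumes the working directory is models/research.


def eval_only_argv(pipeline_config, checkpoint_dir, eval_dir, run_once=True):
    """argv for model_main.py in eval-only mode.

    With --checkpoint_dir set, model_main.py evaluates checkpoints found
    there (e.g. written by legacy/train.py) instead of training, and
    writes metrics/summaries under --model_dir.
    """
    argv = [
        sys.executable, "object_detection/model_main.py",
        f"--pipeline_config_path={pipeline_config}",
        f"--checkpoint_dir={checkpoint_dir}",
        f"--model_dir={eval_dir}",
    ]
    if run_once:
        # Evaluate the latest checkpoint once instead of waiting for new ones.
        argv.append("--run_once")
    return argv


if __name__ == "__main__":
    print(" ".join(eval_only_argv("pipeline.config", "train", "eval")))
```

Pointing TensorBoard at the parent directory of train and eval should then show the training loss from train.py alongside the evaluation results.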

rootkitchao commented 5 years ago

For your problem, I think a more efficient model, such as RetinaNet, could be the solution.

sed0724963 commented 5 years ago

@rootkitchao Can these models be used with the TensorFlow Object Detection API?

rootkitchao commented 5 years ago

The Object Detection API already implements RetinaNet.

sed0724963 commented 5 years ago

@rootkitchao Is ssd_resnet_v1_fpn_feature_extractor.py a kind of RetinaNet? Isn't it a kind of ResNet?

rootkitchao commented 5 years ago

ResNet is the backbone; you can switch to other networks, such as PNASNet or MobileNet. The backbone of RetinaNet in the paper is ResNet-101-FPN. In addition, RetinaNet can be seen as a modified version of SSD that uses a backbone with FPN, a modified box predictor, and focal loss.
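To make the focal-loss part concrete, here is a minimal per-prediction sketch of the formula from the RetinaNet paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with the paper's defaults alpha = 0.25 and gamma = 2. It is plain Python for illustration, not the Object Detection API's TensorFlow implementation; in the API's pipeline configs the corresponding setting appears to be the `weighted_sigmoid_focal` classification loss, but check your own config.

```python
import math


def _softplus(u):
    # log(1 + exp(u)), computed without overflow for large |u|.
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def sigmoid_focal_loss(logit, label, alpha=0.25, gamma=2.0):
    """Focal loss for one binary (per-anchor, per-class) prediction.

    p = sigmoid(logit); p_t = p for positives (label 1), 1 - p otherwise.
    """
    z = logit if label == 1 else -logit       # logit of p_t
    log_p_t = -_softplus(-z)                  # log(sigmoid(z)) = log(p_t)
    one_minus_p_t = math.exp(-_softplus(z))   # sigmoid(-z) = 1 - p_t
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * one_minus_p_t ** gamma * log_p_t


def sigmoid_cross_entropy(logit, label):
    """Plain sigmoid cross-entropy, -log(p_t), for comparison."""
    z = logit if label == 1 else -logit
    return _softplus(-z)


if __name__ == "__main__":
    # An easy negative (confident background) vs. a hard positive.
    for logit, label in [(-4.0, 0), (-4.0, 1)]:
        ce = sigmoid_cross_entropy(logit, label)
        fl = sigmoid_focal_loss(logit, label)
        print(f"logit={logit}, label={label}: CE={ce:.5f}, FL={fl:.5f}")
```

Compared with plain cross-entropy, the (1 - p_t)^gamma factor shrinks the loss on easy, confidently classified anchors (mostly background), so the many easy negatives in a dense one-stage detector like SSD or RetinaNet do not dominate training.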

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be an old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you no longer need help on this issue, please consider closing it.