tensorflow / models

Models and examples built with TensorFlow

What's the use of fine_tune_checkpoint? #2446

Closed · crazyqg closed this 6 years ago

crazyqg commented 6 years ago

Dear Expert:

Just wondering, what is the use of the fine_tune_checkpoint parameter? I thought that if I set this parameter, training would start from the previously released official checkpoint. But if I want to train from scratch, then for the first run I need to comment this line out, right?

The faster_rcnn_resnet101_pets.config says that training should be limited to 200k steps, but how can I know when I should stop? Here is my config:

# Faster R-CNN with Resnet-101 (v1) configured for the Oxford-IIIT Pet Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader
# and eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields
# that should be configured.

model {
  faster_rcnn {
    num_classes: 37
    image_resizer {
      keep_aspect_ratio_resizer { min_dimension: 600 max_dimension: 1024 }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer { l2_regularizer { weight: 0.0 } }
      initializer { truncated_normal_initializer { stddev: 0.01 } }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer { l2_regularizer { weight: 0.0 } }
          initializer {
            variance_scaling_initializer { factor: 1.0 uniform: true mode: FAN_AVG }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule { step: 0 learning_rate: .0003 }
          schedule { step: 900000 learning_rate: .00003 }
          schedule { step: 1200000 learning_rate: .000003 }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "./object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt"
  # fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 20000
  # num_steps: 200000
  data_augmentation_options { random_horizontal_flip { } }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/root/tensorflow/models/object_detection/pet_train.record"
  }
  label_map_path: "/root/tensorflow/models/object_detection/data/pet_label_map.pbtxt"
}

eval_config: {
  num_examples: 2000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/root/tensorflow/models/object_detection/pet_val.record"
  }
  label_map_path: "/root/tensorflow/models/object_detection/data/pet_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
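For reference, with the object_detection scripts of that era a config like this is consumed roughly as follows. Paths are placeholders, and the flag names of export_inference_graph.py changed between releases, so check them against your checkout (e.g. with --help):

    # Training, writing checkpoints to --train_dir (flags as in the pets tutorial):
    python object_detection/train.py --logtostderr \
        --pipeline_config_path=path/to/faster_rcnn_resnet101_pets.config \
        --train_dir=path/to/train_dir

    # Exporting a checkpoint to a frozen graph (flag names as in later
    # versions of the script; older checkouts used slightly different ones):
    python object_detection/export_inference_graph.py \
        --input_type image_tensor \
        --pipeline_config_path path/to/faster_rcnn_resnet101_pets.config \
        --trained_checkpoint_prefix path/to/train_dir/model.ckpt-1000 \
        --output_directory path/to/exported_model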

When the training step reached 1000, I converted the output to frozen_inference_graph.pb and got the following results when detecting a dog picture:


scores: [[ 0.06944662 0.04194811 0.04175233 0.03936137 0.03506881 0.03396844 0.03341869 0.02836643 0.02728182 0.02523476 0.02402067 0.02282678 0.0228122 0.02225946 0.02186905 0.02164866 0.02146952 0.02128034 0.02122877 0.02121425 0.02107508 0.02090965 0.02087757 0.02070165 0.01983244 0.01971057 0.01959467 0.01917719 0.01917494 0.01914807 0.01879219 0.01872755 0.01853695 0.01849682 0.01826908 0.01822293 0.01768292 0.01764779 0.0175221 0.01724721 0.0171386 0.01713757 0.01701683 0.01693976 0.01655906 0.01655879 0.0164468 0.0162534 0.01620389 0.01616097 0.01604703 0.01594366 0.01587138 0.01577087 0.01575576 0.01567072 0.0156308 0.01550222 0.01546394 0.0154065 0.01538335 0.01530054 0.01521439 0.01508005 0.01503054 0.01502952 0.01501724 0.01498683 0.01497203 0.0149293 0.01492907 0.01488974 0.0148831 0.01483789 0.01479487 0.01475857 0.01474129 0.01472051 0.01470696 0.01465248 0.01456203 0.0144611 0.01433412 0.01431429 0.01428371 0.01425853 0.01423363 0.01411561 0.01407829 0.01404255 0.01400355 0.0140005 0.01392917 0.01388019 0.01379 0.01376873 0.01374718 0.01364442 0.01355834 0.01347447 0.01341203 0.01340594 0.01337446 0.01331094 0.01317479 0.0131004 0.01303376 0.01296751 0.01289105 0.01279648 0.01277457 0.01268381 0.01266333 0.01261472 0.01258451 0.01257695 0.01257371 0.01253676 0.01253119 0.01236328 0.01236133 0.01233734 0.01232264 0.01231525 0.01231252 0.01228142 0.0122811 0.01227625 0.0121429 0.01206302 0.01203659 0.01202152 0.01197013 0.01194972 0.01194221 0.01186142 0.01170894 0.01170498 0.01168339 0.01166411 0.01161441 0.01156989 0.01155359 0.01155222 0.01137351 0.01136513 0.01134263 0.011284 0.0112753 0.01123225 0.01122407 0.01111983 0.01111377 0.01110441 0.01102397 0.01099771 0.0108909 0.01085272 0.01083522 0.01074724 0.0106981 0.01069558 0.01067937 0.01066618 0.01063939 0.01060463 0.01060337 0.01056996 0.01055606 0.01054414 0.01050773 0.0104952 0.01046663 0.01045137 0.01035698 0.01031965 0.01030521 0.01029356 0.01026726 0.0102616 0.0102529 0.01023215 0.01022986 0.01021929 0.0102128 0.01020036 0.01015684 0.01007472 0.01002086 0.00998082 0.00988512 0.00986281 0.00985666 0.00982132 0.00980562 0.00979157 0.00977272 0.00976781 0.00972535 0.00972107 0.00971623 0.0096608 0.00960222 0.00959306 0.00956489 0.00955793 0.0095285 0.0095277 0.0094817 0.0094764 0.00946174 0.00937475 0.00932345 0.00927965 0.00924221 0.00923011 0.00922853 0.00917996 0.00917259 0.00913913 0.00913864 0.00911407 0.00908112 0.00903553 0.00902831 0.00901958 0.00901761 0.00899136 0.00897678 0.00895084 0.00888303 0.00888036 0.00887968 0.00885241 0.00884694 0.00881344 0.00880111 0.00880083 0.00878923 0.00878035 0.00878014 0.00877578 0.00874164 0.00871974 0.00870612 0.00866308 0.00866087 0.00864434 0.00863423 0.00863183 0.00859103 0.00857641 0.00857165 0.00856331 0.0085456 0.00851526 0.00851121 0.00850364 0.00844978 0.00843677 0.00836123 0.00835519 0.0083485 0.00830275 0.00828412 0.00827416 0.00827094 0.00825203 0.0082481 0.00824689 0.00822878 0.00822503 0.00822477 0.00821862 0.00818373 0.00817982 0.00815457 0.00814281 0.00813758 0.0081308 0.00812449 0.00811765 0.00811404 0.00810262 0.00808576 0.00807293 0.00806132 0.00805942 0.00802655 0.00801983 0.00801869 0.00800397 0.00800376 0.0079942 0.00798685 0.00797676 0.0079742 0.00796114 0.00794861 0.00791004]] classes: [[ 15. 15. 15. 2. 15. 15. 15. 15. 2. 15. 15. 19. 19. 2.

...]] num: [ 300.]

And when the training step reached 5000, the result was the following:

    scores: [[ 0.01012865 0.0095656 0.00846167 0.0083519 0.0081428 0.00793877 0.00780326 0.00738764 0.00720319 0.00717328 0.00702109 0.00699762 0.00695469 0.00693708 0.00689603 0.00689341 0.00682347 0.0067919 0.00678337 0.00674321 0.00667536 0.00652981 0.00652026 0.00651316 0.00644661 0.00640598 0.0063948 0.00636773 0.006357 0.00633072 0.00628792 0.00625798 0.00622244 0.00620775 0.00615659 0.00612047 0.00608881 0.00608222 0.00602958 0.00602263 0.00596477 0.00595437 0.00591918 0.00588317 0.00580666 0.00578876 0.00576117 0.00574217 0.00569798 0.00569309 0.00569133 0.00565342 0.0056329 0.00561814 0.00561546 0.00558781 0.0055239 0.00549977 0.00546995 0.00544188 0.00541785 0.00541074 0.00539507 0.00533954 0.00533781 0.00531058 0.00529205 0.00526606 0.0052498 0.00521484 0.00520865 0.00516368 0.00516033 0.00514712 0.00513685 0.00511866 0.00509207 0.00506464 0.00506041 0.00504955 0.00500685 0.00498611 0.004977 0.00497512 0.00494363 0.0049204 0.00488979 0.00488552 0.00485934 0.00483808 0.00481848 0.00480503 0.00480215 0.00478172 0.00477571 0.00473178 0.00469418 0.00467898 0.0046694 0.0046677 0.00464432 0.00463534 0.00463118 0.00463054 0.00462186 0.00461836 0.00460992 0.00460065 0.00459613 0.00459243 0.00459107 0.00457311 0.00457122 0.00456381 0.00453879 0.00452551 0.00448939 0.00442786 0.00436896 0.00434445 0.00432951 0.00432626 0.00432418 0.00430982 0.00429316 0.0042874 0.00428538 0.00428501 0.00428195 0.00427173 0.00427141 0.00423748 0.00423131 0.00421834 0.00417268 0.00412867 0.00412763 0.0041243 0.00411295 0.00410893 0.00410112 0.00409826 0.00409668 0.00409102 0.0040742 0.00404702 0.00402316 0.00401519 0.00400502 0.00395062 0.0039232 0.00390756 0.00389872 0.00388462 0.00387455 0.00386549 0.00386404 0.0038567 0.00385513 0.00384485 0.00383255 0.00378379 0.00377896 0.00374847 0.00374749 0.00374542 0.00373142 0.00371718 0.00370589 0.00370405 0.00369212 0.00367554 0.00364675 0.00363466 0.00363413 0.00363043 0.00361713 0.00361426 0.00360038 0.00356765 0.0035636 0.00356286 0.00354832 0.00354194 0.00354109 0.00353584 0.00352891 0.00352778 0.00352737 0.00352147 0.00351132 0.00351123 0.00350714 0.00349074 0.00348723 0.00347436 0.00346567 0.00346437 0.00345724 0.00344998 0.00344184 0.00343301 0.00342523 0.00342367 0.00341991 0.00341977 0.00340503 0.00339995 0.00337575 0.0033682 0.00336718 0.00335707 0.00334482 0.00333542 0.00332725 0.00332146 0.00331156 0.00329242 0.00329188 0.00328823 0.00328744 0.00327978 0.00327266 0.00326319 0.00326263 0.00325477 0.00323754 0.00323749 0.00322709 0.00320604 0.00320126 0.00320078 0.00318907 0.00318784 0.00316502 0.00316022 0.00314454 0.00314285 0.00310829 0.00309608 0.0030903 0.003075 0.00307337 0.003067 0.00305495 0.00305396 0.00305361 0.00304828 0.00304686 0.00303647 0.00302384 0.00302355 0.00301787 0.00301675 0.00301501 0.00301163 0.00299895 0.00298768 0.00297569 0.00297219 0.0029673 0.00296174 0.00292937 0.00292707 0.00292392 0.00291627 0.00291123 0.00290746 0.00290669 0.00290444 0.00290126 0.0028981 0.00289595 0.00289129 0.00288885 0.00288371 0.00288093 0.00287843 0.00287723 0.00287044 0.00286095 0.00285842 0.00285619 0.00285486 0.0028468 0.00284314 0.00283165 0.00282653 0.00282344 0.00281912 0.00281329 0.00281321 0.00280992 0.00280158 0.00280149 0.00279888 0.00278973 0.00278777 0.0027773 0.00277507]] classes: [[ 35. 35. 36. 15. 4. 35. 35. 36. 7. 22. 21. 4. 32. 22.

    ...]] num: [ 300.]

How can I know which result is better, and when should I stop?

Cheers, Gang

cipri-tom commented 6 years ago

Just wondering, what is the use of the fine_tune_checkpoint parameter?

You can check this parameter and many others in the proto files, e.g. object_detection/protos/train.proto (and the other .proto files).
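For instance, a quick way to locate the definition and the doc comment above it (a plain grep from the top of the repo; the exact field numbers and defaults depend on your checkout):

    grep -n -B 2 fine_tune_checkpoint object_detection/protos/train.proto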

I thought that if I set this parameter, training would start from the previously released official checkpoint. But if I want to train from scratch, then for the first run I need to comment this line out, right?

Yes, if you really want to start from scratch, then you comment this line out. But note that usually you want to reuse the pretrained parameters, as it can take DAYS or WEEKS to learn the lower layers of the network, and they will probably end up learning the same features anyway.
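As a concrete sketch (the placeholder path is the one from the sample config above):

    # Fine-tune from a released detection checkpoint (the usual case):
    fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
    from_detection_checkpoint: true

    # To train from scratch instead, comment out (or delete) both lines,
    # and the network weights will be randomly initialized:
    # fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
    # from_detection_checkpoint: true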

To see when to stop, try running tensorboard and watch how your model improves. I'm still a beginner, so I can't tell you the exact commands yet.
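(For what it's worth, the basic invocation is just pointing TensorBoard at the training directory; a minimal sketch, assuming train_dir is where train.py writes its checkpoints and event files:

    tensorboard --logdir=path/to/train_dir

If you also run the eval job, the mAP curves it writes are the most direct signal: once they flatten out, further training is unlikely to help.)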


For more support, it is recommended that you use StackOverflow, as there is a bigger community there. This issue tracker is mostly for developers and for reporting important bugs that the devs can focus on.

tatatodd commented 6 years ago

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

rohith513 commented 5 years ago

Just wondering, what is the use of the fine_tune_checkpoint parameter?

You can check this parameter and many others in the proto files, e.g. object_detection/protos/train.proto (and the other .proto files).

I thought that if I set this parameter, training would start from the previously released official checkpoint. But if I want to train from scratch, then for the first run I need to comment this line out, right?

Yes, if you really want to start from scratch, then you comment this line out. But note that usually you want to reuse the pretrained parameters, as it can take DAYS or WEEKS to learn the lower layers of the network, and they will probably end up learning the same features anyway.

To see when to stop, try running tensorboard and watch how your model improves. I'm still a beginner, so I can't tell you the exact commands yet.

For more support, it is recommended that you use StackOverflow, as there is a bigger community there. This issue tracker is mostly for developers and for reporting important bugs that the devs can focus on.

Can someone confirm this?

Just by commenting out the 'fine_tune_checkpoint' and 'from_detection_checkpoint' lines, the model will train from scratch?