tensorflow / models

Models and examples built with TensorFlow

[object_detection] Empty "variables" folder after I tried to set "export_as_saved_model True" when calling "export_inference_graph.py" #2045

Closed: protossw512 closed this issue 7 years ago

protossw512 commented 7 years ago

System information

Describe the problem

I trained some models on my own dataset, and both training and evaluation went well without any problems. I also exported the model with "--export_as_saved_model" set to "False" and ran inference on new images from my own Python script, also without any problems. However, I was planning to serve the model with TensorFlow Serving, which means I need to set "--export_as_saved_model" to "True" to get the model exported as a servable. The conversion process seemed to go fine:

2017-07-26 17:13:26.931536: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 17:13:26.931593: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 17:13:26.931607: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 17:13:26.931618: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 17:13:26.931629: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 17:13:27.337890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-07-26 17:13:27.337993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-07-26 17:13:27.338017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-07-26 17:13:27.338230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:84:00.0)
Converted 277 variables to const ops.
2017-07-26 17:13:30.349176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:84:00.0)

I also got a .pb file of reasonable size. However, the "variables" folder was empty. According to the TensorFlow Serving example, it should contain files like "variables.data-00000-of-00001" and "variables.index". When I tried to serve the model, TensorFlow Serving printed the following:

2017-07-26 16:42:51.545132: I tensorflow_serving/model_servers/main.cc:149] Building single TensorFlow model file config:  model_name: detection model_base_path: /home/xinyao/workspace/models/object_detection/serving_model
2017-07-26 16:42:51.545337: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-07-26 16:42:51.545346: I tensorflow_serving/model_servers/server_core.cc:421]  (Re-)adding model: detection
2017-07-26 16:42:51.646110: I tensorflow_serving/core/basic_manager.cc:705] Successfully reserved resources to load servable {name: detection version: 1}
2017-07-26 16:42:51.646148: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: detection version: 1}
2017-07-26 16:42:51.646168: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: detection version: 1}
2017-07-26 16:42:51.646246: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/xinyao/workspace/models/object_detection/serving_model/1
2017-07-26 16:42:51.646272: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:236] Loading SavedModel from: /home/xinyao/workspace/models/object_detection/serving_model/1
2017-07-26 16:42:51.711957: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 16:42:51.711974: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 16:42:51.711992: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 16:42:51.711996: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 16:42:51.711999: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-26 16:42:51.815773: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:155] Restoring SavedModel bundle.
2017-07-26 16:42:51.815817: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:165] The specified SavedModel has no variables; no checkpoints were restored.
2017-07-26 16:42:51.815824: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running LegacyInitOp on SavedModel bundle.
2017-07-26 16:42:51.819608: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:284] Loading SavedModel: success. Took 173340 microseconds.
2017-07-26 16:42:51.819631: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: detection version: 1}
2017-07-26 16:42:51.834766: I tensorflow_serving/model_servers/main.cc:290] Running ModelServer at 0.0.0.0:9000 ...

It seems like the model was not properly served, since "The specified SavedModel has no variables; no checkpoints were restored."

I tried checkpoint files from different architectures (rfcn-resnet50 and ssd_mobilenet_v1), and all of them have the same problem. I am not sure whether I did something wrong or whether this is a bug.
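A quick way to see which layout an export actually has is to inspect its version directory. The stdlib-only Python sketch below assumes the usual SavedModel layout of `saved_model.pb` next to a `variables/` folder, and checks for the two file names quoted above; the function name `describe_saved_model` is hypothetical, not part of TensorFlow or the object detection API.

```python
import os


def describe_saved_model(export_dir):
    """Report whether a SavedModel export keeps its weights in variables/.

    Assumption: export_dir is a single version directory (for example
    .../serving_model/1) containing saved_model.pb and a variables/ folder.
    """
    pb_path = os.path.join(export_dir, "saved_model.pb")
    var_dir = os.path.join(export_dir, "variables")
    has_graph = os.path.isfile(pb_path)
    # An absent variables/ folder is treated the same as an empty one.
    var_files = sorted(os.listdir(var_dir)) if os.path.isdir(var_dir) else []
    # A checkpoint-backed export normally has an index file plus data shard(s).
    has_index = "variables.index" in var_files
    has_data = any(f.startswith("variables.data-") for f in var_files)
    return {
        "has_graph": has_graph,
        "variable_files": var_files,
        "has_checkpoint": has_index and has_data,
    }
```

For the export described in this issue, the result would be `has_graph` true with an empty `variable_files` list, which corresponds to the loader's "The specified SavedModel has no variables; no checkpoints were restored." message.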

Source code / logs

One of the config files I am using:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 961
        max_dimension: 1199
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer { l2_regularizer { weight: 0.0 } }
      initializer { truncated_normal_initializer { stddev: 0.01 } }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.8
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 1.0
    first_stage_objectness_loss_weight: 1.0
    second_stage_box_predictor {
      rfcn_box_predictor {
        conv_hyperparams {
          op: CONV
          regularizer { l2_regularizer { weight: 0.0 } }
          initializer { truncated_normal_initializer { stddev: 0.01 } }
        }
        crop_height: 10
        crop_width: 10
        num_spatial_bins_height: 2
        num_spatial_bins_width: 2
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 300
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 1.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  num_steps: 45000
  keep_checkpoint_every_n_hours: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule { step: 0 learning_rate: .0003 }
          schedule { step: 15000 learning_rate: .00003 }
          schedule { step: 20000 learning_rate: .000003 }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/scratch2/wangxiny2/workspace/models/object_detection/resnet_v1_50.ckpt"
  from_detection_checkpoint: false
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/scratch2/wangxiny2/workspace/models/object_detection/car_train.record"
  }
  label_map_path: "/scratch2/wangxiny2/workspace/models/object_detection/data/car_label_map.pbtxt"
}

eval_config: {
  num_visualizations: 33
  num_examples: 33
  max_evals: 1
  visualization_export_dir: "/scratch2/wangxiny2/workspace/models/object_detection/eval_car_Jul_20_3"
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/scratch2/wangxiny2/workspace/models/object_detection/car_val.record"
  }
  label_map_path: "/scratch2/wangxiny2/workspace/models/object_detection/data/car_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

derekjchow commented 7 years ago

This is working as intended. The exporter script converts all variables into constants when exporting the graph.
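To make the maintainer's point concrete, here is a toy Python model of what "freezing" does. It is not TensorFlow's implementation: graph nodes are plain dataclasses, the op name "VariableV2" and the node names are illustrative assumptions, and `freeze` / `variables_to_save` are hypothetical helpers. Each variable node is replaced by a constant holding its checkpoint value, so a SavedModel written from the frozen graph has no variables left to put in `variables/`. The "Converted 277 variables to const ops." line in the export log appears to be the real-world counterpart of the count returned here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str  # "VariableV2", "Const", or any compute op such as "Conv2D"
    value: object = None
    inputs: list = field(default_factory=list)


def freeze(nodes, checkpoint):
    """Replace each variable node with a Const node holding its checkpoint value."""
    frozen, converted = [], 0
    for n in nodes:
        if n.op == "VariableV2":
            frozen.append(Node(n.name, "Const", checkpoint[n.name], list(n.inputs)))
            converted += 1
        else:
            frozen.append(n)
    return frozen, converted


def variables_to_save(nodes):
    """Only nodes that are still variables would be written to variables/."""
    return {n.name: n.value for n in nodes if n.op == "VariableV2"}


# Hypothetical two-variable graph and checkpoint values.
graph = [
    Node("conv1/weights", "VariableV2"),
    Node("conv1/biases", "VariableV2"),
    Node("conv1/Conv2D", "Conv2D", inputs=["image", "conv1/weights"]),
]
checkpoint = {"conv1/weights": [[0.1]], "conv1/biases": [0.0]}

frozen, converted = freeze(graph, checkpoint)
print(f"Converted {converted} variables to const ops.")  # Converted 2 variables to const ops.
print("variables/ would hold:", variables_to_save(frozen))  # variables/ would hold: {}
```

The weights are not lost; they move into the graph file itself, which is consistent with the .pb file having a reasonable size while the `variables/` folder stays empty.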

enricorotundo commented 6 years ago

This seems to be a common issue with the pre-trained models shipped with this repository. See this and #1988. Has anyone found a way to convert these pre-trained models into servables yet?

austinmw commented 5 years ago

@derekjchow so what needs to be changed to use the exported model with TF serving?

wronk commented 5 years ago

I was able to cobble together a hack for the current code version (as of July 2, 2019). Check it out here.

Note that setting --export_as_saved_model True doesn't seem to help. That flag doesn't affect whether or how the variables files are exported.

HassanBT commented 5 years ago

Has anyone been able to solve this?