tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

Error when exporting spinenet49 to SavedModel #915

Open jk78346 opened 3 years ago

jk78346 commented 3 years ago

I tried to export the pre-trained spinenet49 checkpoint to a SavedModel using the following command:

python3 ~/tpu/models/official/detection/export_saved_model.py --export_dir=~/tpu/tools/datasets/model/spinenet49-retinanet_saved/ --checkpoint_path=~/tpu/tools/datasets/model/spinenet49-retinanet/ --batch_size=1 --input_type="image_bytes" --input_name="input" --input_image_size=640,640

However, I got this error:

2021-05-10 21:46:02.751063: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "/nfshome/khsu037/tpu/models/official/detection/export_saved_model.py", line 35, in <module>
    from serving import detection
  File "/nfshome/khsu037/tpu/models/official/detection/serving/detection.py", line 28, in <module>
    from modeling import factory
  File "/nfshome/khsu037/tpu/models/official/detection/modeling/factory.py", line 17, in <module>
    from modeling import classification_model
  File "/nfshome/khsu037/tpu/models/official/detection/modeling/classification_model.py", line 26, in <module>
    from modeling.architecture import factory
  File "/nfshome/khsu037/tpu/models/official/detection/modeling/architecture/factory.py", line 24, in <module>
    from modeling.architecture import efficientnet
  File "/nfshome/khsu037/tpu/models/official/detection/modeling/architecture/efficientnet.py", line 25, in <module>
    from official.efficientnet import efficientnet_builder
  File "/nfshome/khsu037/tpu/models/official/efficientnet/efficientnet_builder.py", line 29, in <module>
    import efficientnet_model
ModuleNotFoundError: No module named 'efficientnet_model'

Right before seeing this message, I also saw that the yaml module was missing, so I installed it with:

$ python3 -m pip install pyyaml

I'm not sure whether I missed a step or left something incomplete here.

gagika commented 3 years ago

It looks like efficientnet_builder.py runs "import efficientnet_model", and the directory containing that module might not be on your PYTHONPATH. Try adding it to the Python path:

export PYTHONPATH=$PYTHONPATH:/path/to/models/official/efficientnet
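
If changing the shell environment is inconvenient, roughly the same effect can be had from inside Python before the failing import runs. The following is only a minimal sketch: `ensure_importable` is a hypothetical helper, not part of the tpu repository, and the directory you pass in is whatever location actually holds efficientnet_model.py on your machine.

```python
import importlib
import importlib.util
import os
import sys


def ensure_importable(module_name, search_dir):
    """Return True if a top-level module can be located.

    Hypothetical helper: if the module is not already reachable,
    search_dir is prepended to sys.path and the lookup is retried.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True  # already on the path, nothing to change

    # Normalize the directory so "~" and relative paths also work.
    search_dir = os.path.abspath(os.path.expanduser(search_dir))
    if search_dir not in sys.path:
        sys.path.insert(0, search_dir)

    # Drop cached finder state so the new directory is actually scanned.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

Calling `ensure_importable("efficientnet_model", "/path/to/models/official/efficientnet")` before the import plays the same role as the export line above, but only for the current process. The path is the same placeholder used above and has to be replaced with your own checkout location.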

jk78346 commented 3 years ago

It works, thank you. [solved] But another error appears during the export:

2021-05-11 15:39:50.649001: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/usr/local/lib/python3.7/dist-packages/absl/flags/_validators.py:356: UserWarning: Flag --model has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
INFO:tensorflow:model_params is:
 {'model_dir': '', 'use_tpu': False, 'isolate_session_state': False, 'architecture': {'min_level': 3, 'max_level': 7, 'use_bfloat16': False, 'space_to_depth_block_size': 1, 'pre_parser': None, 'num_classes': 91, 'parser': 'retinanet_parser', 'backbone': 'resnet', 'multilevel_features': 'fpn'}, 'train': {'iterations_per_loop': 100, 'train_batch_size': 64, 'total_steps': 22500, 'num_cores_per_replica': None, 'input_partition_dims': None, 'optimizer': {'type': 'momentum', 'momentum': 0.9}, 'learning_rate': {'type': 'step', 'warmup_learning_rate': 0.0067, 'warmup_steps': 500, 'init_learning_rate': 0.08, 'learning_rate_levels': [0.008, 0.0008], 'learning_rate_steps': [15000, 20000]}, 'checkpoint': {'path': '', 'prefix': '', 'skip_variables_regex': ''}, 'frozen_variable_prefix': None, 'train_file_pattern': '', 'train_dataset_type': 'tfrecord', 'transpose_input': True, 'regularization_variable_regex': '.*(kernel|weight):0$', 'l2_weight_decay': 0.0001, 'gradient_clip_norm': 0.0, 'space_to_depth_block_size': 1}, 'eval': {'eval_batch_size': 8, 'eval_samples': None, 'min_eval_interval': 180, 'eval_timeout': None, 'num_steps_per_eval': 1000, 'eval_file_pattern': '', 'eval_dataset_type': 'tfrecord', 'skip_eval_loss': False, 'type': 'box', 'use_json_file': True, 'val_json_file': '', 'per_category_metrics': False}, 'predict': {'predict_batch_size': 8}, 'batch_norm_activation': {'batch_norm_momentum': 0.997, 'batch_norm_epsilon': 0.0001, 'batch_norm_trainable': True, 'use_sync_bn': False, 'activation': 'relu'}, 'dropblock': {'dropblock_keep_prob': None, 'dropblock_size': None}, 'resnet': {'resnet_depth': 50, 'init_drop_connect_rate': None}, 'spinenet': {'model_id': '49', 'init_drop_connect_rate': None, 'use_native_resize_op': False}, 'spinenet_mbconv': {'model_id': '49', 'se_ratio': 0.2, 'init_drop_connect_rate': None, 'use_native_resize_op': False}, 'enable_summary': False, 'anchor': {'num_scales': 3, 'aspect_ratios': [1.0, 2.0, 0.5], 'anchor_size': 4.0}, 'fpn': {'fpn_feat_dims': 
256, 'use_separable_conv': False, 'use_batch_norm': True}, 'nasfpn': {'fpn_feat_dims': 256, 'num_repeats': 5, 'use_separable_conv': False, 'init_drop_connect_rate': None, 'block_fn': 'conv', 'activation': None, 'use_sum_for_combination': False}, 'postprocess': {'apply_nms': True, 'use_batched_nms': False, 'nms_version': 'v1', 'max_total_size': 100, 'nms_iou_threshold': 0.5, 'score_threshold': 0.05, 'pre_nms_num_boxes': 5000}, 'type': 'retinanet', 'retinanet_parser': {'output_size': [640, 640], 'match_threshold': 0.5, 'unmatched_threshold': 0.5, 'aug_rand_hflip': True, 'aug_scale_min': 1.0, 'aug_scale_max': 1.0, 'aug_policy': '', 'skip_crowd_during_training': True, 'max_num_instances': 100, 'regenerate_source_id': False}, 'retinanet_head': {'anchors_per_location': None, 'num_convs': 4, 'num_filters': 256, 'use_separable_conv': False, 'use_batch_norm': True}, 'retinanet_loss': {'focal_loss_alpha': 0.25, 'focal_loss_gamma': 1.5, 'huber_loss_delta': 0.1, 'box_loss_weight': 50, 'normalizer_momentum': 0.0}, 'mode': 'infer', 'transpose_input': False}
I0511 15:39:51.481986 139932890318592 export_saved_model.py:122] model_params is:
 {'model_dir': '', 'use_tpu': False, 'isolate_session_state': False, 'architecture': {'min_level': 3, 'max_level': 7, 'use_bfloat16': False, 'space_to_depth_block_size': 1, 'pre_parser': None, 'num_classes': 91, 'parser': 'retinanet_parser', 'backbone': 'resnet', 'multilevel_features': 'fpn'}, 'train': {'iterations_per_loop': 100, 'train_batch_size': 64, 'total_steps': 22500, 'num_cores_per_replica': None, 'input_partition_dims': None, 'optimizer': {'type': 'momentum', 'momentum': 0.9}, 'learning_rate': {'type': 'step', 'warmup_learning_rate': 0.0067, 'warmup_steps': 500, 'init_learning_rate': 0.08, 'learning_rate_levels': [0.008, 0.0008], 'learning_rate_steps': [15000, 20000]}, 'checkpoint': {'path': '', 'prefix': '', 'skip_variables_regex': ''}, 'frozen_variable_prefix': None, 'train_file_pattern': '', 'train_dataset_type': 'tfrecord', 'transpose_input': True, 'regularization_variable_regex': '.*(kernel|weight):0$', 'l2_weight_decay': 0.0001, 'gradient_clip_norm': 0.0, 'space_to_depth_block_size': 1}, 'eval': {'eval_batch_size': 8, 'eval_samples': None, 'min_eval_interval': 180, 'eval_timeout': None, 'num_steps_per_eval': 1000, 'eval_file_pattern': '', 'eval_dataset_type': 'tfrecord', 'skip_eval_loss': False, 'type': 'box', 'use_json_file': True, 'val_json_file': '', 'per_category_metrics': False}, 'predict': {'predict_batch_size': 8}, 'batch_norm_activation': {'batch_norm_momentum': 0.997, 'batch_norm_epsilon': 0.0001, 'batch_norm_trainable': True, 'use_sync_bn': False, 'activation': 'relu'}, 'dropblock': {'dropblock_keep_prob': None, 'dropblock_size': None}, 'resnet': {'resnet_depth': 50, 'init_drop_connect_rate': None}, 'spinenet': {'model_id': '49', 'init_drop_connect_rate': None, 'use_native_resize_op': False}, 'spinenet_mbconv': {'model_id': '49', 'se_ratio': 0.2, 'init_drop_connect_rate': None, 'use_native_resize_op': False}, 'enable_summary': False, 'anchor': {'num_scales': 3, 'aspect_ratios': [1.0, 2.0, 0.5], 'anchor_size': 4.0}, 'fpn': {'fpn_feat_dims': 
256, 'use_separable_conv': False, 'use_batch_norm': True}, 'nasfpn': {'fpn_feat_dims': 256, 'num_repeats': 5, 'use_separable_conv': False, 'init_drop_connect_rate': None, 'block_fn': 'conv', 'activation': None, 'use_sum_for_combination': False}, 'postprocess': {'apply_nms': True, 'use_batched_nms': False, 'nms_version': 'v1', 'max_total_size': 100, 'nms_iou_threshold': 0.5, 'score_threshold': 0.05, 'pre_nms_num_boxes': 5000}, 'type': 'retinanet', 'retinanet_parser': {'output_size': [640, 640], 'match_threshold': 0.5, 'unmatched_threshold': 0.5, 'aug_rand_hflip': True, 'aug_scale_min': 1.0, 'aug_scale_max': 1.0, 'aug_policy': '', 'skip_crowd_during_training': True, 'max_num_instances': 100, 'regenerate_source_id': False}, 'retinanet_head': {'anchors_per_location': None, 'num_convs': 4, 'num_filters': 256, 'use_separable_conv': False, 'use_batch_norm': True}, 'retinanet_loss': {'focal_loss_alpha': 0.25, 'focal_loss_gamma': 1.5, 'huber_loss_delta': 0.1, 'box_loss_weight': 50, 'normalizer_momentum': 0.0}, 'mode': 'infer', 'transpose_input': False}
 - Setting up TPUEstimator...
WARNING:tensorflow:From /nfshome/khsu037/tpu/models/official/detection/export_saved_model.py:147: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

W0511 15:39:51.482246 139932890318592 module_wrapper.py:138] From /nfshome/khsu037/tpu/models/official/detection/export_saved_model.py:147: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

WARNING:tensorflow:From /nfshome/khsu037/tpu/models/official/detection/export_saved_model.py:150: The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.

W0511 15:39:51.482312 139932890318592 module_wrapper.py:138] From /nfshome/khsu037/tpu/models/official/detection/export_saved_model.py:150: The name tf.estimator.tpu.RunConfig is deprecated. Please use tf.compat.v1.estimator.tpu.RunConfig instead.

WARNING:tensorflow:From /nfshome/khsu037/tpu/models/official/detection/export_saved_model.py:151: The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.

W0511 15:39:51.482371 139932890318592 module_wrapper.py:138] From /nfshome/khsu037/tpu/models/official/detection/export_saved_model.py:151: The name tf.estimator.tpu.TPUConfig is deprecated. Please use tf.compat.v1.estimator.tpu.TPUConfig instead.

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpa0fda4xk
W0511 15:39:51.482929 139932890318592 estimator.py:1847] Using temporary folder as model directory: /tmp/tmpa0fda4xk
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpa0fda4xk', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'local', '_evaluation_master': 'local', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1, experimental_allow_per_host_v2_parallel_get_next=False, experimental_feed_hook=None), '_cluster': None}
I0511 15:39:51.483158 139932890318592 estimator.py:191] Using config: {'_model_dir': '/tmp/tmpa0fda4xk', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'local', '_evaluation_master': 'local', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1, experimental_allow_per_host_v2_parallel_get_next=False, experimental_feed_hook=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0511 15:39:51.483376 139932890318592 tpu_context.py:271] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0511 15:39:51.483447 139932890318592 tpu_context.py:273] eval_on_tpu ignored because use_tpu is False.
 - Exporting the model...
INFO:tensorflow:Creating base dir: ~/tpu/tools/datasets/model/savedModels
I0511 15:39:51.484124 139932890318592 export_saved_model.py:166] Creating base dir: ~/tpu/tools/datasets/model/savedModels
INFO:tensorflow:Calling model_fn.
I0511 15:39:51.511267 139932890318592 estimator.py:1162] Calling model_fn.
INFO:tensorflow:Running infer on CPU/GPU
I0511 15:39:51.511374 139932890318592 tpu_estimator.py:3218] Running infer on CPU/GPU
Traceback (most recent call last):
  File "/nfshome/khsu037/tpu/models/official/detection/export_saved_model.py", line 208, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/nfshome/khsu037/tpu/models/official/detection/export_saved_model.py", line 201, in main
    FLAGS.cast_detection_classes_to_float)
  File "/nfshome/khsu037/tpu/models/official/detection/export_saved_model.py", line 172, in export
    checkpoint_path=checkpoint_path)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 725, in export_saved_model
    strip_default_attrs=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 859, in _export_all_saved_models
    strip_default_attrs=strip_default_attrs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2919, in _add_meta_graph_for_mode
    strip_default_attrs=strip_default_attrs))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 934, in _add_meta_graph_for_mode
    config=self.config)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2962, in _call_model_fn
    config)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1163, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3220, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1729, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2072, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "/nfshome/khsu037/tpu/models/official/detection/serving/detection.py", line 230, in _serving_model_fn
    predictions = serving_model_graph(features, model_params)
  File "/nfshome/khsu037/tpu/models/official/detection/serving/detection.py", line 191, in _serving_model_graph
    cast_detection_classes_to_float)
  File "/nfshome/khsu037/tpu/models/official/detection/serving/detection.py", line 95, in build_predictions
    params.anchor.anchor_size, (height, width))
  File "/nfshome/khsu037/tpu/models/official/detection/dataloader/anchor.py", line 68, in __init__
    self.boxes = self._generate_boxes()
  File "/nfshome/khsu037/tpu/models/official/detection/dataloader/anchor.py", line 91, in _generate_boxes
    xv, yv = tf.meshgrid(x, y)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py", line 3552, in meshgrid
    mult_fact = ones(shapes, output_dtype)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py", line 3120, in ones
    output = _constant_if_small(one, shape, dtype, name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2804, in _constant_if_small
    if np.prod(shape) < 1000:
  File "<__array_function__ internals>", line 6, in prod
  File "/nfshome/khsu037/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 3031, in prod
    keepdims=keepdims, initial=initial, where=where)
  File "/nfshome/khsu037/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 855, in __array__
    " a NumPy call, which is not supported".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (meshgrid/Size_1:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

[Edit] I think this is caused by a NumPy version that is too new, as mentioned in this post. Downgrading fixes it:

pip install numpy==1.19.5
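
Before downgrading, it can help to confirm which NumPy version the interpreter actually picks up, since the traceback above loads NumPy from a per-user site-packages directory. The check below is a rough sketch. `numpy_is_too_new` is a hypothetical helper, and the 1.20 cutoff is an assumption inferred from the pin to 1.19.5; this thread does not state exactly which releases are affected.

```python
def numpy_is_too_new(version, first_bad=(1, 20)):
    """Return True if a version string is at or above the assumed cutoff.

    first_bad=(1, 20) is an assumption based on the 1.19.5 pin,
    not a value confirmed in this thread.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= first_bad


try:
    import numpy as np
except ImportError:
    print("NumPy is not installed in this environment")
else:
    if numpy_is_too_new(np.__version__):
        print("NumPy", np.__version__, "may trigger the symbolic Tensor error")
    else:
        print("NumPy", np.__version__, "is at or below the pinned series")
```

Run this with the same python3 that runs export_saved_model.py, so that the reported version is the one the export script will import.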