tobegit3hub / simple_tensorflow_serving

Generic and easy-to-use serving service for machine learning models
https://stfs.readthedocs.io
Apache License 2.0

custom_op (Registered only GPU kernel) failed to load #48

Open jb892 opened 5 years ago

jb892 commented 5 years ago

Hi,

I'm new to TensorFlow serving. I'm trying to serve my trained model via simple_tensorflow_serving. However, when I run the following command, it fails to load the custom ops that are registered only with GPU kernels.

simple_tensorflow_serving --model_base_path="./models/pointnet2_sem_seg/" --custom_op_paths="./custom_ops/" --session_config='{"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}'

Result:

2019-04-25 10:26:55 INFO     custom_op_paths: ./custom_ops/
2019-04-25 10:26:55 INFO     debug: False
2019-04-25 10:26:55 INFO     enable_cors: True
2019-04-25 10:26:55 INFO     model_config_file: 
2019-04-25 10:26:55 INFO     host: 0.0.0.0
2019-04-25 10:26:55 INFO     secret_key: secret.key
2019-04-25 10:26:55 INFO     model_name: default
2019-04-25 10:26:55 INFO     port: 8500
2019-04-25 10:26:55 INFO     enable_auth: False
2019-04-25 10:26:55 INFO     model_platform: tensorflow
2019-04-25 10:26:55 INFO     reload_models: False
2019-04-25 10:26:55 INFO     enable_colored_log: False
2019-04-25 10:26:55 INFO     log_level: info
2019-04-25 10:26:55 INFO     auth_username: admin
2019-04-25 10:26:55 INFO     auth_password: admin
2019-04-25 10:26:55 INFO     model_base_path: ./models/pointnet2_sem_seg/
2019-04-25 10:26:55 INFO     gen_client: 
2019-04-25 10:26:55 INFO     bind: 0.0.0.0:8500
2019-04-25 10:26:55 INFO     session_config: {"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}
2019-04-25 10:26:55 INFO     download_inference_images: True
2019-04-25 10:26:55 INFO     secret_pem: secret.pem
2019-04-25 10:26:55 INFO     enable_ssl: False
2019-04-25 10:26:55 INFO     Load the so file from: ./custom_ops/tf_grouping_so.so
2019-04-25 10:26:55 INFO     Load the so file from: ./custom_ops/tf_interpolate_so.so
2019-04-25 10:26:55 INFO     Load the so file from: ./custom_ops/tf_sampling_so.so
2019-04-25 10:26:55.137247: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-04-25 10:26:55.140876: I tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2019-04-25 10:26:55 INFO     Put the model version: 1 online, path: ./models/pointnet2_sem_seg/1
INFO:tensorflow:Restoring parameters from ./models/pointnet2_sem_seg/1/variables/variables
2019-04-25 10:26:55 INFO     Restoring parameters from ./models/pointnet2_sem_seg/1/variables/variables
Traceback (most recent call last):
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
    self._extend_graph()
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

     [[{{node layer1/FarthestPointSample}} = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

     [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

Caused by op 'layer1/FarthestPointSample', defined at:
  File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
    from simple_tensorflow_serving.server import main
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 697, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
    session_config)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
    self.load_saved_model_version(model_version)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
    session, [tf.saved_model.tag_constants.SERVING], model_file_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
    **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

     [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
    from simple_tensorflow_serving.server import main
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
    session_config)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
    self.load_saved_model_version(model_version)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
    session, [tf.saved_model.tag_constants.SERVING], model_file_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 351, in load
    self.restore_variables(sess, saver, import_scope)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 303, in restore_variables
    saver.restore(sess, self._variables_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1582, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

     [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

Caused by op 'layer1/FarthestPointSample', defined at:
  File "/home/jake/anaconda3/envs/PointNetPP/bin/simple_tensorflow_serving", line 7, in <module>
    from simple_tensorflow_serving.server import main
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 697, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/server.py", line 252, in <module>
    session_config)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 72, in __init__
    self.load_saved_model_version(model_version)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py", line 175, in load_saved_model_version
    session, [tf.saved_model.tag_constants.SERVING], model_file_path)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
    **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
    _ProcessNewOps(graph)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'FarthestPointSample' with these attrs.  Registered devices: [CPU,XLA_CPU], Registered kernels:
  device='GPU'

     [[node layer1/FarthestPointSample (defined at /home/jake/anaconda3/envs/PointNetPP/lib/python3.5/site-packages/simple_tensorflow_serving/tensorflow_inference_service.py:175)  = FarthestPointSample[npoint=1024, _device="/device:GPU:0"](input)]]

Has anyone come across this issue? What should I do next?
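For anyone reading the log, the key line is the `No OpKernel was registered` message: it lists the devices the process can see and the devices the op has kernels for. A minimal sketch that pulls those two lists out of the message text follows; the helper name `parse_no_kernel_error` is hypothetical and the regular expressions only assume the message layout shown in the log above.

```python
import re


def parse_no_kernel_error(message):
    """Extract the op name, the devices visible to the process,
    and the devices the op has kernels for (hypothetical helper)."""
    op = re.search(r"support Op '([^']+)'", message)
    devices = re.search(r"Registered devices: \[([^\]]*)\]", message)
    # Kernel entries look like: device='GPU' (single quotes), so the
    # node attribute _device="/device:GPU:0" is not matched here.
    kernels = re.findall(r"device='([^']+)'", message)
    return {
        "op": op.group(1) if op else None,
        "registered_devices": (
            [d.strip() for d in devices.group(1).split(",")] if devices else []
        ),
        "kernel_devices": kernels,
    }


# Message text copied from the log above.
ERROR = (
    "No OpKernel was registered to support Op 'FarthestPointSample' "
    "with these attrs.  Registered devices: [CPU,XLA_CPU], "
    "Registered kernels:\n"
    "  device='GPU'\n"
)

info = parse_no_kernel_error(ERROR)
# Devices the op needs that the process does not offer.
missing = [d for d in info["kernel_devices"]
           if d not in info["registered_devices"]]
print(info["op"], "has kernels only for", info["kernel_devices"])
print("process sees", info["registered_devices"], "-> missing", missing)
```

Read this way, the op only has a GPU kernel, while the serving process reports CPU and XLA_CPU as its devices, so there is no device on which the op can be placed.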

jb892 commented 5 years ago

It seems that the GPU is not activated while restoring from the checkpoint, right?
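One way to check this is to ask TensorFlow which devices it can see, from the same Python environment the server runs in. This is a rough sketch, not part of simple_tensorflow_serving: the function name `visible_device_types` is hypothetical, and it assumes `tensorflow.python.client.device_lib` is importable in the installed TensorFlow version.

```python
def visible_device_types():
    """Return the device types TensorFlow can see,
    or None if TensorFlow cannot be imported (hypothetical helper)."""
    try:
        # Assumption: this module path exists in the installed TensorFlow.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return sorted({d.device_type for d in device_lib.list_local_devices()})


types = visible_device_types()
if types is None:
    print("TensorFlow is not installed in this environment")
elif "GPU" not in types:
    print("No GPU visible to TensorFlow; devices:", types)
else:
    print("GPU visible to TensorFlow; devices:", types)
```

If no GPU is listed, the error above would be expected for any GPU-only op. Possible causes worth ruling out include a CPU-only TensorFlow package in the environment or a `CUDA_VISIBLE_DEVICES` setting that hides the GPU.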