Closed Vesnica closed 6 years ago
I reproduced the steps you mentioned, loaded half_plus_two as a simple test, and was able to query it successfully with no segfault. Do you always get a segfault, or only with a specific model? Can you try half_plus_two if you haven't already?
The linkopts changes you made look right, and I also confirmed that ldd tensorflow_model_server returns "not a dynamic executable" on my machine.
Steps:
Export the model to /tmp/half_plus_two
rm -rf /tmp/half_plus_two
bazel run tensorflow_serving/servables/tensorflow/testdata:export_half_plus_two
Start a server
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --model_base_path=/tmp/half_plus_two
Query it using a custom test client
bazel build tensorflow_serving/model_servers:tensorflow_model_server_test_client
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server_test_client
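For anyone following along, checking how a given build of the server is linked is straightforward (the path below is the usual bazel-bin output location):
# "statically linked" from file and "not a dynamic executable" from ldd mean the binary has no shared-library dependencies
file bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
ldd bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server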
Unfortunately, serving half_plus_two gives the same result:
ubuntu@7ab04e2b2ec6:~/serving$ bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --model_base_path=/tmp/half_plus_two/
I tensorflow_serving/model_servers/main.cc:122] Building single TensorFlow model file config: model_name: default model_base_path: /tmp/half_plus_two/
I tensorflow_serving/core/basic_manager.cc:190] Using InlineExecutor for BasicManager.
I tensorflow_serving/model_servers/server_core.cc:128] Adding models to manager.
I tensorflow_serving/model_servers/server_core.cc:77] Adding model: default
I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:252] File-system polling update: Servable:{name: default version: 123}; Servable path: /tmp/half_plus_two/00000123; Polling frequency: 30
I tensorflow_serving/core/loader_harness.cc:70] Approving load for servable version {name: default version: 123}
I tensorflow_serving/core/loader_harness.cc:85] Loading servable version {name: default version: 123}
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:142] Attempting to load a SessionBundle from: /tmp/half_plus_two/00000123
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:143] Using RunOptions:
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:106] Running restore op for SessionBundle
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:218] Done loading SessionBundle. Took 0 seconds.
I tensorflow_serving/core/loader_harness.cc:118] Successfully loaded servable version {name: default version: 123}
Segmentation fault (core dumped)
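To dig further, the next step on my side is a backtrace from the core dump; a minimal sketch, assuming gdb is installed and core dumps are enabled (the core file name depends on /proc/sys/kernel/core_pattern):
# allow core dumps, reproduce the crash, then pull a backtrace out of the core file
ulimit -c unlimited
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --model_base_path=/tmp/half_plus_two/
gdb -batch -ex bt bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server core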
What bothers me most is that the statically linked binary is even smaller than the dynamically linked one (223M vs 224M), which I thought should never happen.
I'll pull the master branch today and try again, in the hope that the commits from the last two weeks fix this problem.
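For reference, a quick way to check whether a given build actually ended up with shared-library dependencies (standard binutils; run it against each of the two binaries):
# a fully static binary has no dynamic section, so this prints nothing; a dynamic one lists its NEEDED libraries
readelf -d bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server | grep NEEDED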
Sorry for the delay, but I have some good news: the binary compiled from the current repo (https://github.com/tensorflow/serving/commit/e9d01c00aba8f843a20afb9117c1347a9f4b3b2f) just works!
Its size grew from 224M to 259M, but it runs on Alpine Linux without any dependencies, which lets me cut the deploy package size in half! :tada:
Thanks for the help!
Hi @Vesnica, do you have a Dockerfile for your Alpine TensorFlow Serving solution?
Thanks
I'll make an up-to-date one, stay tuned.
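In the meantime, the rough shape of it is just the statically linked binary plus an exported model on top of a bare Alpine image, for example (image tag, mount paths, and port here are assumptions, not the final Dockerfile):
# run the statically linked server inside a stock alpine container, mounting in the binary and the exported model
docker run --rm -p 8500:8500 \
  -v "$PWD/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server:/usr/local/bin/tensorflow_model_server:ro" \
  -v /tmp/half_plus_two:/models/half_plus_two:ro \
  alpine:3.4 tensorflow_model_server --port=8500 --model_base_path=/models/half_plus_two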
Bad news: the current statically linked tensorflow_model_server produces a segfault again, while the dynamically linked binary (default settings) works fine.
Steps to reproduce:
Logs are shown below:
ubuntu@NIV-AI:~/docker/serving$ bazel-bin/tensorflow_serving/tensorflow_model_server --model_base_path="/tmp/half_plus_two/"
I tensorflow_serving/model_servers/main.cc:118] Building single TensorFlow model file config: model_name: default model_base_path: /tmp/half_plus_two/ model_version_policy: 0
I tensorflow_serving/model_servers/server_core.cc:337] Adding/updating models.
I tensorflow_serving/model_servers/server_core.cc:383] (Re-)adding model: default
I tensorflow_serving/core/basic_manager.cc:693] Successfully reserved resources to load servable {name: default version: 123}
I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: default version: 123}
I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: default version: 123}
I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:291] Attempting to up-convert SessionBundle to SavedModelBundle in bundle-shim from: /tmp/half_plus_two/00000123
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:161] Attempting to load a SessionBundle from: /tmp/half_plus_two/00000123
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:162] Using RunOptions:
W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:135] Running restore op for SessionBundle: save/restore_all, save/Const:0
Segmentation fault
My computer specs (produced by lshw): spec.zip
uname -a:
Linux b0f38963d70a 4.4.0-46-generic #67-Ubuntu SMP Thu Oct 20 15:05:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Is this still a problem?
Hi community,
I'm trying to build a fully statically linked tensorflow_model_server so it can run on Alpine Linux, which doesn't have glibc and other necessary libraries.
I checked Bazel's docs (https://www.bazel.io/versions/master/docs/be/c-cpp.html) and found that
linkopts = ["-static"]
can produce a fully static binary, so I modified serving/tensorflow_serving/model_servers/BUILD accordingly (sketched below). The build completed successfully, and
ldd tensorflow_model_server
produces this: not a dynamic executable
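The change is essentially just adding that linkopts attribute to the existing cc_binary rule, roughly like this (other attributes elided; a sketch rather than the exact diff):
cc_binary(
    name = "tensorflow_model_server",
    # ... existing srcs, deps, etc. left unchanged ...
    linkopts = ["-static"],
)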
But it segfaults when trying to serve a model:
gdb information:
Is there anything I've done wrong, or am I missing some critical step? Any help is greatly appreciated.