tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0
6.13k stars 2.19k forks

Building TF serving from source on Jetson Xavier #1277

Closed ewirbel closed 2 years ago

ewirbel commented 5 years ago

Feature Request

Describe the problem the feature is intended to solve

I am trying to get TF Serving 1.13 with GPU support (server-side API) running on a Jetson AGX Xavier board. I have managed to use the TensorFlow pip wheel provided by NVIDIA, and then to install the client-side Python package, but I need the model server (to run remote inferences on the board).

Describe the solution

Provide docker images for aarch64, with GPU support, or provide a toolchain for aarch64.

Describe alternatives you've considered

I have unsuccessfully tried to build tensorflow serving from source:

bazel version
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files: .bazelrc
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: 7aaba226-9820-41a1-90d8-685da07742f5
Build label: 0.20.0- (@non-git)
Build target: bazel-out/aarch64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed Mar 13 14:49:38 2019 (1552488578)
Build timestamp: 1552488578
Build timestamp as int: 1552488578

When running the bazel build command I get the following error

bazel build --verbose_failures -c opt --config=cuda --config=nativeopt --copt="-fPIC" tensorflow_serving/model_servers:tensorflow_model_server
INFO: Invocation ID: 584c76e9-26c5-4440-9927-338e2424fbf8
ERROR: No toolchain found for cpu 'aarch64'. Valid toolchains are: [local_linux: --cpu='local' --compiler='compiler', local_darwin: --cpu='darwin' --compiler='compiler', local_windows: --cpu='x64_windows' --compiler='msvc-cl',]
INFO: Elapsed time: 0.322s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
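The error says that the CROSSTOOL Bazel selected declares no aarch64 toolchain. Before patching one in, it can help to confirm which CPU names are actually declared; the path below is only a guess for TF 1.13-era source trees, so adjust it to wherever your checkout keeps its crosstool:

```shell
# Rough check (path is hypothetical): list which CPU/toolchain names the
# checked-in CROSSTOOL declares, to confirm aarch64 is missing.
CROSSTOOL="third_party/toolchains/cpus/CROSSTOOL"
if [ -f "$CROSSTOOL" ]; then
    grep -E 'target_cpu|toolchain_identifier' "$CROSSTOOL" | sort -u
else
    echo "no CROSSTOOL found at $CROSSTOOL"
fi
```

If aarch64 is indeed absent, the fix is to add an aarch64 toolchain entry (or point Bazel at one with `--crosstool_top`), which is what the Jetson build projects linked later in this thread do.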

Additional context

I managed to build TF serving 1.12 with GPU support and bazel 0.15.2

netfs commented 5 years ago

TF Serving does not have support for the aarch64 architecture (in the BUILD system, and I suspect the code might need changes too).

TF (core), as far as I know, has aarch64 support for the Lite ecosystem: https://www.tensorflow.org/lite/guide/build_arm64

Happy to accept patches to add aarch64 support to the TF code base.

netfs commented 5 years ago

There is also TF SIG Build; try asking on their mailing list to see what others are doing regarding aarch64 builds.

helmut-hoffer-von-ankershoffen commented 4 years ago

TensorFlow Serving builds quite nicely on Jetson devices nowadays - have a look at https://github.com/helmuthva/jetson/tree/master/workflow/deploy/tensorflow-serving-base/src or https://github.com/helmuthva/jetson for the bigger picture of this project.

helmut-hoffer-von-ankershoffen commented 4 years ago

Docker images to get TensorFlow Serving up and running on Jetson Nano and Jetson AGX Xavier devices are now published on DockerHub - see https://hub.docker.com/u/helmuthva

To allow GPU access from inside the container, the following devices have to be mounted when running the container:

  • /dev/nvhost-ctrl
  • /dev/nvhost-ctrl-gpu
  • /dev/nvhost-prof-gpu
  • /dev/nvmap
  • /dev/nvhost-gpu
  • /dev/nvhost-as-gpu
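A minimal sketch of assembling those GPU device mounts into a `docker run` invocation. The image tag is an assumption (see the DockerHub user linked above for the actual Jetson Nano / AGX Xavier tags); ports 8500/8501 are TF Serving's standard gRPC/REST ports:

```shell
# Sketch: build `--device` flags for the Jetson GPU device nodes.
# IMAGE is a hypothetical tag; substitute the real one from DockerHub.
IMAGE="helmuthva/jetson-xavier-tensorflow-serving"
DEVICE_FLAGS=""
for dev in /dev/nvhost-ctrl /dev/nvhost-ctrl-gpu /dev/nvhost-prof-gpu \
           /dev/nvmap /dev/nvhost-gpu /dev/nvhost-as-gpu; do
    DEVICE_FLAGS="$DEVICE_FLAGS --device=$dev"
done
# Print the command so it can be reviewed before actually running it.
echo docker run --runtime nvidia $DEVICE_FLAGS \
     -p 8500:8500 -p 8501:8501 "$IMAGE"
```

The `--runtime nvidia` flag assumes the NVIDIA container runtime that ships with JetPack; on other setups the GPU plumbing may differ.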

deaffella commented 4 years ago

> Docker images to get TensorFlow Serving up and running on Jetson Nano and Jetson AGX Xavier devices are now published on DockerHub - see https://hub.docker.com/u/helmuthva
>
> To allow GPU access from inside the container, the following devices have to be mounted when running the container:
>
>   • /dev/nvhost-ctrl
>   • /dev/nvhost-ctrl-gpu
>   • /dev/nvhost-prof-gpu
>   • /dev/nvmap
>   • /dev/nvhost-gpu
>   • /dev/nvhost-as-gpu
Hi! I want to use TensorFlow Serving on my Jetson TX2. I've successfully pulled the Docker image and tried to create a new container with all of these devices mounted. When the container starts, RAM usage climbs above 90% and the container logs report a lack of RAM. Query execution time fluctuates around 5 seconds, which is too slow for me. When using TensorFlow Serving on a weaker computer without a GPU, I get a runtime of about 0.5-1 second. What am I doing wrong? Please help me.

omartin2010 commented 4 years ago

Did you figure this out, @deaffella? I'm planning to do that if I can, to make it simpler to serve a model on my TX2. Basically, the issue I have with the images Helmut referred to above is that they are too large for my device (which already has some things on it, and the images use 6+ GB). Trying to build with bazel, I get this output:

RUN bazel build     --color=yes     --curses=yes     --jobs="${JOBS}"     --verbose_failures     --output_filter=DONT_MATCH_ANYTHING     --config=cuda     --config=nativeopt     --config=jetson     --copt="-fPIC"     tensorflow_serving/model_servers:tensorflow_model_server &&     cp /tensorflow-serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/tensorflow_model_server
 ---> Running in c271cf5b58c4
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
ERROR: error loading package '': Encountered error while reading extension file 'third_party/toolchains/preconfig/generate/archives.bzl': no such package '@org_tensorflow//third_party/toolchains/preconfig/generate': type 'repository_ctx' has no method patch()
ERROR: error loading package '': Encountered error while reading extension file 'third_party/toolchains/preconfig/generate/archives.bzl': no such package '@org_tensorflow//third_party/toolchains/preconfig/generate': type 'repository_ctx' has no method patch()
INFO: Elapsed time: 33.719s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
The command '/bin/sh -c bazel build     --color=yes     --curses=yes     --jobs="${JOBS}"     --verbose_failures     --output_filter=DONT_MATCH_ANYTHING     --config=cuda     --config=nativeopt     --config=jetson     --copt="-fPIC"     tensorflow_serving/model_servers:tensorflow_model_server &&     cp /tensorflow-serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/tensorflow_model_server' returned a non-zero code: 1

Should I open a new issue ? Not sure what to take a look at, here.

littlepai commented 4 years ago

> Did you figure this out, @deaffella ? [...] Should I open a new issue ? Not sure what to take a look at, here.

We are seeing the same thing. Any new developments?

sanatmpa1 commented 2 years ago

@ewirbel,

Can you take a look at this link, which contains a Docker image of TF Serving for Jetson Xavier, and let us know if it helps? Thanks!
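For anyone trying such an image: once the model server is up, a quick way to smoke-test it is TF Serving's REST predict endpoint on port 8501. The model name and input tensor below are placeholders, not values from this thread:

```shell
# Smoke-test sketch for a running tensorflow_model_server.
# "mymodel" and the input instances are placeholders; substitute your own.
HOST="localhost"
MODEL="mymodel"
URL="http://${HOST}:8501/v1/models/${MODEL}:predict"
# Print the curl command rather than executing it, since it needs a live server.
echo "curl -s -X POST $URL -d '{\"instances\": [[1.0, 2.0, 3.0]]}'"
```

A `GET http://$HOST:8501/v1/models/$MODEL` request likewise reports the model's load status, which helps separate "server not serving the model" from "inference is slow" when debugging latency like the TX2 report above.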

sanatmpa1 commented 2 years ago

@ewirbel,

Closing this issue due to lack of recent activity. Please feel free to reopen the issue with more details if the problem still persists. Thanks!