Closed: salanki closed this issue 4 years ago.
What is the output of this command?
docker run --gpus all --rm nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
# docker run --gpus all --rm nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Fri Jun 5 04:13:09 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:1F:00.0 Off | 0 |
| N/A 61C P0 205W / 201W | 4661MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:20:00.0 Off | 0 |
| N/A 59C P0 196W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:65:00.0 Off | 0 |
| N/A 57C P0 198W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:66:00.0 Off | 0 |
| N/A 58C P0 200W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:B6:00.0 Off | 0 |
| N/A 58C P0 209W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:B7:00.0 Off | 0 |
| N/A 62C P0 200W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:DF:00.0 Off | 0 |
| N/A 61C P0 200W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:E0:00.0 Off | 0 |
| N/A 69C P0 198W / 201W | 4642MiB / 16160MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Thanks for filing the issue. There's a missing package in the release gpu docker image. We will push out the updated release docker image tomorrow. The nightly build today should also include the fix (74ea413db4407c1affe2b9fa69dc53ecdba61fa6).
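If it helps, here is a minimal way to check whether a given image already ships the NVRTC library (a sketch only; the nightly-gpu tag and the bash entrypoint override are assumptions, not official instructions):
# Sketch: list NVRTC libraries known to the dynamic linker inside the image
docker run --rm --entrypoint /bin/bash tensorflow/serving:nightly-gpu -c "ldconfig -p | grep -i nvrtc"
A non-empty result means libnvrtc is visible to the server binary.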
Thank you!
@nrobeR Thanks for the quick fix! I've compiled a GPU docker image from the master branch, and I can confirm the libnvrtc error is now gone. However, the loading process became extremely slow: it took ~5 minutes to load a model that used to load instantly with 2.2.0-rc2-gpu:
2020-06-05 14:32:36.670648: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1822] Adding visible gpu devices: 0
# ...
2020-06-05 14:37:03.578352: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 267111304 microseconds.
Here's the full log:
$ docker run -it --rm --gpus all -p 8501:8501 -v "foobar:/models/foobar" -e MODEL_NAME=foobar peakji/tensorflow-serving-gpu:2.2.0
2020-06-05 14:32:36.366294: I tensorflow_serving/model_servers/server.cc:87] Building single TensorFlow model file config: model_name: foobar model_base_path: /models/foobar
2020-06-05 14:32:36.366403: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-06-05 14:32:36.366412: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: foobar
2020-06-05 14:32:36.466974: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: foobar version: 1}
2020-06-05 14:32:36.467012: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: foobar version: 1}
2020-06-05 14:32:36.467020: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: foobar version: 1}
2020-06-05 14:32:36.467047: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/foobar/1
2020-06-05 14:32:36.653636: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-05 14:32:36.653664: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/foobar/1
2020-06-05 14:32:36.653736: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-06-05 14:32:36.654665: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-06-05 14:32:36.669708: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:32:36.669972: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1680] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-06-05 14:32:36.669979: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-06-05 14:32:36.670005: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:32:36.670412: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:32:36.670648: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1822] Adding visible gpu devices: 0
2020-06-05 14:37:01.197928: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1221] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-05 14:37:01.197947: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1227] 0
2020-06-05 14:37:01.197951: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1240] 0: N
2020-06-05 14:37:01.198020: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:37:01.198399: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:37:01.198788: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:37:01.199045: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1366] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6381 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-06-05 14:37:01.608828: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-06-05 14:37:02.740164: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/foobar/1
2020-06-05 14:37:03.578352: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 267111304 microseconds.
2020-06-05 14:37:03.711713: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:71] Starting to read warmup data for model at /models/foobar/1/assets.extra/tf_serving_warmup_requests with model-warmup-options
2020-06-05 14:37:11.730410: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:118] Finished reading warmup data for model at /models/foobar/1/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1. Elapsed time (microseconds): 8020527.
2020-06-05 14:37:11.730543: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: foobar version: 1}
2020-06-05 14:37:11.732092: I tensorflow_serving/model_servers/server.cc:366] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-06-05 14:37:11.733166: I tensorflow_serving/model_servers/server.cc:386] Exporting HTTP/REST API at:localhost:8501 ...
@peakji could you please provide the model file or a link to it so that I could try to reproduce it? Thanks
It happens with or without any model. It errors as soon as the TF binary is loaded, since the linked library does not exist.
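One way to surface exactly which shared libraries the binary fails to resolve is an ldd check inside the image (a sketch; the binary path /usr/bin/tensorflow_model_server is an assumption about the image layout):
# Sketch: report unresolved shared libraries of the model server binary
docker run --rm --entrypoint /bin/bash tensorflow/serving:2.2.0-gpu -c "ldd /usr/bin/tensorflow_model_server | grep 'not found'"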
Thanks @salanki, I understand that and I've verified that's fixed in the latest gpu-nightly.
I was referring to the model loading latency regression that @peakji was encountering.
@reedwm Can you PTAL? Thanks!
@peakji we did a test on latest-devel-gpu vs 2.1.0-devel-gpu with the half_plus_two model; the loading times were both ~600ms.
Could you share how you built the docker image and/or try with the latest-devel-gpu and see if that has the same regression?
@salanki the updated tensorflow/serving:2.2.0-gpu is out.
@nrobeR: Could you share how you built the docker image and/or ...
I followed this guide to build from Dockerfile.devel-gpu. The CPU image built from Dockerfile.devel works as usual.
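In outline, the documented flow is roughly the following (a condensed sketch; the repo paths and the TF_SERVING_BUILD_IMAGE build-arg follow the public guide and may differ from the exact commands used):
git clone https://github.com/tensorflow/serving
cd serving
# Build the devel image, then build the runtime image on top of it
docker build -t $USER/tensorflow-serving-devel-gpu -f tensorflow_serving/tools/docker/Dockerfile.devel-gpu .
docker build -t $USER/tensorflow-serving-gpu --build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel-gpu -f tensorflow_serving/tools/docker/Dockerfile.gpu .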
... try with the latest-devel-gpu and see if that has the same regression?
It has the same regression. I've also tested with the updated tensorflow/serving:2.2.0-gpu image, and the regression is exactly the same, so I guess it's not related to -march=native optimizations.
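For reference, the load time can be read straight off the SavedModel loader log line when comparing tags (a sketch; the model path is a placeholder, and the server keeps running afterwards, so stop it with Ctrl-C):
# Sketch: start the server and watch only the load-time line
docker run --rm --gpus all -v "$PWD/foobar:/models/foobar" -e MODEL_NAME=foobar tensorflow/serving:2.2.0-gpu 2>&1 | grep "Took .* microseconds"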
@peakji could you please provide the model file or a link to it so that I could try to reproduce it?
The model I was testing with is proprietary and quite large; I'll try to make a minimal reproducible example.
@peakji we did a test on latest-devel-gpu vs 2.1.0-devel-gpu with the half_plus_two model; the loading times were both ~600ms.
My model contains some recurrent layers; the regression might be caused by changes in cuDNN.
Thanks @peakji. Having a minimal reproducible example would be great.
I'm going to close this one as the original issue about the 2.2.0 GPU docker image config has been resolved. Keeping two problems in the same issue is a bit confusing to others. Do you mind creating a separate one for this? Thanks
I'm going to close this one as the original issue about the 2.2.0 GPU docker image config has been resolved.
@nrobeR 👍
Do you mind creating a separate one for this?
Actually there's already one: https://github.com/tensorflow/serving/issues/1663
Thanks!
System information
CUDA 10.2
Describe the problem
The non-devel version of the 2.2.0-gpu image, as well as nightly-gpu, does not start. tensorflow/serving:2.2.0-rc2-gpu works fine, as does tensorflow/serving:2.2.0-devel-gpu. The latter includes NVRTC and the former does not depend on it.
Exact Steps to Reproduce
docker run --gpus all --rm -it tensorflow/serving:2.2.0-gpu