tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

2.20-gpu Docker image is missing NVRTC #1659

Closed (salanki closed this issue 4 years ago)

salanki commented 4 years ago

System information

CUDA 10.2

Describe the problem

The non-devel version of the 2.2.0-gpu image, as well as nightly-gpu, does not start.

tensorflow_model_server: error while loading shared libraries: libnvrtc.so.10.1: cannot open shared object file: No such file or directory

tensorflow/serving:2.2.0-rc2-gpu works fine, as does tensorflow/serving:2.2.0-devel-gpu. The latter includes NVRTC and the former does not depend on it.

Exact Steps to Reproduce

docker run --gpus all --rm -it tensorflow/serving:2.2.0-gpu
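
One way to confirm which shared libraries fail to resolve (a sketch; the server binary path inside the image is assumed to be /usr/bin/tensorflow_model_server, as in the official images):

# list the dynamic dependencies of the server binary and keep only the unresolved ones
docker run --gpus all --rm --entrypoint /bin/bash tensorflow/serving:2.2.0-gpu \
  -c "ldd /usr/bin/tensorflow_model_server | grep 'not found'"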

netfs commented 4 years ago

What is the output of this command?

docker run --gpus all --rm nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi

salanki commented 4 years ago
# docker run --gpus all --rm nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Fri Jun  5 04:13:09 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:1F:00.0 Off |                    0 |
| N/A   61C    P0   205W / 201W |   4661MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:20:00.0 Off |                    0 |
| N/A   59C    P0   196W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:65:00.0 Off |                    0 |
| N/A   57C    P0   198W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:66:00.0 Off |                    0 |
| N/A   58C    P0   200W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  On   | 00000000:B6:00.0 Off |                    0 |
| N/A   58C    P0   209W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2...  On   | 00000000:B7:00.0 Off |                    0 |
| N/A   62C    P0   200W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2...  On   | 00000000:DF:00.0 Off |                    0 |
| N/A   61C    P0   200W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:E0:00.0 Off |                    0 |
| N/A   69C    P0   198W / 201W |   4642MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
nrobeR commented 4 years ago

Thanks for filing the issue. There's a missing package in the release GPU docker image. We will push out the updated release docker image tomorrow. Today's nightly build should also include the fix (74ea413db4407c1affe2b9fa69dc53ecdba61fa6).
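
Once the updated images are published, the presence of the NVRTC runtime can be double-checked with something like this (a sketch; it assumes ldconfig and bash are available in the Ubuntu-based image):

# look for a registered libnvrtc shared library inside the nightly image
docker run --gpus all --rm --entrypoint /bin/bash tensorflow/serving:nightly-gpu \
  -c "ldconfig -p | grep nvrtc"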

salanki commented 4 years ago

Thank you!

peakji commented 4 years ago

@nrobeR Thanks for the quick fix! I've compiled a GPU docker image from the master branch, and I can confirm the libnvrtc error is now gone. However, the loading process has become extremely slow: it took ~5 minutes to load a model that used to load instantly with 2.2.0-rc2-gpu:

2020-06-05 14:32:36.670648: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1822] Adding visible gpu devices: 0
# ...
2020-06-05 14:37:03.578352: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 267111304 microseconds.
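
(267111304 microseconds is roughly 267 seconds, i.e. about 4 minutes 27 seconds, and nearly all of it is spent between the "Adding visible gpu devices" line at 14:32:36 and the device-interconnect lines at 14:37:01.)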

Here's the full log:

$ docker run -it --rm --gpus all -p 8501:8501 -v "foobar:/models/foobar" -e MODEL_NAME=foobar peakji/tensorflow-serving-gpu:2.2.0
2020-06-05 14:32:36.366294: I tensorflow_serving/model_servers/server.cc:87] Building single TensorFlow model file config:  model_name: foobar model_base_path: /models/foobar
2020-06-05 14:32:36.366403: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-06-05 14:32:36.366412: I tensorflow_serving/model_servers/server_core.cc:575]  (Re-)adding model: foobar
2020-06-05 14:32:36.466974: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: foobar version: 1}
2020-06-05 14:32:36.467012: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: foobar version: 1}
2020-06-05 14:32:36.467020: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: foobar version: 1}
2020-06-05 14:32:36.467047: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/foobar/1
2020-06-05 14:32:36.653636: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-06-05 14:32:36.653664: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/foobar/1
2020-06-05 14:32:36.653736: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-06-05 14:32:36.654665: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-06-05 14:32:36.669708: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:32:36.669972: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1680] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-06-05 14:32:36.669979: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-06-05 14:32:36.670005: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:32:36.670412: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:32:36.670648: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1822] Adding visible gpu devices: 0
2020-06-05 14:37:01.197928: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1221] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-05 14:37:01.197947: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1227]      0 
2020-06-05 14:37:01.197951: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1240] 0:   N 
2020-06-05 14:37:01.198020: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:37:01.198399: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:37:01.198788: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-05 14:37:01.199045: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1366] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6381 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-06-05 14:37:01.608828: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-06-05 14:37:02.740164: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/foobar/1
2020-06-05 14:37:03.578352: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 267111304 microseconds.
2020-06-05 14:37:03.711713: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:71] Starting to read warmup data for model at /models/foobar/1/assets.extra/tf_serving_warmup_requests with model-warmup-options 
2020-06-05 14:37:11.730410: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:118] Finished reading warmup data for model at /models/foobar/1/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1. Elapsed time (microseconds): 8020527.
2020-06-05 14:37:11.730543: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: foobar version: 1}
2020-06-05 14:37:11.732092: I tensorflow_serving/model_servers/server.cc:366] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-06-05 14:37:11.733166: I tensorflow_serving/model_servers/server.cc:386] Exporting HTTP/REST API at:localhost:8501 ...
nrobeR commented 4 years ago

@peakji could you please provide the model file, or a link to it, so that I can try to reproduce? Thanks!

salanki commented 4 years ago

It happens with or without any model: the server errors as soon as the TF binary is loaded, because the linked library does not exist.

nrobeR commented 4 years ago

Thanks @salanki, I understand that, and I've verified that it is fixed in the latest nightly-gpu.

I was referring to the model-loading latency regression that @peakji was encountering.

gowthamkpr commented 4 years ago

@reedwm Can you PTAL? Thanks!

nrobeR commented 4 years ago

@peakji we did a test of latest-devel-gpu vs 2.1.0-devel-gpu on the half_plus_two model; the loading times are both ~600ms.

Could you share how you built the docker image, and/or try the latest-devel-gpu and see if it has the same regression?
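
For anyone who wants to repeat the comparison, a rough timing check can be done with the half_plus_two test model (a sketch; the testdata path and GPU variant are assumed from the tensorflow/serving repo):

# serve the half_plus_two test model and note the "SavedModel load ... Took N microseconds" log line
git clone --depth=1 https://github.com/tensorflow/serving /tmp/serving
docker run --gpus all --rm -p 8501:8501 \
  -v "/tmp/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu:/models/half_plus_two" \
  -e MODEL_NAME=half_plus_two tensorflow/serving:2.2.0-gpu
# repeat with tensorflow/serving:2.1.0-gpu (or other tags) and compare the reported load times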

nrobeR commented 4 years ago

@salanki the updated tensorflow/serving:2.2.0-gpu is out.

peakji commented 4 years ago

@nrobeR: Could you share how you built the docker image and/or ...

I followed this guide to build from Dockerfile.devel-gpu. The CPU image built from Dockerfile.devel works as usual.
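
For reference, the build commands were roughly the following (a sketch; the Dockerfile paths and the TF_SERVING_BUILD_IMAGE build argument are assumed from the tensorflow/serving repo layout described in that guide):

git clone https://github.com/tensorflow/serving
cd serving
# build the devel image (compiles tensorflow_model_server from source)
docker build --pull -t $USER/tensorflow-serving-devel-gpu \
  -f tensorflow_serving/tools/docker/Dockerfile.devel-gpu .
# build the slim serving image on top of the devel image
docker build -t $USER/tensorflow-serving-gpu \
  --build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel-gpu \
  -f tensorflow_serving/tools/docker/Dockerfile.gpu .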

... try with the latest-devel-gpu and see if that has the same regression?

It has the same regression. I've also tested with the updated tensorflow/serving:2.2.0-gpu image and the regression is exactly the same, so I guess it's not related to -march=native optimizations.

@peakji could you please provide the model file or link to it so that I could try reproduce?

The model I was testing with is proprietary and quite large; I'll try to make a minimal reproducible example.

@peakji we did a test on latest-devel-gpu vs 2.1.0-devel-gpu on half_plus_two model, the loading time are both ~600ms.

My model contains some recurrent layers; the regression might be caused by changes in cuDNN.

nrobeR commented 4 years ago

Thanks @peakji. Having a minimal reproducible example would be great.

I'm going to close this one, as the original issue about the 2.2.0 GPU docker image configuration has been resolved. Keeping two problems in the same issue is a bit confusing to others. Do you mind creating a separate one for this? Thanks!

peakji commented 4 years ago

I'm going to close this one as the original issue about the 2.2.0 gpu docker image config issue has been resolved.

@nrobeR 👍

Do you mind creating a separate one for this?

Actually there's already one: https://github.com/tensorflow/serving/issues/1663

nrobeR commented 4 years ago

Thanks!