jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
https://clip-as-service.jina.ai

A GPU server sometimes shows the warning 'no GPU available' #413

Open linvis opened 5 years ago

linvis commented 5 years ago

Description


I'm using this command to start the server:

bert-serving-start -model_dir /home/lin/wspace/chinese_L-12_H-768_A-12 -device_map 0

and it prints the warning:

W:VENTILATOR:[__i:_ge:246]:no GPU available, fall back to CPU

This seems strange, because I'm using a Google Cloud server with a K80 GPU, and it works fine with TensorFlow.
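For illustration, the behavior behind that warning is a "probe the GPU, fall back to CPU" pattern. This is a simplified sketch of the idea, not bert-as-service's actual code; the probe callable is injected so both branches can be exercised on a machine without a GPU:

```python
def pick_device(probe):
    """Return 'gpu:<i>' for the first visible GPU, or 'cpu' with a warning.

    `probe` is any callable returning a list of visible GPU indices
    (hypothetical stand-in for whatever GPU check the server performs).
    """
    gpus = probe()
    if not gpus:
        # Mirrors the spirit of the VENTILATOR warning above.
        print("W:VENTILATOR: no GPU available, fall back to CPU")
        return "cpu"
    return "gpu:%d" % gpus[0]

# Fallback path: the probe sees no GPUs.
print(pick_device(lambda: []))   # -> cpu
# Happy path: GPU 0 is visible.
print(pick_device(lambda: [0]))  # -> gpu:0
```

If the probe itself fails (for example, before the CUDA driver has been touched), the server would take the CPU branch even though a GPU is physically present, which matches the symptom described here.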

So I tried the following code:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

It prints the correct GPU information:

2019-07-13 07:14:26.226601: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-13 07:14:26.233099: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-07-13 07:14:26.233867: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5574007b4770 executing computations on platform Host. Devices:
2019-07-13 07:14:26.233899: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-07-13 07:14:28.347670: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-13 07:14:28.348329: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55740088b370 executing computations on platform CUDA. Devices:
2019-07-13 07:14:28.348402: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-07-13 07:14:28.348843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235 pciBusID: 0000:00:04.0 totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-07-13 07:14:28.348880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-13 07:14:28.349686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-13 07:14:28.349718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-13 07:14:28.349726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-13 07:14:28.349970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10802 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7
2019-07-13 07:14:28.351310: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7

Then I started the server again, and the GPU works:

I:VENTILATOR:[__i:_ge:255]:device map:
                worker  0 -> gpu  0
I:WORKER-0:[__i:_ru:530]:use device gpu: 0, load graph from /tmp/tmpc4t03o8s
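The device map in the log pairs each worker with a GPU. A minimal sketch of how such a round-robin assignment could be built (my own illustration, not the library's code), printing in the same format as the log above:

```python
def device_map(num_worker, gpus):
    """Assign workers to GPUs round-robin and print a log-style device map.

    `gpus` is the list of visible GPU indices; with more workers than GPUs,
    assignments wrap around.
    """
    mapping = {w: gpus[w % len(gpus)] for w in range(num_worker)}
    for w, g in mapping.items():
        print("                worker %2d -> gpu %2d" % (w, g))
    return mapping

# One worker, one GPU, as in the log above:
device_map(1, [0])
```

With `-device_map 0` and a single worker, the map degenerates to the single line `worker  0 -> gpu  0` shown in the log.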

But if I don't run the code

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

first, the server still shows 'no GPU available'.

And when I call the server via:

bc = BertClient(YOUR_CLIENT_ARGS)
bc.encode()

Then this issue shows up:

...

z595054650 commented 5 years ago

I have the same problem. How did you solve it?

linvis commented 5 years ago

I haven't found the root cause. So I have to start a new terminal and run

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

After that, I run bert-as-service, and the GPU works fine.

Maybe you can give it a try.
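The two-terminal workaround can also be automated: run the TensorFlow warm-up first, then launch the server from the same script. This is a hypothetical wrapper of my own (the helper name and structure are not part of bert-as-service):

```python
import sys

# The same warm-up snippet from this thread, as a one-liner for `python -c`.
WARMUP = (
    "import tensorflow as tf; "
    "tf.Session(config=tf.ConfigProto(log_device_placement=True))"
)

def build_commands(model_dir, gpu=0):
    """Return (warm-up command, server command) as argument lists for subprocess."""
    warmup_cmd = [sys.executable, "-c", WARMUP]
    server_cmd = ["bert-serving-start",
                  "-model_dir", model_dir,
                  "-device_map", str(gpu)]
    return warmup_cmd, server_cmd

# Usage sketch (not executed here, since it needs TF and the server installed):
#   import subprocess
#   warmup, server = build_commands("/home/lin/wspace/chinese_L-12_H-768_A-12")
#   subprocess.run(warmup, check=True)  # initialize CUDA once
#   subprocess.run(server, check=True)  # then start the server
```

This only packages the reported workaround; it does not explain why the warm-up makes the GPU visible to the server in the first place.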

ghost commented 5 years ago

I'm getting the same error.