kentonl / e2e-coref

End-to-end Neural Coreference Resolution
Apache License 2.0

How to train with one GPU #39

Closed (fancyerii closed this issue 5 years ago)

fancyerii commented 5 years ago

My machine has only one GPU, so I changed experiment.conf as follows:

 two_local_gpus {
   addresses {
     ps = [localhost:2222]
-    worker = [localhost:2223, localhost:2224]
+    worker = [localhost:2223]
   }
-  gpus = [0, 1]
+  gpus = [0]
 }

When I run python train.py best, it prints:

2018-10-23 12:01:57.495403: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

I checked that TensorFlow can use the GPU with this script:

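import tensorflow as tf

# Pin the ops to the first GPU so the run fails loudly if it is unavailable.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Evaluate the 2x3 by 3x2 product and print the 2x2 result.
with tf.Session() as sess:
    print(sess.run(c))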

2018-10-23 12:05:53.137069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-23 12:05:53.137468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Quadro P3000 major: 6 minor: 1 memoryClockRate(GHz): 1.215
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 4.95GiB
2018-10-23 12:05:53.137501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-10-23 12:05:53.321772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-23 12:05:53.321825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-10-23 12:05:53.321834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-10-23 12:05:53.322015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4718 MB memory) -> physical GPU (device: 0, name: Quadro P3000, pci bus id: 0000:01:00.0, compute capability: 6.1)
[[22. 28.]
 [49. 64.]]

The standalone script finds the GPU, but train.py does not. What is wrong?
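
The thread does not state the cause. One general thing worth ruling out when cuInit reports CUDA_ERROR_NO_DEVICE even though another script sees the GPU is the CUDA_VISIBLE_DEVICES environment variable: if it is set to an empty string in the training process, CUDA hides every device. The sketch below is only a diagnostic aid under that assumption. The helper name visible_gpu_setting is hypothetical and is not part of e2e-coref or TensorFlow.

```python
import os


def visible_gpu_setting(env=None):
    """Describe how CUDA_VISIBLE_DEVICES would restrict GPU visibility.

    Hypothetical helper for illustration; it only inspects the variable
    and does not query the CUDA driver.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable not set at all: CUDA applies no restriction.
        return "unset: all GPUs are visible"
    # Split the comma-separated list and drop blank entries.
    ids = [tok.strip() for tok in value.split(",") if tok.strip()]
    if not ids:
        # Set but empty: every device is hidden from the process.
        return "empty: no GPUs are visible"
    return "restricted to: " + ", ".join(ids)


if __name__ == "__main__":
    print(visible_gpu_setting())
```

Calling the helper at the point where training starts (rather than in a fresh shell) shows the value the training process actually sees, since a script can modify the variable before TensorFlow initializes.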

fancyerii commented 5 years ago

I installed TensorFlow, and it works now.