mandarjoshi90 / coref

BERT for Coreference Resolution
Apache License 2.0

Can anyone help me with the GPU configuration? The model works well on CPU, but when I switch it to run on the GPU, it successfully opens all the related libraries and then crashes at some step #95

Open aymen-souid-github opened 2 years ago

aymen-souid-github commented 2 years ago

2022-02-04 13:35:36.159284: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-02-04 13:35:36.159373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9008 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:06:00.0, compute capability: 8.6)
2022-02-04 13:38:46.444608: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-02-04 13:39:39.431728: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-02-04 13:39:39.466299: W tensorflow/core/kernels/queue_base.cc:277] _0_padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/home/souid/anaconda3/envs/arabic_coref/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/souid/anaconda3/envs/arabic_coref/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/souid/anaconda3/envs/arabic_coref/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(30, 20), b.shape=(20, 3000), m=30, n=3000, k=20
    [[{{node width_scores/xw_plus_b/MatMul}}]]
    [[Sum/_687]]
  (1) Internal: Blas GEMM launch failed : a.shape=(30, 20), b.shape=(20, 3000), m=30, n=3000, k=20
    [[{{node width_scores/xw_plus_b/MatMul}}]]
0 successful operations.
0 derived errors ignored.
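One possible reading of this log is offered here as an assumption, not a confirmed diagnosis. The failing GEMM is well-formed: (30 x 20) @ (20 x 3000) gives a 30 x 3000 result, so the crash is unlikely to be a shape bug in the model. The GPU, however, reports compute capability 8.6 (Ampere) while TensorFlow loaded `libcublas.so.10.0` from CUDA 10.0, a release that predates Ampere. The Python sketch below pulls those three facts out of the log text and checks them. `parse_log`, `gemm_shapes_consistent`, `cuda_supports` and `MIN_CUDA_FOR_CC` are hypothetical helpers. The minimum-CUDA table is a partial one based on NVIDIA's release history and should be checked against the official CUDA release notes. Reading the CUDA version from the cuBLAS soname also only holds for CUDA 10.0 style names like this one.

```python
import re

# Assumption: minimum CUDA toolkit (major, minor) with native support for each
# GPU compute capability. Partial table; verify against NVIDIA's release notes.
MIN_CUDA_FOR_CC = {
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30xx, e.g. RTX 3080)
}


def parse_log(log):
    """Extract (compute capability, cuBLAS soname version, GEMM m/n/k) from a TF log."""
    cc = re.search(r"compute capability: (\d+)\.(\d+)", log)
    # Assumption: a soname like libcublas.so.10.0 maps to CUDA 10.0.
    blas = re.search(r"libcublas\.so\.(\d+)\.(\d+)", log)
    gemm = re.search(r"m=(\d+), n=(\d+), k=(\d+)", log)
    return (
        tuple(map(int, cc.groups())) if cc else None,
        tuple(map(int, blas.groups())) if blas else None,
        tuple(map(int, gemm.groups())) if gemm else None,
    )


def gemm_shapes_consistent(a_shape, b_shape, mnk):
    """A (m x k) @ B (k x n) is well-formed only if the inner dimensions agree."""
    m, n, k = mnk
    return a_shape == (m, k) and b_shape == (k, n)


def cuda_supports(cc, cuda):
    """True/False if the table knows this compute capability, else None."""
    need = MIN_CUDA_FOR_CC.get(cc)
    if need is None:
        return None
    return cuda >= need  # tuple comparison: (10, 0) < (11, 1)


# Excerpts copied from the log above.
LOG = (
    "physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, "
    "pci bus id: 0000:06:00.0, compute capability: 8.6) "
    "Successfully opened dynamic library libcublas.so.10.0 "
    "Blas GEMM launch failed : a.shape=(30, 20), b.shape=(20, 3000), "
    "m=30, n=3000, k=20"
)

if __name__ == "__main__":
    cc, cuda, mnk = parse_log(LOG)
    print(cc, cuda, mnk)                                      # (8, 6) (10, 0) (30, 3000, 20)
    print(gemm_shapes_consistent((30, 20), (20, 3000), mnk))  # True
    print(cuda_supports(cc, cuda))                            # False
```

If this reading is correct, the fix would be a TensorFlow build linked against CUDA 11.1 or newer, not a change to the model code. The CPU run working fine is consistent with that, since the CPU path never touches cuBLAS.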