kentonl / e2e-coref

End-to-end Neural Coreference Resolution
Apache License 2.0

gpu ERROR #21

Closed herbertchen1 closed 5 years ago

herbertchen1 commented 6 years ago

I tried to train the model on a GPU and hit this error:

F tensorflow/stream_executor/cuda/cuda_dnn.cc:222] Check failed: s.ok() could not find cudnnCreate in cudnn DSO; dlerror: /home/chenbo/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cudnnCreate
Aborted (core dumped)

CUDA version: 8.0.61. cuDNN version: #define CUDNN_MAJOR 6
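Since the error says cudnnCreate could not be found, one quick sanity check is whether the dynamic loader can locate a cuDNN shared library at all and whether that library exports the symbol. The following is a minimal sketch using only the Python standard library; the helper name cudnn_exports_create is hypothetical, and the search done by ctypes.util.find_library may not cover exactly the same paths TensorFlow uses, so treat a negative result as a hint and not as proof.

```python
import ctypes
import ctypes.util


def cudnn_exports_create(library_name="cudnn"):
    """Return True/False if the library loads and has/lacks cudnnCreate.

    Returns None when the library cannot be located or loaded.
    Hypothetical helper for illustration; not part of e2e-coref or TensorFlow.
    """
    # Ask the platform's loader machinery where the library lives.
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        # Found a candidate, but it could not be loaded (e.g. missing deps).
        return None
    # Attribute lookup on a CDLL resolves the symbol; it raises
    # AttributeError (so hasattr is False) when the symbol is absent.
    return hasattr(lib, "cudnnCreate")
```

A result of None suggests the loader cannot see cuDNN at all, which would usually point at the library search path and not at the model code.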

Does the code require a specific CUDA or cuDNN version to run on GPU? And how do I select the GPUs from the shell? GPU=0,1 python singleton.py best doesn't work.

kentonl commented 5 years ago

This seems like a general problem with tensorflow and GPU usage. Can you verify that running a bare minimum TF session with GPU works?
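A bare minimum check of this kind could look like the sketch below. It assumes the TF 1.x API (tf.Session, tf.ConfigProto), which matches the TF versions discussed in this thread; the function name gpu_smoke_test is hypothetical. It pins a matrix multiply and a small convolution to the GPU, the latter because convolutions are the kind of op that normally goes through cuDNN.

```python
def gpu_smoke_test(device="/gpu:0"):
    """Run two tiny ops on the given device and return their results.

    Assumes TensorFlow 1.x; hypothetical helper, not part of e2e-coref.
    """
    import tensorflow as tf  # imported lazily so the module loads without TF

    with tf.device(device):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        identity = tf.constant([[1.0, 0.0], [0.0, 1.0]])
        product = tf.matmul(a, identity)

        # A small convolution, which typically exercises cuDNN on GPU.
        image = tf.ones([1, 4, 4, 1])
        kernel = tf.ones([2, 2, 1, 1])
        conv = tf.nn.conv2d(image, kernel, strides=[1, 1, 1, 1], padding="SAME")

    # Disable soft placement so a missing GPU fails loudly instead of
    # silently falling back to CPU; log where each op was placed.
    config = tf.ConfigProto(log_device_placement=True,
                            allow_soft_placement=False)
    with tf.Session(config=config) as sess:
        product_value, conv_value = sess.run([product, conv])
    return {"matmul": product_value.tolist(), "conv_shape": conv_value.shape}
```

If calling gpu_smoke_test() aborts with the same cudnnCreate message, the problem is most likely in the TensorFlow, CUDA and cuDNN installation and not in this repository.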

herbertchen1 commented 5 years ago

Thanks for your reply. I've found the solution: using "export GPU=1". And sincere thanks for releasing the code of your work!
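For readers wondering how a variable like GPU can control device selection: the thread does not show how the repository reads it, so the following is only a sketch of one common pattern, in which a script copies a user-facing variable into CUDA_VISIBLE_DEVICES (a standard CUDA environment variable) before TensorFlow initializes. The helper name select_gpus and its parsing rules are assumptions for illustration.

```python
import os


def select_gpus(env=os.environ, var="GPU"):
    """Copy a comma-separated GPU list from `var` into CUDA_VISIBLE_DEVICES.

    Hypothetical helper; the actual e2e-coref handling may differ.
    Returns the list of GPU ids (as strings) that were made visible.
    """
    raw = env.get(var, "")
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    for gpu_id in ids:
        if not gpu_id.isdigit():
            raise ValueError("GPU ids must be non-negative integers: %r" % gpu_id)
    # An empty value hides all GPUs, so the process runs on CPU only.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(ids)
    return ids
```

For the setting to take effect, a helper like this would need to run before TensorFlow first initializes CUDA in the process.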

herbertchen1 commented 5 years ago

I found that the CoNLL script needs Python 2. Also, with TF 1.8.0 I hit an undefined symbol error like others did, so I switched to TF 1.9.0. I couldn't find any preprocessing for the test set. Is it unnecessary for CoNLL 2012?

kentonl commented 5 years ago

Unfortunately, there's not much we can do about the Python 3 compatibility of the CoNLL preprocessing script. But it's good to know in any case.

So TF 1.8.0 didn't work but TF 1.9.0 did? If so, I'll update requirements.txt to reflect this.

The preprocessing steps that do not involve the test set (caching ELMo and filtering embeddings) do not affect correctness; they are only there for faster training. At test time, we use the slow version of both of these things (computing ELMo on the fly and loading the entire set of GloVe embeddings).

herbertchen1 commented 5 years ago

Yes, I first installed TF 1.8.0. While compiling coref_ops.cc, I got these warnings:

:0:0: warning: "_GLIBCXX_USE_CXX11_ABI" redefined
:0:0: note: this is the location of the previous definition

They didn't appear with TF 1.9.0. In the end, TF 1.8.0 failed with an undefined symbol error like the one in #25.

kentonl commented 5 years ago

I've updated the required TF version (a24d1070c2b7e50bc71cfeb6881c8abfc870451c)