strubell / LISA

Linguistically-Informed Self-Attention implemented in TensorFlow
Apache License 2.0

Question about Zero Volatile GPU-Util #5

Closed SannyZhou closed 5 years ago

SannyZhou commented 5 years ago

Hello, I am trying to train and evaluate your LISA model on the CoNLL dataset. While trying to train the model on a GPU, I use the command CUDA_VISIBLE_DEVICES=0 bin/evaluate.sh config/conll05-lisa.conf --save_dir model. However, nothing seems to run on the GPU: nvidia-smi shows that volatile GPU-util is zero. How can I make the best use of the GPU with TensorFlow Estimators? Do you have any idea what might be causing this?

patverga commented 5 years ago

The first thing to check is that you've installed TensorFlow with GPU support; the default tensorflow package is CPU-only: pip3 install --user tensorflow-gpu
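A quick sanity check that the GPU build is actually being picked up (a minimal sketch using standard TF 1.x calls, nothing LISA-specific):

# Check that the tensorflow-gpu build can see a CUDA device (TF 1.x APIs).
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                          # True if a usable GPU is found
print([d.name for d in device_lib.list_local_devices()])   # should include '/device:GPU:0'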

SannyZhou commented 5 years ago

The package is tensorflow-gpu 1.9.0. @patverga

strubell commented 5 years ago

Does tensorflow output a line like:

2019-01-22 12:22:22.434234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11428 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:82:00.0, compute capability: 5.2)

If so, then it's using the GPU. If not, you likely have some kind of configuration issue.
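If the startup log isn't conclusive, another way to check is to turn on device-placement logging (a minimal standalone sketch using standard TF 1.x options, not LISA code):

# Log every op's device assignment; GPU-placed ops will show '/device:GPU:0' (TF 1.x).
import tensorflow as tf

config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.random_normal([1000, 1000])
    b = tf.matmul(a, a)
    sess.run(b)  # the placement of each op is printed to stderr

For an Estimator, the same ConfigProto can be passed in via tf.estimator.RunConfig(session_config=...).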

I would expect GPU usage to fluctuate a lot during evaluation; in fact, most of the time is spent on the CPU, since the code calls the official CoNLL evaluation scripts (perl). Currently I believe evaluation uses the same batch size as training, but you could increase it, depending on your GPU's memory, to make better use of the GPU (see the sketch below).
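To illustrate the batch-size point in isolation (a hypothetical sketch with made-up filenames and a tf.data pipeline; LISA's actual config keys and input format differ), an eval input_fn can simply batch more aggressively than the training one:

# Hypothetical sketch: give evaluation a larger batch size than training (TF 1.x / tf.data).
import tensorflow as tf

def make_input_fn(filename, batch_size):
    def input_fn():
        dataset = tf.data.TFRecordDataset(filename)  # placeholder data source
        dataset = dataset.batch(batch_size)           # batch size is the only knob here
        return dataset
    return input_fn

train_input_fn = make_input_fn("train.tfrecord", batch_size=32)
eval_input_fn = make_input_fn("dev.tfrecord", batch_size=256)  # bounded by GPU memory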

The code currently doesn't have a "predict" mode that would simply output predictions for sentences without evaluating them. That may be closer to the functionality you want, and I'm happy to accept pull requests :)

SannyZhou commented 5 years ago

Thanks for your patient answer. I found that I had set the debug parameter to 1, which caused evaluation on the validation set to run very frequently and resulted in the low GPU usage.

strubell commented 5 years ago

Great, happy to hear you solved it!
