Hello Guillaumegenthial,
Thank you for sharing your code and post. I tried your program, training the model on the 'test.txt' file that is included by default, and I see that my GPU memory is completely used. I also tried GloVe embeddings of a smaller dimension (50) and observed the same behavior.
My setup:
- Python: 3.5
- CUDA: 9.0
- TensorFlow: 1.8
- GPU: Tesla K80 (11.5 GB memory)
- OS: Red Hat
I also tried CUDA 8.0 with TensorFlow 1.2.0 and observed the same behavior.
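One thing I am not sure about: TensorFlow 1.x reserves almost all available GPU memory up front by default, so "memory completely used" in nvidia-smi may just reflect the default allocator rather than the model's real footprint. A minimal sketch (standard TF 1.x session config, not specific to this repo) of how I could enable incremental allocation to check this:

```python
import tensorflow as tf

# By default, TF 1.x maps nearly the whole GPU at startup.
# allow_growth makes the allocator claim memory incrementally,
# so the reported usage then reflects what the model actually needs.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and train the model as usual inside this session
```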
Also, when I tried to run on my custom input, which is around 100 MB in size (with 2 entity labels, apart from the non-entity label), I get the following error:

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[108092,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: chars/bidirectional_rnn/fw/fw/while/lstm_cell/split = Split[T=DT_FLOAT, num_split=4, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train_step/gradients/Add_3/y, chars/bidirectional_rnn/fw/fw/while/lstm_cell/BiasAdd)]]
```
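For what it's worth, the failing tensor by itself is not huge; a quick back-of-the-envelope calculation (assuming float32, i.e. 4 bytes per value) for the reported shape [108092,100]:

```python
# Memory needed for one float32 tensor of the shape in the OOM message.
rows, cols = 108092, 100      # shape[108092,100] from the error
bytes_per_value = 4           # float32
mib = rows * cols * bytes_per_value / 1024**2
print(f"{mib:.1f} MiB")       # -> 41.2 MiB
```

So the allocation that fails is only about 41 MiB; the OOM presumably means the rest of the 11.5 GB was already consumed, e.g. by activations kept for backprop across all the character timesteps of a large batch in the char bi-LSTM.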
Has anyone faced this error before? Also, do you have any insight into why training takes around 10 GB of GPU memory on such a small dataset?
Regards,
goutham