localminimum / QANet

A Tensorflow implementation of QANet for machine reading comprehension

Memory Issue #23

Closed: amandalmia14 closed this issue 6 years ago

amandalmia14 commented 6 years ago

I am using an AWS p2.xlarge instance, which has a Tesla K80. While training, it still shows a memory error. Why? The console reports 11.17 GiB of GPU memory. Logs attached: logs.txt

TIA

localminimum commented 6 years ago

Hi @amandalmia14, thanks for your question! From logs.txt it looks like your GPU was already almost out of free memory before training began:

totalMemory: 11.17GiB freeMemory: 504.12MiB
2018-05-24 08:53:49.386687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-24 08:53:49.696065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-24 08:53:49.696131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-24 08:53:49.696150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N

indicating that there was almost no free memory left for TensorFlow to grab. Please make sure the GPU is not occupied by other processes that hold GPU memory and try again.
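
For anyone hitting this later, a minimal sketch of how to confirm and work around it, assuming the TensorFlow 1.x Session API this repo uses. Run nvidia-smi to see which processes are holding GPU memory, and optionally let TensorFlow allocate memory on demand instead of reserving the whole GPU up front:

import tensorflow as tf

# First, from a shell, check what is holding GPU memory and stop any
# stale processes:
#   nvidia-smi
# Then, to keep TensorFlow from grabbing all remaining memory at
# startup, enable on-demand allocation (TF 1.x API):
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow the allocation as needed
sess = tf.Session(config=config)

Note that allow_growth only helps when the model itself fits in what is free; if another process already holds most of the 11.17 GiB, freeing that memory is the real fix.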

amandalmia14 commented 6 years ago

Ahhhh!! How could I miss this!! My bad.