Closed amandalmia14 closed 6 years ago
Hi @amandalmia14, thanks for your question! From logs.txt it looks like your machine already didn't have enough free GPU memory before training began:
totalMemory: 11.17GiB freeMemory: 504.12MiB
2018-05-24 08:53:49.386687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-24 08:53:49.696065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-24 08:53:49.696131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-24 08:53:49.696150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
This indicates that there was almost no memory left for TensorFlow to grab. Please make sure the GPU is not occupied by other processes that need GPU memory and try again.
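One way to check what is holding the GPU memory is `nvidia-smi` (a sketch; it assumes the NVIDIA driver tools are installed on the instance, which is the case on standard AWS GPU images — the PID `12345` below is a placeholder, not a real value from the logs):

```shell
# Show overall GPU utilization plus every process currently
# holding GPU memory (the "Processes" table at the bottom).
nvidia-smi

# Show only the compute processes and how much memory each holds.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# If a stale training run is still holding the memory, stop it.
# Replace 12345 with the PID reported by the commands above.
kill -9 12345
```

A common cause on shared or long-running instances is an earlier Python process that crashed without releasing the device; once it is killed, the free memory reported at startup should be close to the full 11.17 GiB.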
Ahhhh!! How could I miss this!! My bad.
I am using an AWS p2.xlarge, which has a Tesla K80. While training it is still showing a memory issue. Why?? My console shows it has 11.17 GiB of memory. Logs attached: logs.txt
TIA