HenriMir opened this issue 6 years ago.
I have no idea either. The environment settings look fine, and I think 12 GB of memory should be enough. Do you have an error log or a screenshot of the error?
It does not raise an error; the process just gets killed, as you can see on the last line of my terminal screenshot. This time, it was killed after 18 iterations, which took 40 minutes. (By the way, the machine has 16 GB of RAM.)
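A process that dies with nothing but "Killed" in the terminal was most likely sent SIGKILL, which is the signal the Linux out-of-memory killer uses. As a minimal sketch (Python, POSIX only), the helper below turns a child process's return code into a readable description; `describe_exit` is a hypothetical name, not part of any library.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description.

    On POSIX, subprocess reports a child killed by signal N as -N.
    A shell that did not exec the command reports the same event as 128 + N
    (so 137 for SIGKILL).
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name}"
    if returncode > 128:
        # Possibly a shell wrapper reporting 128 + signal number.
        try:
            sig = signal.Signals(returncode - 128)
            return f"probably killed by {sig.name} (shell status {returncode})"
        except ValueError:
            pass  # Not a known signal number; treat as a normal exit status.
    return f"exited with status {returncode}"
```

For example, `describe_exit(subprocess.run([sys.executable, "train.py"]).returncode)`, where `train.py` stands in for your own (hypothetical) training script, would print "killed by SIGKILL" in the situation described above.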
[solved] Mea culpa: I was running inside a Docker container and did not know it was automatically restricting my memory to 1 GB. I solved the 'killed' problem by adding --memory="16g" to the docker command so that the container gets 16 GB of RAM.
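Docker's --memory flag is enforced through a cgroup memory limit, which a process inside the container can read from /sys/fs/cgroup. The sketch below assumes the usual mount points for cgroup v2 (`memory.max`) and cgroup v1 (`memory/memory.limit_in_bytes`); paths may differ on other setups, and `container_memory_limit` is a hypothetical helper. From the host, `docker stats` shows the same limit in its MEM USAGE / LIMIT column. If Docker itself runs inside a VM (as Docker Desktop does), the VM's memory size can be an additional cap that this check may not see.

```python
from pathlib import Path
from typing import Optional

# Assumed default cgroup locations as seen from inside a typical container.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports "no limit" as a very large number close to 2**63.
UNLIMITED_THRESHOLD = 1 << 60


def parse_limit(text: str) -> Optional[int]:
    """Return the limit in bytes, or None if the cgroup is unlimited."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    limit = int(value)
    return None if limit >= UNLIMITED_THRESHOLD else limit


def container_memory_limit() -> Optional[int]:
    """Read the memory limit of the current cgroup, if one can be found."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        if path.exists():
            return parse_limit(path.read_text())
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("No cgroup memory limit found")
    else:
        print(f"Memory limit: {limit / 2**30:.1f} GiB")
```

Running this at the start of training makes a too-small limit (such as the 1 GB here) visible before the process is killed.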
Hi, I have a problem retraining the network: after a few iterations, the process gets killed. I've tried many fixes found online (for example: https://github.com/tensorflow/tensorflow/issues/5289), such as decreasing batch_size (which is already at 2), but nothing solves it; the process still gets 'killed'.
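Since the process only dies after several iterations, one way to check whether memory keeps growing is to log the peak resident set size after each step. A minimal sketch using Python's standard `resource` module (Unix only); `train` and `train_step` are hypothetical placeholders for your own training loop.

```python
import resource
import sys
from typing import Callable


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes.
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


def train(num_iterations: int, train_step: Callable[[], None]) -> None:
    """Run the (hypothetical) train_step and log peak memory after each call."""
    for i in range(num_iterations):
        train_step()
        print(f"iteration {i + 1}: peak RSS {peak_rss_mib():.0f} MiB", flush=True)
```

Because ru_maxrss is a peak, it never decreases; a value that rises steadily from one iteration to the next points to memory growth per iteration rather than a single batch that is too large.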
I train the network on a Tesla K40c (12 GB of memory) with Python 3.5, NumPy 1.12.1, SciPy 0.19.0 and TensorFlow 1.0.1, which are the required versions.
Does anybody have an idea how to solve this?
Thank you