Training gets killed - Githubissues

happylun / SketchModeling

Source code for the Sketch Modeling project: reconstruct a 3D shape from line drawing sketches.

https://people.cs.umass.edu/~zlun/papers/SketchModeling/

GNU General Public License v3.0

142 stars, 49 forks

Training gets killed #5

Status: Open. HenriMir opened this issue 6 years ago.

HenriMir commented 6 years ago

Hi, I have a problem retraining the network: after a few iterations, the process gets killed. I've tried many fixes found online (for example, https://github.com/tensorflow/tensorflow/issues/5289), such as decreasing batch_size (which is already at 2), but nothing solves it; I still get the 'killed' problem.

I train the network on a Tesla K40c (12 GB of GPU memory) with Python 3.5, NumPy 1.12.1, SciPy 0.19.0, and TensorFlow 1.0.1, which are the versions listed in the requirements.

Does anybody have an idea how to solve this?

Thank you

happylun commented 6 years ago

I have no idea either. The environment settings should be fine, and I think 12 GB of RAM should be enough. Do you have an error log or a screenshot of the error?

HenriMir commented 6 years ago

[Screenshot: capture_killed_problem]

It does not raise an error; it just gets killed, as you can see on the last line of my terminal screenshot. This time, the process was killed after 40 minutes, at iteration 18. (By the way, my machine has 16 GB of RAM.)
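A process that is simply "Killed" with no Python traceback has usually received SIGKILL, which cannot be caught, so no error is printed; on Linux this is what the kernel OOM killer sends when a process (or its container) runs out of memory. The following is a minimal sketch, assuming a POSIX system, of how to tell a signal kill apart from an ordinary error exit. The helper name `describe_exit` is hypothetical, and the child command is a stand-in for the real training script: Python's subprocess reports death by signal N as return code -N, while a shell reports the same event as 128 + N (137 for SIGKILL).

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short description (hypothetical helper).

    On POSIX, subprocess reports death by signal N as -N; a shell
    reports the same event as exit status 128 + N (137 for SIGKILL).
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        if sig == signal.SIGKILL:
            return "killed by SIGKILL (often the kernel OOM killer)"
        return f"terminated by {sig.name}"
    if returncode == 128 + signal.SIGKILL:
        return "exit status 137: SIGKILL as reported by a shell"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Stand-in for the training command: a child process that kills itself
    # with SIGKILL, mimicking what the OOM killer does to a process.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    result = subprocess.run(child)
    print(describe_exit(result.returncode))
```

If the real training run reports SIGKILL (or status 137) with nothing in the Python output, running out of host or container memory is a more likely cause than a bug in the training code.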

HenriMir commented 6 years ago

[Solved] Mea culpa: I was running inside a Docker container and did not know it was automatically limiting my memory to 1 GB. I solved the 'killed' problem by adding --memory="16g" to the docker command, which gives the container 16 GB of RAM.
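Docker's --memory flag works by setting the container's cgroup memory limit, so one way to catch this kind of problem early is to read that limit from inside the container before starting training. The following is a minimal sketch, assuming the standard cgroup mount at /sys/fs/cgroup (memory.max under cgroup v2, memory/memory.limit_in_bytes under cgroup v1). The function names are hypothetical, and treating values of at least 2**62 as "unlimited" is a heuristic assumption for the very large sentinel that cgroup v1 uses.

```python
from pathlib import Path
from typing import Optional

# Assumed standard cgroup locations as seen from inside a container:
# cgroup v2 exposes memory.max, cgroup v1 exposes memory.limit_in_bytes.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw: str) -> Optional[int]:
    """Parse a cgroup memory limit in bytes; return None if it means 'no limit'."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    limit = int(value)
    # cgroup v1 reports "no limit" as a huge number close to 2**63
    # (heuristic threshold, see the note above).
    if limit >= 2**62:
        return None
    return limit


def container_memory_limit() -> Optional[int]:
    """Return the memory limit in bytes, or None if no limit is found or set."""
    for path in CGROUP_LIMIT_FILES:
        if path.exists():
            return parse_limit(path.read_text())
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("No cgroup memory limit detected")
    else:
        print(f"Memory limit: {limit / 2**30:.1f} GiB")
```

Run inside the container, this should report a limit of about 1 GiB in the situation described above and 16.0 GiB once the container is started with --memory="16g".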