Insufficient computing power?

tranluan / Nonlinear_Face_3DMM

Source code for "Nonlinear 3D Face Morphable Model"

http://cvlab.cse.msu.edu/project-nonlinear-3dmm.html

Apache License 2.0

676 stars, 124 forks

Insufficient computing power? #44

Adriana618-Love closed this issue 4 years ago

Adriana618-Love commented 4 years ago

Hello, I wish you a good day. My problem occurs in the "Train" function, in the part that updates G, specifically at line 487 of model_non_linear_3DMM.py. This is what happens:

- The program stalls on that line for a few minutes.
- During that time my whole laptop becomes very slow; even the mouse takes a while to respond.
- Finally the program ends and "terminated (killed)" appears.

I think G is too big for my laptop, since I only gave 30 GB to the partition where Ubuntu resides (which is where I run the program). I would like to know whether the solution is simply to give Ubuntu more memory, or whether my laptop is unable to run the program at all. I have a GeForce MX150 (similar to a GeForce GT 1030); it is the only graphics card I have. I appreciate any help, as I am new to this.

tranluan commented 4 years ago

You can try a batch size of 1 or 2 and then gradually increase it to see how much fits in your GPU memory. It looks like your GPU has 2GB of memory. I don't think that's enough to do meaningful training.
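As a rough illustration of that trial-and-error search, here is a minimal Python sketch. It is not code from this repository: `find_max_batch_size` and `try_step` are hypothetical names, and `try_step(n)` stands in for whatever builds the model with batch size `n` and runs one training step. Instead of increasing by one each time, it doubles the batch size until a step fails and then bisects, and it assumes that once a batch size fails every larger one fails too. With TensorFlow the out-of-memory exception would normally be `tf.errors.ResourceExhaustedError`; the sketch defaults to `MemoryError` so that it runs without a GPU.

```python
def find_max_batch_size(try_step, start=1, limit=64, oom_errors=(MemoryError,)):
    """Return the largest batch size in [start, limit] for which try_step succeeds.

    try_step(n) is a hypothetical callable that runs one training step with
    batch size n and raises one of `oom_errors` when memory runs out.
    Returns 0 if even `start` does not fit.
    """
    def fits(n):
        try:
            try_step(n)
            return True
        except oom_errors:
            return False

    if not fits(start):
        return 0

    good, bad, n = start, None, start
    # Phase 1: double the batch size until a step fails or the limit is reached.
    while n < limit:
        n = min(n * 2, limit)
        if fits(n):
            good = n
        else:
            bad = n
            break
    if bad is None:
        return good

    # Phase 2: bisect between the last size that fit and the first that failed.
    while bad - good > 1:
        mid = (good + bad) // 2
        if fits(mid):
            good = mid
        else:
            bad = mid
    return good


if __name__ == "__main__":
    # Stand-in for a real training step: pretend anything above 13 runs out of memory.
    def fake_step(n):
        if n > 13:
            raise MemoryError("simulated out-of-memory")

    print(find_max_batch_size(fake_step))
```

If the batch size is fixed when the graph is built, each trial would probably need a freshly built model or a fresh process; that part is left out of the sketch.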

Adriana618-Love commented 4 years ago

Thanks for the quick answer. Yep, my GPU has 4GB of dedicated memory. I understand the suggestion; I will vary the batch size to see how much my laptop can handle. I will upload the results shortly.

Adriana618-Love commented 4 years ago

All right, I've been varying the batch_size as well as the sample_size. Everything was going relatively well until I set:

- batch_size = 20
- sample_size = 20

At that point I get these errors:

    2020-06-16 22:44:52.820004: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 128.00M (134217728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    [] Reading checkpoints...
    [] Failed to find a checkpoint
    [!] Load failed...
    Epoch = 1 Number of batchs = 6122

    Update G
    2020-06-16 22:47:16.957862: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 256.00M (268435456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

I think this confirms that either I do not have a suitable graphics card or I am doing something wrong. Or maybe there is another alternative? (I would rather not have to buy a graphics card right now.)
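One alternative that is sometimes worth trying before buying hardware is to change how TensorFlow reserves GPU memory. The following is a minimal configuration sketch, not something taken from this repository: it assumes a TensorFlow build that exposes the 1.x session API (`tf.ConfigProto` and `tf.Session`, or their `tf.compat.v1` equivalents), and where the session is created in the training script is left open. `allow_growth` makes TensorFlow allocate GPU memory on demand instead of reserving nearly all of it at start-up. It does not make the model smaller, so if the network itself needs more memory than the card has, a smaller batch size is still the only fix.

```python
import tensorflow as tf

# Use the 1.x-style API; on TensorFlow 2.x it lives under tf.compat.v1.
tf1 = tf.compat.v1 if hasattr(tf, "compat") and hasattr(tf.compat, "v1") else tf

config = tf1.ConfigProto()
# Grow the GPU memory pool as needed rather than grabbing it all up front.
config.gpu_options.allow_growth = True

# Assumption: the training code accepts or creates a session like this one.
sess = tf1.Session(config=config)
```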

cyjouc commented 3 years ago

Hi, did you solve the problem? Could you share the solution with me?