victormashkov19 opened 5 years ago
Same problem, though I'm using a single GTX 1070 (8GB VRAM). I had to make face alignment run on the CPU; it's really slow, but it worked.
When I tested in CPU mode, peak memory usage exceeded 30GB.
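For anyone trying the same workaround, a minimal sketch of forcing a module onto the CPU in PyTorch. The `net` here is a hypothetical placeholder, not the repository's actual face-alignment network:

```python
import torch

# Placeholder module standing in for the face-alignment network.
net = torch.nn.Linear(68 * 2, 512)

# Fall back to CPU explicitly instead of the default CUDA device.
device = torch.device("cpu")
net = net.to(device)

landmarks = torch.rand(1, 68 * 2)        # dummy input tensor
with torch.no_grad():                    # inference only: skip autograd buffers
    out = net(landmarks.to(device))

print(out.device)  # cpu
```

Running under `torch.no_grad()` also avoids keeping activation buffers around, which matters when system RAM is the bottleneck.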
Oops, sorry, I misread your question. I was talking about the webcam demo.
@victormashkov19 I used an Nvidia Tesla K80; I don't exceed 12GB of GPU memory with a batch size of 2, but that's with the test dataset.
Did you use the full dataset? During what part of training did the OOM error occur?
I use the full VoxCeleb2 dataset in this environment: Windows 10 x64, Python 3.6 (Anaconda3-5.2.0), CUDA 10.0.
A strange thing is that when using a small dataset (321 people, 55,251 videos, 12.5GB), the saved model is 1443MB, while with the full dataset the saved model is larger, 2554MB.
OOM error message is as follows.
Saving latest...
...Done saving latest
Pytorch_VGGFACE_IR.py:117: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
prob = F.softmax(fc8_1)
Traceback (most recent call last):
File "train.py", line 133, in
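As an aside, the deprecation warning in the log above can be silenced by passing `dim` explicitly to softmax. A minimal sketch, with dummy logits standing in for `fc8_1`:

```python
import torch
import torch.nn.functional as F

fc8_1 = torch.rand(4, 10)       # dummy logits: batch of 4, 10 classes

# Deprecated: F.softmax(fc8_1) makes PyTorch guess the dimension.
# Explicit: normalize over the class dimension (dim=1 here).
prob = F.softmax(fc8_1, dim=1)

print(prob.sum(dim=1))          # each row sums to ~1.0
```

This only removes the warning; it is unrelated to the OOM itself.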
In what environment did you test?
Hi! I am facing the same problem... I am trying to continue the training process using a GTX 1080 on Ubuntu 18.04, but I always get an Out Of Memory error when running `python train.py`. Did you use any special steps? Or do I really need a GPU with more memory?
Thanks!
Hi @victormashkov19 @rmd2, I made a recent commit to use less GPU memory during the gradient computation of both VGG networks. Can you pull these changes and tell me if there's any improvement?
Also, I don't know if you saw it, but I changed the network I use in the README prerequisites. The new one improves memory usage as well, so please redo the prerequisites if you have not.
Using the full dataset will use more GPU memory than the test dataset, since the Discriminator's size depends on the dataset size.
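To illustrate why the checkpoint grows with the dataset (assuming a projection-style discriminator that learns one embedding column per training video; the 512-dim embedding and the video counts below are rough placeholders, not figures from the repo):

```python
import torch
import torch.nn as nn

EMB_DIM = 512  # hypothetical embedding size

def projection_embeddings(num_videos: int) -> nn.Parameter:
    # One learned column per training video, so the parameter
    # count scales linearly with the number of videos.
    return nn.Parameter(torch.rand(EMB_DIM, num_videos))

small = projection_embeddings(55_251)    # the small subset mentioned above
full = projection_embeddings(145_000)    # rough size of full VoxCeleb2 dev

print(small.numel(), full.numel())
```

This would also explain the larger saved-model file reported above: the extra size is mostly these per-video embeddings.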
If you need more GPU memory, you can always reduce the batch size during training. To do that, modify line 63 in train.py: dataLoader = DataLoader(dataset, batch_size=2, shuffle=True). The current setting is a batch size of 2; change it to 1.
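A runnable sketch of that change, using a dummy `TensorDataset` in place of the real VoxCeleb2 dataset object:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the VoxCeleb2 frame dataset.
dataset = TensorDataset(torch.rand(8, 3, 224, 224))

# Original setting: batch_size=2. Dropping to 1 roughly halves
# the per-step activation memory on the GPU.
dataLoader = DataLoader(dataset, batch_size=1, shuffle=True)

batch, = next(iter(dataLoader))
print(batch.shape)  # torch.Size([1, 3, 224, 224])
```

Gradient accumulation over two steps can recover the effective batch size of 2 if training becomes unstable at batch size 1.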
Please tell me if you still have memory errors.
Has anyone been able to get this to run on a GPU with 6GB VRAM?
I have tested the code with an NVIDIA TITAN RTX (24GB GPU RAM), but OOM occurred. What GPU did you use to train the network?