This repository is an official PyTorch implementation of the paper "Learnable Triangulation of Human Pose" (ICCV 2019, oral). The proposed method achieves state-of-the-art results in multi-view 3D human pose estimation!
MIT License
When training, CUDA runs out of memory - How can I reduce the batch size? #89
When I ran the train command I got an error that CUDA is out of memory. Could this be a batch size issue?
Is this where I can change the batch size?
File: human36m_vol_softmax.yaml
Lines 17 and 18:
batch_size: 5
val_batch_size: 10
What would be a good batch size to try?
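A minimal sketch of lowering both values, assuming the two keys appear one per line as quoted above. The helper name `set_batch_sizes`, the enclosing `opt:` key in the sample text, and the target values 2 and 4 are hypothetical illustrations, not something taken from the repository or a recommended setting:

```python
import re


def set_batch_sizes(config_text, train_batch_size, val_batch_size):
    """Return config_text with the batch_size and val_batch_size values replaced.

    Works on the raw text so comments and ordering in the file are preserved.
    """
    # Anchoring at the start of the line (plus indentation) means this pattern
    # matches "batch_size:" but not "val_batch_size:".
    config_text = re.sub(
        r"^([ \t]*)batch_size:\s*\d+",
        rf"\g<1>batch_size: {train_batch_size}",
        config_text,
        flags=re.M,
    )
    config_text = re.sub(
        r"^([ \t]*)val_batch_size:\s*\d+",
        rf"\g<1>val_batch_size: {val_batch_size}",
        config_text,
        flags=re.M,
    )
    return config_text


# Hypothetical excerpt standing in for lines 17 and 18 of the config file.
sample = "opt:\n  batch_size: 5\n  val_batch_size: 10\n"
print(set_batch_sizes(sample, 2, 4))
```

The same edit can of course be made by hand in human36m_vol_softmax.yaml; the sketch only shows which two values are involved.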
Command:
python3 train.py --config experiments/human36m/train/human36m_vol_softmax.yaml --logdir ./logs
Error:
args: Namespace(config='experiments/human36m/train/human36m_vol_softmax.yaml', eval=False, eval_dataset='val', local_rank=None, logdir='./logs', seed=42)
Number of available GPUs: 1
Loading pretrained weights from: ./data/pretrained/human36m/pose_resnet_4.5_pixels_human36m.pth
Reiniting final layer filters: module.final_layer.weight
Reiniting final layer biases: module.final_layer.bias
Successfully loaded pretrained weights for backbone
Loading data...
Experiment name: human36m_vol_softmax_VolumetricTriangulationNet@25.06.2020-17:58:32
Traceback (most recent call last):
  File "train.py", line 483, in <module>
    main(args)
  File "train.py", line 462, in main
    n_iters_total_train = one_epoch(model, criterion, opt, config, train_dataloader, device, epoch, n_iters_total=n_iters_total_train, is_train=True, master=master, experiment_dir=experiment_dir, writer=writer)
  File "train.py", line 191, in one_epoch
    keypoints_3d_pred, heatmaps_pred, volumes_pred, confidences_pred, cuboids_pred, coord_volumes_pred, base_points_pred = model(images_batch, proj_matricies_batch, batch)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/jamal/jknight_3TB/projects/learnable-triangulation-pytorch/mvn/models/triangulation.py", line 253, in forward
    heatmaps, features, _, vol_confidences = self.backbone(images)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/jamal/jknight_3TB/projects/learnable-triangulation-pytorch/mvn/models/pose_resnet.py", line 301, in forward
    x = self.layer3(x)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/jamal/jknight_3TB/projects/learnable-triangulation-pytorch/mvn/models/pose_resnet.py", line 79, in forward
    out = self.bn1(out)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 76, in forward
    exponential_average_factor, self.eps)
  File "/home/jamal/anaconda3/envs/learnable_triangulation_1/lib/python3.6/site-packages/torch/nn/functional.py", line 1623, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 11.25 MiB (GPU 0; 7.92 GiB total capacity; 6.35 GiB already allocated; 8.56 MiB free; 444.50 KiB cached)
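
To make the numbers in that last line easier to compare, here is a small sketch that converts them all to MiB. It assumes the message keeps exactly the wording shown above; the function name `parse_oom_message` is a hypothetical helper, not part of PyTorch or this repository:

```python
import re

# Conversion factors to MiB for the units that appear in the message.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# One pattern per field, keyed on the wording used in the error text above.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
    "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
    "free": r"([\d.]+) (KiB|MiB|GiB) free",
    "cached": r"([\d.]+) (KiB|MiB|GiB) cached",
}


def parse_oom_message(message):
    """Return the sizes found in a CUDA out-of-memory message, in MiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 11.25 MiB "
    "(GPU 0; 7.92 GiB total capacity; 6.35 GiB already allocated; "
    "8.56 MiB free; 444.50 KiB cached)"
)
sizes = parse_oom_message(message)
for name, mib in sizes.items():
    print(f"{name:>10}: {mib:10.2f} MiB")

# Device memory that is neither allocated by this process nor reported free.
unaccounted = sizes["total"] - sizes["allocated"] - sizes["free"]
print(f"unaccounted: {unaccounted:9.2f} MiB")
```

Read this way, the failing request (11.25 MiB) is small but still larger than the 8.56 MiB reported free, so the card was essentially full when the backbone reached that batch norm layer. The remaining difference between total capacity and allocated memory may be held by the CUDA context, by other processes using the GPU, or by allocator overhead; the message alone does not say which.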