Question about batch size

zju3dv / SMAP

[ECCV 2020] SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation

Apache License 2.0


Question about batch size #36

Open xobeiotozi opened 3 years ago

xobeiotozi commented 3 years ago

I have read your paper carefully. It says the batch size is set to 32, but I only see solver.IMG_PER_GPU = 2 in train.py. Was the code changed for single-GPU training? Thanks a lot for your time.

raypine commented 3 years ago

The batch size of 32 is the total across all GPUs in multi-GPU DistributedDataParallel training. IMG_PER_GPU is the number of images each GPU processes.
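As a rough illustration of the relationship described above, here is a minimal sketch. It assumes one DistributedDataParallel process per GPU; the function names are hypothetical and do not come from the SMAP code. The thread does not state how many GPUs the authors used, so the figure of 16 below is only what the arithmetic implies for IMG_PER_GPU = 2.

```python
def effective_batch_size(img_per_gpu: int, world_size: int) -> int:
    """Total images per step when each DDP process handles img_per_gpu images.

    world_size is the number of processes; one process per GPU is assumed.
    """
    if img_per_gpu < 1 or world_size < 1:
        raise ValueError("img_per_gpu and world_size must both be >= 1")
    return img_per_gpu * world_size


def processes_needed(target_batch: int, img_per_gpu: int) -> int:
    """Number of processes needed to reach target_batch exactly."""
    if target_batch % img_per_gpu != 0:
        raise ValueError("target_batch must be a multiple of img_per_gpu")
    return target_batch // img_per_gpu


if __name__ == "__main__":
    print(effective_batch_size(2, 16))  # total batch across 16 processes
    print(effective_batch_size(2, 1))   # total batch on a single GPU
    print(processes_needed(32, 2))      # processes implied by a total of 32
```

On a single GPU the total batch size is simply IMG_PER_GPU, so a value larger than 1 is possible as long as it fits in memory.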

xobeiotozi commented 3 years ago

I only have one GPU. Does that mean I can only set IMG_PER_GPU = 1?

xobeiotozi commented 3 years ago

How do I solve this problem? Is it because I only have one GPU?

raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'train.py', '--local_rank=0']' died with <Signals.SIGKILL: 9>.
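For readers trying to interpret an error like the one above, here is a small sketch of how Python reports a child process that was terminated by a signal. The helper name describe_returncode is hypothetical. A negative return code from subprocess means the child was killed by the signal with that number.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as the return code -N.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_returncode(-9))
    print(describe_returncode(0))
    print(describe_returncode(1))
```

SIGKILL is always sent from outside the process, so the message alone does not show which line of train.py was running. On Linux, one common sender is the kernel's out-of-memory killer, so checking host memory usage and the kernel log may be a reasonable first step. That is a general observation, not a diagnosis confirmed in this thread.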

raypine commented 3 years ago

What is "nproc_per_node" set to?

xobeiotozi commented 3 years ago

I set nproc_per_node=1.

raypine commented 3 years ago

That may be the problem. An easy solution is to run "train.py" directly rather than through "torch.distributed.launch".
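To illustrate the difference between the two ways of starting the script, here is a minimal sketch of how a training script could tell whether it was started by a distributed launcher. It relies on the WORLD_SIZE environment variable that torch.distributed.launch sets for each worker. The helper names are hypothetical, and this is not how SMAP's train.py is written.

```python
import os
from typing import Mapping, Optional


def launched_distributed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the environment describes more than one worker process.

    WORLD_SIZE is absent when the script is run directly with `python train.py`.
    """
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1")) > 1


def describe_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Short label for the mode a script would run in under this sketch."""
    if launched_distributed(env):
        return "distributed"
    return "single-process"


if __name__ == "__main__":
    # Run directly, this prints "single-process" unless WORLD_SIZE > 1 is set.
    print(describe_mode())
```

A script following this pattern would skip process-group setup in the single-process case, which is the situation on a machine with one GPU.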

xobeiotozi commented 3 years ago

That also fails. Is there something wrong with train.py?

2021-08-03 16:40:36 node02 root[2842] INFO using devices 0
train.sh: line 5: 2842 Killed python train.py