libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Question about multi-GPU training on ImageNet with 4 GPUs #356

Closed daixiangzi closed 5 months ago

daixiangzi commented 7 months ago

Why are 4 processes started on GPU card 0 instead of 1 process?

| 0 N/A N/A 2579940 C .../miniconda3/envs/ffcv/bin/python3.9 13724MiB |
| 0 N/A N/A 2579941 C .../miniconda3/envs/ffcv/bin/python3.9 414MiB |
| 0 N/A N/A 2579942 C .../miniconda3/envs/ffcv/bin/python3.9 414MiB |
| 0 N/A N/A 2579943 C .../miniconda3/envs/ffcv/bin/python3.9 414MiB |
| 1 N/A N/A 2579941 C .../miniconda3/envs/ffcv/bin/python3.9 13868MiB |
| 2 N/A N/A 2579942 C .../miniconda3/envs/ffcv/bin/python3.9 13908MiB |
| 3 N/A N/A 2579943 C .../miniconda3/envs/ffcv/bin/python3.9 13958MiB |

souravdalai18 commented 6 months ago

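When you load pretrained checkpoint states, set map_location to 'cpu'. Otherwise torch.load restores the tensors to the device they were saved from (usually cuda:0), so every process also allocates memory on GPU 0.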
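
For anyone landing here later, a minimal sketch of that suggestion, assuming the checkpoint is a plain state dict written with torch.save. The helper name load_checkpoint_on_cpu, the nn.Linear model, and the temporary checkpoint file are hypothetical stand-ins, not code from the FFCV ImageNet example:

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_checkpoint_on_cpu(model: nn.Module, path: str) -> nn.Module:
    """Load a state dict onto the CPU, then copy it into the model."""
    # map_location="cpu" stops torch.load from restoring tensors to the
    # device they were saved from (often cuda:0).
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    return model


# Hypothetical stand-in for a real model and checkpoint file.
model = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(model.state_dict(), ckpt_path)

restored = load_checkpoint_on_cpu(nn.Linear(4, 2), ckpt_path)
loaded_devices = {p.device.type for p in restored.parameters()}
```

In a multi-GPU script each process would then move the model to its own card after loading, for example with model.to(f"cuda:{local_rank}"), where local_rank is whatever rank variable your launcher provides (an assumption here). Passing that device string directly as map_location should work as well.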