mila-iqia / training


add scaling task #17

Closed · Delaunay closed this issue 4 years ago

Delaunay commented 5 years ago

Replace the multi-GPU benchmark with a scaling benchmark.

Since most multi-GPU tasks use data parallelism, speed should scale roughly linearly with the number of GPUs.

So we measure scaling efficiency to evaluate multi-GPU setups. This test makes sure DataParallel scales linearly across GPUs.
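For context, a minimal sketch of the data-parallel pattern under test, assuming PyTorch and a torchvision ResNet (illustrative only, not code from this repo):

```python
# Minimal sketch of the data-parallel pattern being benchmarked (assumes
# PyTorch + torchvision; illustrative only, not code from this repo).
import torch
import torchvision.models as models

model = torch.nn.DataParallel(models.resnet50().cuda(), device_ids=[0, 1, 2, 3])

# Each forward pass splits the batch across the 4 GPUs (64 images each),
# so throughput should grow roughly linearly with the device count.
batch = torch.randn(256, 3, 224, 224).cuda()
with torch.no_grad():
    logits = model(batch)
print(logits.shape)  # torch.Size([256, 1000])
```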

| GPUs | RTX fp32 throughput | Speedup | Efficiency |
|-----:|--------------------:|--------:|-----------:|
| 1 | 183.28 | 1.00 | 100.00% |
| 2 | 357.99 | 1.95 | 97.66% |
| 3 | 507.75 | 2.77 | 92.35% |
| 4 | 678.30 | 3.70 | 92.52% |
| 5 | 849.75 | 4.64 | 92.73% |
| 6 | 1014.84 | 5.54 | 92.29% |
| 7 | 1187.39 | 6.48 | 92.55% |
| 8 | 1351.39 | 7.37 | 92.17% |

Reported number: avg 94.03%, sd 2%.

Closer to 100% is better.

Efficiency should be > 90% to pass, regardless of hardware vendor.
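The derived columns follow the usual definitions: speedup is throughput relative to a single GPU, and efficiency is speedup divided by the GPU count. A small sketch reproducing them from the throughput column (variable names are illustrative):

```python
# Recompute the table's Speedup and Efficiency columns from raw throughput.
throughputs = [183.28, 357.99, 507.75, 678.30, 849.75, 1014.84, 1187.39, 1351.39]

for n, t in enumerate(throughputs, start=1):
    speedup = t / throughputs[0]          # relative to a single GPU
    efficiency = speedup / n              # 1.0 == perfect linear scaling
    print(f"{n} GPUs: speedup {speedup:.2f}, efficiency {efficiency:.2%}")
```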

Delaunay commented 5 years ago

python scaling.py --devices 0 1 2 3 micro_bench.py --network resnet50 --fp16 1

Pending testing on a server.
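For illustration, a hypothetical sketch of what a driver like scaling.py could do: rerun the same micro-benchmark on 1..N of the listed GPUs and compare throughput. The benchmark's output format and the parsing below are assumptions, not the repo's actual interface:

```python
# Hypothetical scaling driver: rerun micro_bench.py on a growing subset of
# GPUs and compute scaling efficiency. Output parsing is assumed; the real
# scaling.py interface is not shown in this thread.
import os
import re
import subprocess

devices = [0, 1, 2, 3]
throughputs = []

for n in range(1, len(devices) + 1):
    # Expose only the first n GPUs to the child benchmark process.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": ",".join(map(str, devices[:n]))}
    out = subprocess.run(
        ["python", "micro_bench.py", "--network", "resnet50", "--fp16", "1"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    # Assume the benchmark prints a line like "throughput: 183.28".
    throughputs.append(float(re.search(r"throughput:\s*([\d.]+)", out).group(1)))

for n, t in enumerate(throughputs, start=1):
    print(f"{n} GPU(s): efficiency {(t / throughputs[0]) / n:.2%}")
```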

Delaunay commented 5 years ago

benchmark that can be removed after merging:

Delaunay commented 4 years ago

./image_classification/scaling/pytorch/run.sh --repeat 10 --number 5 --network resnet18 --batch-size 32

Should work now.

breuleux commented 4 years ago

I have merged this manually along with other changes. Thanks!