imankgoyal / NonDeepNetworks

Official Code for "Non-deep Networks"
BSD 3-Clause "New" or "Revised" License

How does the speed comparison look at batch size 64 or input resolution up to 800w? #1

Open lucasjinreal opened 3 years ago

lucasjinreal commented 3 years ago

How does the speed comparison look at batch size 64 or input resolution up to 800w?

AlexeyAB commented 3 years ago
lucasjinreal commented 3 years ago

@AlexeyAB thanks for your reply.

AlexeyAB commented 3 years ago

@jinfagang Thanks for good questions!

Currently, deep networks with a higher batch size and resolution have higher FPS than non-deep networks, but a higher batch size doesn't reduce latency. Real-time systems like self-driving cars and robots require exactly low latency, not high FPS.

For both questions, yes: parallelism will quickly hit an upper bound, but the newer the GPU, the more cores it has and the higher that bound shifts, so deep networks will need an ever higher batch size and resolution to outperform non-deep networks. At some point, on future GPUs, resolution 3048x2333 in a non-deep network will be faster than 2048x1980 in a deep one, and you will need a higher and higher batch size for deep networks.
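The latency argument above can be sketched with a toy model: if a GPU has enough spare cores, a layer's width is roughly free, but sequential layers cannot overlap, so latency is dominated by depth. The per-layer time and the depths below are hypothetical illustrative numbers, not measurements.

```python
# Toy model: on a sufficiently parallel device, latency is driven by the
# number of *sequential* layers, not by total compute.
# `layer_time_ms` is an assumed constant per-layer cost (hypothetical).

def latency_ms(num_sequential_layers: int, layer_time_ms: float = 0.5) -> float:
    # Sequential layers cannot run concurrently, so their times add up.
    return num_sequential_layers * layer_time_ms

deep_latency = latency_ms(101)     # e.g. a ResNet-101-like depth (assumed)
nondeep_latency = latency_ms(12)   # e.g. a ParNet-like depth (assumed)
print(deep_latency, nondeep_latency)  # 50.5 vs 6.0 ms
```

The sketch also shows why more cores shift the crossover: a wider non-deep network only wins while the device still has idle cores to absorb the extra width.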

400 MB (non-deep) vs 100 MB (deep) for params isn't a big issue: this 300 MB difference is ~10% of a Jetson Nano's 4 GB and ~1% of an RTX 3090's 24 GB, which doesn't even let us make the batch size 2x larger for deep networks than for non-deep ones. And the higher the batch size, the more GPU memory we need for layer outputs, while the memory needed for params stays the same.
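The memory arithmetic behind this can be made explicit. The param sizes come from the comment above; the per-sample activation size is a hypothetical placeholder, since it depends on the architecture and resolution.

```python
# Rough GPU-memory estimate: parameter memory is paid once, while
# activation (layer-output) memory scales with the batch size.

def total_memory_mb(param_mb: float, activation_mb_per_sample: float,
                    batch_size: int) -> float:
    return param_mb + activation_mb_per_sample * batch_size

# 400 MB (non-deep) vs 100 MB (deep) params; 50 MB/sample is an assumption.
for batch in (1, 32):
    nondeep = total_memory_mb(400, 50, batch)
    deep = total_memory_mb(100, 50, batch)
    print(f"batch={batch}: non-deep {nondeep} MB, deep {deep} MB")
```

At batch 32 the activation memory (1600 MB here) dwarfs the 300 MB param gap, which is the point being made: the param difference never buys deep networks a meaningfully larger batch.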

lucasjinreal commented 3 years ago

@AlexeyAB thank you, wider models might be a very good direction to explore. I still wonder why you said this:

require exactly low Latency, not high FPS

Isn't low latency (not params, not FLOPs, not MACs) the final way to compare model speed? I thought it was the same as FPS. It is good for comparing 2 models on the same device, since some models have fewer FLOPs but are actually slower (maybe not optimized, not suitable for parallelism, or needing more MACs), so final run time is the golden rule for judging a model's speed (on the same device). Aren't low latency and FPS just the same thing?

AlexeyAB commented 3 years ago

@jinfagang I think that Latency_batch1 (and equivalently FPS_batch1) is the most important metric for comparing the speed of neural networks, because Latency = 1000 ms / FPS holds only for batch=1.

For example, YOLOv5 reports a latency of 20 ms for batch=32, while the actual latency is 20 ms * 32 samples = 640 ms, plus ~1000 ms to collect 32 frames from a video camera in a real project. They process a batch of 32 in 640 ms, divide that time by 32, and report 20 ms, which is not the true latency.
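The gap between the reported and the true latency can be written out directly with the numbers from this example (the ~1000 ms frame-collection time is the rough figure quoted above, e.g. 32 frames at ~30 FPS):

```python
# Batched "latency" vs the latency a frame actually experiences,
# using the numbers from the YOLOv5 example above.

batch_size = 32
batch_time_ms = 640.0    # wall-clock time to process one batch of 32

# What gets reported: batch time divided by batch size.
reported_latency_ms = batch_time_ms / batch_size

# What a real-time system sees: wait to collect 32 frames, then run the batch.
frame_collection_ms = 1000.0  # ~1 s to gather 32 frames from a camera (approx.)
true_latency_ms = frame_collection_ms + batch_time_ms

print(reported_latency_ms, true_latency_ms)  # 20.0 vs 1640.0 ms
```

So the per-sample number is a throughput figure in disguise; the end-to-end delay is nearly two orders of magnitude larger.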

For comparing models on current devices there are several metrics that matter, but a model should be tested on at least 3 devices: mobile GPU/NPU, embedded GPU, and high-end GPU.

For the end user, only Accuracy/FPS or Accuracy/Latency_batch1 matters: if a model has 2x fewer FLOPs but is 2x slower, the user will not use it.
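The "measure wall-clock time, not FLOPs" rule from this thread can be sketched as a minimal benchmarking helper. The two workloads below are hypothetical stand-ins for models, chosen only to show the measurement pattern (warm-up runs, then the median of repeated timings); on a real GPU you would also need to synchronize the device before reading the clock.

```python
# Minimal wall-clock latency measurement: warm up, then take the median
# of several timed runs to reduce noise. CPU-only sketch; on GPU you must
# synchronize (e.g. torch.cuda.synchronize()) around the timed region.
import time

def measure_latency_ms(fn, warmup: int = 3, runs: int = 10) -> float:
    for _ in range(warmup):          # warm caches / lazy initialization
        fn()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]    # median is robust to outliers

# Stand-in "models" (hypothetical): fewer FLOPs does not guarantee less time.
model_a = lambda: sum(i * i for i in range(200_000))
model_b = lambda: sum(range(100_000))
print(measure_latency_ms(model_a), measure_latency_ms(model_b))
```

Comparing the two printed numbers, rather than operation counts, is exactly the "golden rule" discussed above: only measured run time on the target device decides which model is faster.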