rwightman / efficientdet-pytorch

A PyTorch impl of EfficientDet faithful to the original Google impl w/ ported weights
Apache License 2.0

Training time differences #193

Closed · Asifzm closed this issue 3 years ago

Asifzm commented 3 years ago

Hi, thank you for your great repo, I highly appreciate it. I have updated the efficientdet-pytorch repo from a previous (August 2020) version, and updated timm and torch (1.4 to 1.7). When running efficientdet_d1, I noticed the training loop takes about twice as long; more specifically, the backbone forward pass takes twice the time. I also noticed efficientnet_b1 now uses the SiLU activation instead of SwishMe as in the previous version. Could that be the main reason for the time difference, or is there something else? I work with one GPU, without parallelism, model EMA, or mixed precision.

Thank you!
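For reference, this is roughly how I time the backbone forward in isolation (a rough sketch; `tf_efficientnet_b1` from timm stands in for whatever backbone the config actually resolves to, and the batch/input sizes are illustrative):

```python
# Rough sketch: time only the backbone forward pass on GPU.
import time
import torch
import timm

backbone = timm.create_model('tf_efficientnet_b1', features_only=True).cuda().eval()
x = torch.randn(8, 3, 640, 640, device='cuda')

with torch.no_grad():
    for _ in range(10):                      # warmup iterations
        backbone(x)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(50):
        backbone(x)
    torch.cuda.synchronize()
    print(f'avg forward: {(time.perf_counter() - t0) / 50 * 1e3:.1f} ms')
```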

rwightman commented 3 years ago

@Asifzm I'm moving this to discussions because it's not a bug

There have been quite a few changes to effdet and PyTorch in that timespan. I've noticed a number of hardware- and version-specific performance regressions in PyTorch, especially 1.7. I'd try different releases: the CUDA 10.2 variants of 1.7 will be closer to 1.4, while the 11.x builds may have performance issues specific to your exact card. You can also try 1.8 or the NGC containers; I usually train on NGC containers, and 20.12 and 21.02 both seem pretty good.
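As a quick sanity check of which build you're actually running (a small sketch, nothing repo-specific; the printed values will differ per install):

```python
# Print the PyTorch / CUDA / cuDNN combination in use, since regressions are
# often tied to the specific wheel and toolkit build.
import torch

print(torch.__version__)                 # e.g. 1.7.1+cu102 vs 1.7.1+cu110
print(torch.version.cuda)                # CUDA toolkit the wheel was built against
print(torch.backends.cudnn.version())    # cuDNN build
print(torch.cuda.get_device_name(0))     # the card being used
```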

In terms of this codebase, I made a number of changes over the summer that impact performance; some gained speed, while others traded a bit of speed for better loss stability and results.

You can try experimenting with --jit-loss to jit-script the loss fn for some speed gain, but it can blow up memory usage on the GPU.
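Conceptually that flag just wraps the loss module with torch.jit.script; a minimal sketch of the mechanism (the toy loss below is a stand-in, not the effdet loss):

```python
# Sketch: scripting a loss module so its graph is compiled by TorchScript.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLoss(nn.Module):                    # stand-in, not the repo's loss
    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return F.smooth_l1_loss(pred, target)

loss_fn = ToyLoss()
scripted_loss_fn = torch.jit.script(loss_fn)     # same call interface, scripted graph
print(scripted_loss_fn(torch.randn(4, 10), torch.randn(4, 10)))
```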

You can also revert to the older loss fn with --legacy-focal. It has different throughput/memory behaviour, usually a bit faster, but it's a bit less numerically stable than the current one.
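For illustration only, here is a generic sigmoid focal loss written against logits (not the repo's implementation); computing the cross-entropy term with binary_cross_entropy_with_logits is the usual way to keep it numerically stable, and that kind of choice is where the stability/throughput trade-off comes from:

```python
# Generic sigmoid focal loss sketch (torchvision-style), shown only to
# illustrate the stability-vs-speed trade-off in focal loss implementations.
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    prob = torch.sigmoid(logits)
    # log-sum-exp trick inside BCE-with-logits keeps this numerically stable
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p_t = prob * targets + (1 - prob) * (1 - targets)
    loss = ce * (1 - p_t) ** gamma
    if alpha >= 0:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    return loss.sum()
```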

And finally, you can try --torchscript to train with the whole model + bench torchscripted; I often find this improves overall throughput.
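At a high level that means scripting the full module and calling it exactly as before; a minimal sketch (a torchvision resnet18 stands in here, the repo wraps its own model + bench):

```python
# Sketch: script an entire model and use it as a drop-in replacement for eager calls.
import torch
import torchvision

model = torchvision.models.resnet18()
scripted = torch.jit.script(model)               # compile the whole module
out = scripted(torch.randn(2, 3, 224, 224))      # called like the eager model
print(out.shape)                                 # torch.Size([2, 1000])
```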

The SiLU activation change should be an overall performance gain for PyTorch 1.7/1.8.
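Both activations compute the same function, x * sigmoid(x); the difference is that nn.SiLU dispatches to PyTorch's native op (available since 1.7), whereas older timm versions used a hand-written Swish (with a memory-efficient autograd variant). A small sketch showing they match numerically:

```python
# SiLU and Swish are the same function; only the implementation path differs.
import torch
import torch.nn as nn

x = torch.randn(4, 8)
silu = nn.SiLU()                          # native op in PyTorch >= 1.7
swish = x * torch.sigmoid(x)              # the equivalent "Swish" formulation
print(torch.allclose(silu(x), swish))     # True
```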