Batch normalization can be merged into the convolution at inference time, which is common practice. I don't take the addition into account because it should not be counted as MACs.
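To illustrate why the BN layer costs no extra MACs at inference, here is a small numeric sketch of my own (not from this thread) that folds a per-channel batch norm into the preceding convolution's weight and bias, using scalar toy values:

```python
# Folding BatchNorm into a conv, per output channel:
#   bn(y) = gamma * (y - mean) / sqrt(var + eps) + beta,  with y = w * x + b
# folds to w' = w * s and b' = (b - mean) * s + beta, where
#   s = gamma / sqrt(var + eps)
import math

w, b = 2.0, 1.0                 # toy conv weight/bias (scalars for clarity)
gamma, beta = 3.0, 0.1          # learned BN scale/shift
mean, var, eps = 0.5, 4.0, 0.0  # BN running statistics

s = gamma / math.sqrt(var + eps)
w_folded = w * s
b_folded = (b - mean) * s + beta

x = 3.0
direct = gamma * ((w * x + b) - mean) / math.sqrt(var + eps) + beta
folded = w_folded * x + b_folded
print(direct, folded)  # both 9.85 -- identical outputs, BN absorbed into conv
```

Since the folded layer is just a convolution with rescaled weights, counting only the conv MACs is consistent with how the model actually runs at inference.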
I tested a simple model, as follows:
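The original model snippet is missing here, so below is a hypothetical reconstruction inferred from the MAC breakdown quoted later in this post: 10x28x28x5x5x1 suggests Conv2d(1, 10, 5) with same padding on a 28x28 input, 10x28x28x3x3x10 suggests Conv2d(10, 10, 3) with same padding, plus a BatchNorm layer and a shortcut addition:

```python
# Hypothetical reconstruction -- layer shapes inferred from the quoted
# MAC counts; padding/stride choices are assumptions, not the original code.
import torch
import torch.nn as nn

class SimpleBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5, padding=2)   # 1 -> 10 ch, 5x5
        self.conv2 = nn.Conv2d(10, 10, kernel_size=3, padding=1)  # 10 -> 10 ch, 3x3
        self.bn = nn.BatchNorm2d(10)

    def forward(self, x):
        x = self.conv1(x)
        out = self.bn(self.conv2(x))
        return out + x  # shortcut addition

model = SimpleBlock()
y = model(torch.randn(1, 1, 28, 28))
print(tuple(y.shape))  # (1, 10, 28, 28)
```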
And the results are:
Obviously, the results only count the MACs of the convolution layers (10x28x28x5x5x1 + 10x28x28x3x3x10), but ignore the MACs of the batch normalization layer and the shortcut addition.
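For reference, the conv-only count quoted above works out as follows, using the standard per-layer formula MACs = C_out * H_out * W_out * K * K * C_in (the shortcut line is my own addition, showing the element-wise additions that the reply says are deliberately excluded):

```python
# Conv MACs: C_out * H_out * W_out * K * K * C_in
conv1_macs = 10 * 28 * 28 * 5 * 5 * 1    # first conv, 1 -> 10 channels, 5x5
conv2_macs = 10 * 28 * 28 * 3 * 3 * 10   # second conv, 10 -> 10 channels, 3x3
total_macs = conv1_macs + conv2_macs

# Shortcut addition: one add per output element -- additions, not MACs,
# hence excluded from the count.
shortcut_adds = 10 * 28 * 28

print(conv1_macs, conv2_macs, total_macs)  # 196000 705600 901600
```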