Potential minor speedup: Gaussian blur is a separable 2d convolution

https://github.com/libffcv/ffcv-imagenet/blob/e97289fdacb4b049de8dfefefb250cc35abb6550/train_imagenet.py#L124

Not to be nitpicky, but the blur here could actually be replaced with two "1d" convolutions, one along the width and one along the height. For a K x K kernel that costs ~2K multiply-adds per output pixel instead of ~K^2:

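import torch.nn.functional as F
from torch import Tensor

def separable_conv2d(inputs: Tensor, k_h: Tensor, k_w: Tensor) -> Tensor:
    # Assumed shapes (depthwise blur): k_h is (C, 1, 1, K), k_w is (C, 1, K, 1).
    kernel_size = max(k_h.shape[-2:])
    pad_amount = kernel_size // 2  # 'same' padding for odd K.
    channels = inputs.shape[1]  # groups=channels filters each channel independently.
    # Gaussian filter is separable: blur along the width, then along the height.
    out_1 = F.conv2d(inputs, k_h, padding=(0, pad_amount), groups=channels)
    out_2 = F.conv2d(out_1, k_w, padding=(pad_amount, 0), groups=channels)
    return out_2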
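
For anyone who wants to sanity-check the equivalence without PyTorch, here is a minimal dependency-free sketch. It is not the ffcv code: `gaussian_1d`, `conv2d_same`, `separable_blur` and `full_blur` are hypothetical helpers written for illustration, and it assumes a single channel, an odd kernel size, and zero padding. It builds the 2D Gaussian as the outer product of 1D taps and applies both versions to the same image.

```python
import math

def gaussian_1d(size, sigma):
    """Normalized 1D Gaussian taps (hypothetical helper, odd size assumed)."""
    half = size // 2
    taps = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(taps)
    return [t / total for t in taps]

def conv2d_same(image, kernel):
    """Direct 2D cross-correlation with zero 'same' padding, single channel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

def separable_blur(image, taps):
    """Two 1D passes: a 1 x K kernel along the width, then K x 1 along the height."""
    row_kernel = [taps]
    col_kernel = [[t] for t in taps]
    return conv2d_same(conv2d_same(image, row_kernel), col_kernel)

def full_blur(image, taps):
    """Single pass with the K x K kernel formed as the outer product of the taps."""
    kernel = [[a * b for b in taps] for a in taps]
    return conv2d_same(image, kernel)
```

Both functions should agree up to floating-point rounding, while the separable version does 2K multiplies per output pixel (10 for K = 5) instead of K^2 (25 for K = 5).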
