This PR implements:

- `conv2d` layer
- `layer % get_output` method for the `conv2d` and `input3d` layers
- `randn` for 4-d data (used to initialize the convolution kernel)
- A few tests for the `input3d` layer
- Specific `network % forward_3d` method under the generic `network % forward` name
Some comments:
- In the current forward pass implementation, the following dimension ordering seems most efficient with respect to memory layout, e.g. for the convolution kernel: `filters x channels x width x height`. This is consistent with the channels-last convention used by TensorFlow: in the input data, channels (e.g. red, green, blue) vary the fastest, then image width, then image height.
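The layout argument can be sketched in plain Python: Fortran arrays are column-major, so for an input shaped `(channels, width, height)` the channels dimension maps to the smallest stride, i.e. channels vary fastest in memory. The shapes below are illustrative assumptions, not values from the actual implementation.

```python
def colmajor_strides(shape):
    # Column-major (Fortran-order) strides, in units of elements:
    # the first dimension gets stride 1, so it varies fastest in memory.
    strides, step = [], 1
    for extent in shape:
        strides.append(step)
        step *= extent
    return tuple(strides)

# Illustrative input shape (channels, width, height):
print(colmajor_strides((3, 32, 32)))  # (1, 3, 96) -- channels stride is 1
```

With the channels stride equal to 1, two elements that differ only in the channel index are adjacent in memory, which is what makes this ordering cache-friendly for the inner convolution loops.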
- It looks like the convolution loop will provide some opportunity to parallelize (SPMD style).
- `associate` appears buggy with ifort-2021.5; avoid it (the `associate` construct, not the compiler) for the time being, and revisit and file bug reports once we have a working implementation (i.e. when #64 is closed).
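To make the parallelization opportunity concrete, here is a minimal pure-Python sketch of a valid (no-padding) cross-correlation with the kernel shaped `(filters, channels, width, height)` as discussed above; the function name and shapes are illustrative, not the library's API. The iterations of the outer loops over filters and output positions are independent of one another, which is exactly what an SPMD-style parallelization would exploit.

```python
def conv2d_forward(x, kernel):
    """Naive valid cross-correlation (illustrative sketch, not the library code).

    x      : input as nested lists, shape (channels, width, height)
    kernel : weights as nested lists, shape (filters, channels, kw, kh)
    returns: output, shape (filters, width - kw + 1, height - kh + 1)
    """
    filters = len(kernel)
    channels = len(kernel[0])
    kw, kh = len(kernel[0][0]), len(kernel[0][0][0])
    out_w = len(x[0]) - kw + 1
    out_h = len(x[0][0]) - kh + 1
    out = [[[0.0] * out_h for _ in range(out_w)] for _ in range(filters)]
    # Each (f, i, j) iteration writes one independent output element,
    # so the three outer loops carry no dependencies -- a natural
    # candidate for SPMD-style parallelization.
    for f in range(filters):
        for i in range(out_w):
            for j in range(out_h):
                acc = 0.0
                for c in range(channels):
                    for u in range(kw):
                        for v in range(kh):
                            acc += kernel[f][c][u][v] * x[c][i + u][j + v]
                out[f][i][j] = acc
    return out

x = [[[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0],
      [7.0, 8.0, 9.0]]]          # 1 channel, 3 x 3
k = [[[[1.0, 1.0],
       [1.0, 1.0]]]]             # 1 filter, 1 channel, 2 x 2
print(conv2d_forward(x, k))      # [[[12.0, 16.0], [24.0, 28.0]]]
```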
TODO

- [x] Make the `conv2d` API consistent with the Keras API (i.e. `conv2d(filters, kernel_size)`).
Closes #60.
CC @katherbreen