Make the NN forward procedures in Python and Fortran identical

tztsai commented 5 months ago

Currently the feed-forward functions in Python https://github.com/m2lines/convection-parameterization-in-CAM/blob/a575e39b16d657dea181785e2e2df53af2689b23/torch_nets/models.py#L54 and in Fortran https://github.com/m2lines/convection-parameterization-in-CAM/blob/a575e39b16d657dea181785e2e2df53af2689b23/NN_module/nn_cf_net.f90#L107 are different. Several steps that the Fortran performs, such as input normalisation and output scaling, need to be added to the Python forward method.

jatkinson1000 commented 5 months ago

This is definitely a good thing to do to align implementations.

In the Fortran the inputs are normalised with:

features = (features - xscale_mean) / xscale_stnd

whilst the outputs are un-normalised, one output feature at a time, with:

out_pos = 0
do f = 1, n_features_out
   feature_size = feature_out_sizes(f)
   logits(out_pos+1:out_pos+feature_size) = (logits(out_pos+1:out_pos+feature_size)*yscale_stnd(f)) + yscale_mean(f)
   out_pos = out_pos + feature_size
end do
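
For comparison, the two Fortran steps above could be ported to Python roughly as follows. This is a minimal sketch using NumPy rather than the repository's actual code: the function names normalise_inputs and denormalise_outputs are hypothetical, and the toy values are made up for illustration. Note that the Fortran slices are 1-based and inclusive, while the Python slices are 0-based and end-exclusive.

```python
import numpy as np


def normalise_inputs(features, xscale_mean, xscale_stnd):
    """Element-wise input normalisation, mirroring the Fortran one-liner."""
    return (features - xscale_mean) / xscale_stnd


def denormalise_outputs(logits, feature_out_sizes, yscale_mean, yscale_stnd):
    """Rescale each contiguous output block with its own mean and std."""
    out = np.array(logits, dtype=float)  # copy so the caller's array is untouched
    out_pos = 0
    for f, feature_size in enumerate(feature_out_sizes):
        block = slice(out_pos, out_pos + feature_size)  # 0-based, end-exclusive
        out[block] = out[block] * yscale_stnd[f] + yscale_mean[f]
        out_pos += feature_size
    return out


# Toy values (hypothetical, not taken from the netCDF file).
features = normalise_inputs(
    np.array([1.0, 4.0]), np.array([1.0, 2.0]), np.array([2.0, 4.0])
)
outputs = denormalise_outputs(
    np.zeros(5), [2, 3], np.array([10.0, 20.0]), np.array([1.0, 1.0])
)
```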

The normalisation data can be found in the netCDF file as:

fscale_mean and fscale_stnd for the inputs (length 61)
oscale_mean and oscale_stnd for the output data (length 5)

Note that whilst the scaling is applied element-wise to the input vector (one mean and std. deviation per input element), this is not the case with the outputs. Each of the 5 outputs (concatenated into a single output vector) is normalised with a single mean and std. deviation shared by all of its elements.

The data can be read from the netCDF file by following a similar approach to that already used in endow_with_netcdf_params in models.py.

jatkinson1000 commented 5 months ago

The individual output vectors have lengths 30, 29, 29, 30, 30, and are concatenated for a total length of 148.
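
As a quick check of these lengths, the block boundaries inside the concatenated output vector can be computed as below. This is only an illustrative sketch; the variable names are hypothetical and the slices use Python's 0-based, end-exclusive convention.

```python
from itertools import accumulate

# Block lengths quoted in the comment above.
feature_out_sizes = [30, 29, 29, 30, 30]

# Running end positions of each block in the concatenated vector.
ends = list(accumulate(feature_out_sizes))
starts = [0] + ends[:-1]

# One slice per output block.
blocks = [slice(start, end) for start, end in zip(starts, ends)]

total_length = ends[-1]
```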

As discussed with @tztsai, the normalisation of inputs can be coded as a step before the layers are called, in a similar way to the Fortran. For the outputs, we can represent the de-normalisation process as a Linear layer.
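
To illustrate why a Linear layer can express the de-normalisation: the per-block rescaling is an affine map whose weight matrix is diagonal (each block's std. deviation repeated along the diagonal) and whose bias is the repeated block means. The sketch below checks this with NumPy against the block-by-block loop. The statistics are placeholder values, not the real oscale_mean and oscale_stnd, and the variable names are hypothetical.

```python
import numpy as np

feature_out_sizes = [30, 29, 29, 30, 30]

# Placeholder statistics; the real ones would come from the netCDF file.
yscale_mean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
yscale_stnd = np.array([0.5, 1.5, 2.5, 3.5, 4.5])

# Expand the per-block statistics to one value per output element.
scale = np.repeat(yscale_stnd, feature_out_sizes)
shift = np.repeat(yscale_mean, feature_out_sizes)

weight = np.diag(scale)  # shape (148, 148), zero off the diagonal
bias = shift             # shape (148,)

rng = np.random.default_rng(0)
logits = rng.standard_normal(148)

# De-normalisation written as a single affine (Linear) map.
linear_out = weight @ logits + bias

# Reference: the block-by-block loop used in the Fortran.
reference = logits.copy()
pos = 0
for f, size in enumerate(feature_out_sizes):
    reference[pos:pos + size] = reference[pos:pos + size] * yscale_stnd[f] + yscale_mean[f]
    pos += size
```

In PyTorch, one possible wiring (an assumption, not something settled in this thread) would be to copy these arrays into the weight and bias of a torch.nn.Linear(148, 148) appended after the existing layers, with gradients disabled for that layer so the scaling stays fixed.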

Make the NN forward procedures in Python and Fortran identical #50