Closed jvdp1 closed 4 months ago
@milancurcic would such an implementation be useful/appropriate?
I think it's a good approach (simple).
Do you foresee us needing to also carry the loss function itself (e.g. `mse`) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training.
If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative), then I think the abstract derived type approach would be more appropriate. If you agree, we could just model this after the activation or optimizer module, e.g.:
type, abstract :: loss_type
contains
  procedure(loss_interface), deferred :: eval, derivative
end type loss_type

abstract interface
  pure function loss_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res(size(true))
  end function loss_interface
end interface

type, extends(loss_type) :: mse
contains
  procedure :: eval => eval_mse
  procedure :: derivative => derivative_mse
end type mse

contains

pure function eval_mse(self, true, predicted) result(res)
  class(mse), intent(in) :: self
  real, intent(in) :: true(:), predicted(:)
  ...
end function eval_mse

pure function derivative_mse(self, true, predicted) result(res)
  class(mse), intent(in) :: self
  real, intent(in) :: true(:), predicted(:)
  ...
end function derivative_mse

...

end module nf_loss
Then in the network type, the component for the loss would be:
type network
  ...
  class(loss_type), allocatable :: loss
  ...
end type network
and we'd call the respective functions with `net % loss % eval` and `net % loss % derivative`.
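As a rough sketch of what that looks like at the call site (the `predict` call, variable names, and the training-step shape here are my assumptions, not an actual neural-fortran API):

```fortran
! Hypothetical training-step fragment; `train_step`, `output`, and
! `gradient` are placeholder names for illustration only.
subroutine train_step(net, x, y)
  type(network), intent(inout) :: net
  real, intent(in) :: x(:), y(:)
  real, allocatable :: output(:), gradient(:)

  output = net % predict(x)

  ! Scalar loss, e.g. for on-the-fly monitoring during training
  print *, 'loss =', net % loss % eval(y, output)

  ! Vector of loss derivatives w.r.t. the predictions,
  ! to be fed into the backward pass
  gradient = net % loss % derivative(y, output)

  ! ... backward pass using gradient ...
end subroutine train_step
```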
This way the pattern is also consistent with the optimizers and activations modules.
Let me know what you think.
> Do you foresee us needing to also carry the loss function itself (e.g. `mse`) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training. If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative) then I think the abstract derived type approach would be more appropriate.
I think it would be good to provide a procedure `evaluate`, like for Keras `model.evaluate`. In this case, a DT is indeed more appropriate.
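Such an `evaluate` could look roughly like this: a mean of the scalar loss over a batch of samples, analogous to what Keras `model.evaluate` reports. This is only a sketch; the `predict` call, dummy-argument names, and array layout (one sample per column) are assumptions:

```fortran
! Hypothetical network-bound evaluate: mean loss over a batch.
! input and target hold one sample per column.
function evaluate(self, input, target) result(res)
  class(network), intent(in) :: self
  real, intent(in) :: input(:,:), target(:,:)
  real :: res
  integer :: n

  res = 0
  do n = 1, size(input, dim=2)
    res = res + self % loss % eval(target(:,n), self % predict(input(:,n)))
  end do
  res = res / size(input, dim=2)
end function evaluate
```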
> If you agree, we could just model this after the activation or optimizer module e.g.
I actually started to implement it like the optimizer DT, but then I noticed that only the derivative was used, and switched to the current proposition. It should be easy to change it back to a DT.
> and we'd call the respective functions with `net % loss % eval` and `net % loss % derivative`.
I think that `eval` and `derivative` should be associated with different interfaces, because `eval` should return a scalar (e.g. MSE) while `derivative` should return a vector (e.g., dMSE/dx).
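Concretely, that would mean two abstract interfaces instead of the single shared one in the sketch above (interface names here are my own, chosen for illustration):

```fortran
abstract interface

  ! eval reduces the two vectors to a scalar, e.g. the mean squared error
  pure function loss_eval_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    real :: res
  end function loss_eval_interface

  ! derivative returns one gradient component per output, e.g. dMSE/dx
  pure function loss_derivative_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    real :: res(size(predicted))
  end function loss_derivative_interface

end interface
```

The deferred bindings in `loss_type` would then each reference their own interface: `procedure(loss_eval_interface), deferred :: eval` and `procedure(loss_derivative_interface), deferred :: derivative`.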
> This way the pattern is also consistent with the optimizers and activations modules.
This makes sense, and it is also easier to follow the code (as the same approach is used for all components).
Should I close this PR, and open a new PR with a DT approach? Or just modify this PR?
You're correct regarding the different interfaces (scalar and vector) between `eval` and `derivative`!
Regarding the PR, whatever is easier is fine, you can keep this PR if it's convenient for you.
Here is a PR to support MSE as a loss function. Additional commits should give users the option to choose among different loss functions (similar to the optimizers).