Closed jvdp1 closed 4 months ago
@milancurcic would such an implementation be useful/appropriate?
I think it's a good approach (simple).
Do you foresee us needing to also carry the loss function itself (e.g. `mse`) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training.
If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative), then I think the abstract derived type approach would be more appropriate. If you agree, we could just model this after the activation or optimizer module, e.g.:
type, abstract :: loss_type
contains
  procedure(loss_interface), deferred :: eval, derivative
end type loss_type

abstract interface
  pure function loss_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:)
    real, intent(in) :: predicted(:)
    real :: res(size(true))
  end function loss_interface
end interface

type, extends(loss_type) :: mse
contains
  procedure :: eval => eval_mse
  procedure :: derivative => derivative_mse
end type mse

contains

pure function eval_mse(self, true, predicted) result(res)
  class(mse), intent(in) :: self
  real, intent(in) :: true(:), predicted(:)
  ...
end function eval_mse

pure function derivative_mse(self, true, predicted) result(res)
  class(mse), intent(in) :: self
  real, intent(in) :: true(:), predicted(:)
  ...
end function derivative_mse

...

end module nf_loss
Then in the network type, the component for the loss would be:
type network
  ...
  class(loss_type), allocatable :: loss
  ...
end type network
and we'd call the respective functions with `net % loss % eval` and `net % loss % derivative`.
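As a rough sketch of what that looks like at the call site (the `predict` call, variable names, and the training-step shape here are my assumptions, not an actual neural-fortran API):

```fortran
! Hypothetical training-step fragment; `train_step`, `output`, and
! `gradient` are placeholder names for illustration only.
subroutine train_step(net, x, y)
  type(network), intent(inout) :: net
  real, intent(in) :: x(:), y(:)
  real, allocatable :: output(:), gradient(:)

  output = net % predict(x)

  ! Scalar loss, e.g. for on-the-fly monitoring during training
  print *, 'loss =', net % loss % eval(y, output)

  ! Vector of loss derivatives w.r.t. the predictions,
  ! to be fed into the backward pass
  gradient = net % loss % derivative(y, output)

  ! ... backward pass using gradient ...
end subroutine train_step
```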
This way the pattern is also consistent with the optimizers and activations modules.
Let me know what you think.
> Do you foresee us needing to also carry the loss function itself (e.g. `mse`) in addition to the derivative one? Only the derivative is used in training, but I can imagine wanting to carry the loss function itself, for example for evaluating the loss of the network on the fly during training. If we want to carry both functions (like the activation functions do, i.e. the function itself and its derivative) then I think the abstract derived type approach would be more appropriate.
I think it would be good to provide a procedure `evaluate`, like for Keras `model.evaluate`. In this case, a DT is indeed more appropriate.
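Such an `evaluate` could look roughly like this: a mean of the scalar loss over a batch of samples, analogous to what Keras `model.evaluate` reports. This is only a sketch; the `predict` call, dummy-argument names, and array layout (one sample per column) are assumptions:

```fortran
! Hypothetical network-bound evaluate: mean loss over a batch.
! input and target hold one sample per column.
function evaluate(self, input, target) result(res)
  class(network), intent(in) :: self
  real, intent(in) :: input(:,:), target(:,:)
  real :: res
  integer :: n

  res = 0
  do n = 1, size(input, dim=2)
    res = res + self % loss % eval(target(:,n), self % predict(input(:,n)))
  end do
  res = res / size(input, dim=2)
end function evaluate
```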
> If you agree, we could just model this after the activation or optimizer module e.g.
I actually started to implement it like the optimizer DT, but then I noticed that only the derivative was used, and switched to the current proposition. It should be easy to change it back to a DT.
> and we'd call the respective functions with `net % loss % eval` and `net % loss % derivative`.
I think that `eval` and `derivative` should be associated with different interfaces, because `eval` should return a scalar (e.g. MSE) while `derivative` should return a vector (e.g., dMSE/dx).
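Concretely, that would mean two abstract interfaces instead of the single shared one in the sketch above (interface names here are my own, chosen for illustration):

```fortran
abstract interface

  ! eval reduces the two vectors to a scalar, e.g. the mean squared error
  pure function loss_eval_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    real :: res
  end function loss_eval_interface

  ! derivative returns one gradient component per output, e.g. dMSE/dx
  pure function loss_derivative_interface(self, true, predicted) result(res)
    import :: loss_type
    class(loss_type), intent(in) :: self
    real, intent(in) :: true(:), predicted(:)
    real :: res(size(predicted))
  end function loss_derivative_interface

end interface
```

The deferred bindings in `loss_type` would then each reference their own interface: `procedure(loss_eval_interface), deferred :: eval` and `procedure(loss_derivative_interface), deferred :: derivative`.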
> This way the pattern is also consistent with the optimizers and activations modules.
This makes sense, and it is also easier to follow the code (as the same approach is used for all components).
Should I close this PR, and open a new PR with a DT approach? Or just modify this PR?
You're correct regarding the different interfaces (scalar and vector) between `eval` and `derivative`!
Regarding the PR, whatever is easier is fine, you can keep this PR if it's convenient for you.
Here is a PR to support MSE as a loss function. Additional commits should give users the option to choose among different loss functions (similar to the optimizers).