modern-fortran / neural-fortran

A parallel framework for deep learning

Implement activation params type #124

Closed · milancurcic closed 1 year ago

milancurcic commented 1 year ago

Some activation functions like leaky ReLU (#123) require one or more additional parameters.

To allow passing activation functions as procedure pointers, all functions must have the same interface. A proposed general solution (thanks to @jvdp1) is to:

  1. Define a derived type, activation_params or similar, that holds any extra parameters that activation functions may need; set default values for the parameters in the type definition.
    type :: activation_params
      real :: alpha = 0.3
    end type activation_params
  2. Make each activation function take a type(activation_params), intent(in), optional :: params dummy argument (instead of the current alpha). Inside the activation function definitions, functions that use one or more activation parameters access them directly; those that don't simply ignore the argument.
    pure function leaky_relu(x, params) result(res)
    !! Leaky Rectified Linear Unit (Leaky ReLU) activation function.
    real, intent(in) :: x(:)
    type(activation_params), intent(in), optional :: params
    real :: res(size(x))
    type(activation_params) :: params_
    ! If params is absent, params_ keeps the defaults from the type definition.
    if (present(params)) params_ = params
    res = max(params_ % alpha * x, x)
    end function leaky_relu
  3. Make activation_params an attribute of the dense and conv2d layers (and later any other layers that activate).
  4. Pass that attribute to the activation procedure associated with the layer, as shown in the sketch after this list.
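For concreteness, here is a minimal sketch of how steps 3 and 4 could fit together; the module, dense_layer, and forward names below are simplified placeholders, not the actual neural-fortran types:

module layer_params_sketch
  implicit none

  type :: activation_params
    real :: alpha = 0.3
  end type activation_params

  abstract interface
    pure function activation_interface(x, params) result(res)
      import :: activation_params
      real, intent(in) :: x(:)
      type(activation_params), intent(in), optional :: params
      real :: res(size(x))
    end function activation_interface
  end interface

  type :: dense_layer
    !! Step 3: the activation parameters travel with the layer.
    type(activation_params) :: params
    procedure(activation_interface), pointer, nopass :: activation => null()
  end type dense_layer

contains

  pure function forward(self, z) result(a)
    !! Step 4: the layer forwards its own params to the activation.
    type(dense_layer), intent(in) :: self
    real, intent(in) :: z(:)
    real :: a(size(z))
    a = self % activation(z, self % params)
  end function forward

end module layer_params_sketch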
ggoyman commented 1 year ago

Hi! What do you think about an abstract-class-based implementation of the activation functions?

We can define an abstract class containing only a deferred function eval:

type, abstract :: activation_function_t
contains
    procedure(eval_i), deferred :: eval 
end type activation_function_t

abstract interface
    pure function eval_i(this, x) result(res)
        import :: activation_function_t
        class(activation_function_t), intent(in) :: this
        real, intent(in) :: x(:)
        real :: res(size(x))
    end function eval_i
end interface

Then, by extending the `activation_function_t` class, concrete activation functions can be defined, with function parameters simply being components of the new type:

 type, extends(activation_function_t) :: elu_function_t
     real :: alpha
 contains
     procedure :: eval => eval_elu
 end type elu_function_t

 contains

 pure function eval_elu(this, x) result(res)
    ! Exponential Linear Unit (ELU) activation function.
    class(elu_function_t), intent(in) :: this
    real, intent(in) :: x(:)
    real :: res(size(x))
    where (x >= 0)
      res = x
    elsewhere
      res = this%alpha * (exp(x) - 1)
    end where
 end function eval_elu

So we can initialize an instance of this type to pass it as an argument or to use it as a member of other types:

 class(activation_function_t), allocatable :: activation_function

 allocate( activation_function, source = elu_function_t( alpha = 0.3 ) )

Moreover, we can add an eval_prime procedure to activation_function_t, allowing a single object to provide both function values and their derivatives.
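For instance, a sketch of what that could look like; since the derivative has the same signature as the function itself, both deferred bindings can reuse the eval_i interface (eval_prime and eval_elu_prime are hypothetical names here):

 type, abstract :: activation_function_t
 contains
     procedure(eval_i), deferred :: eval
     procedure(eval_i), deferred :: eval_prime  ! same interface as eval
 end type activation_function_t

 pure function eval_elu_prime(this, x) result(res)
    ! Derivative of ELU: 1 for x >= 0, alpha * exp(x) otherwise.
    class(elu_function_t), intent(in) :: this
    real, intent(in) :: x(:)
    real :: res(size(x))
    where (x >= 0)
      res = 1
    elsewhere
      res = this%alpha * exp(x)
    end where
 end function eval_elu_prime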

milancurcic commented 1 year ago

Thanks @ggoyman, I believe inference-engine takes a similar approach.

In a nutshell, it seems to me that an abstract class approach allows the activation-specific parameters to be carried with the concrete activation type itself, rather than the layer type. I like that.
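For example, the layer would then only need a single polymorphic activation component (a rough sketch; dense_layer here is simplified):

 type :: dense_layer
     class(activation_function_t), allocatable :: activation
 end type dense_layer

 ! alpha travels inside the concrete activation, not the layer:
 layer % activation = elu_function_t(alpha=0.3)
 a = layer % activation % eval(z)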

Would you be open to contributing this as a PR? I'd help.

rouson commented 1 year ago

Yes, that's the Inference-Engine approach. We call the abstract type activation_strategy_t because it's an example of the Strategy design pattern.

ggoyman commented 1 year ago

@milancurcic, OK, I'll try to implement this solution.

milancurcic commented 1 year ago

Solved by #126.