nnstreamer / nntrainer

NNtrainer is a software framework for training neural network models on devices.
Apache License 2.0

Support swish activation function on LSTM, Attention layers. #2118

Closed · baek2sm closed this 9 months ago

baek2sm commented 1 year ago

In PR #2110, a run_prime_fn function that takes 4 parameters (input, output, incoming derivative, and outgoing derivative) was added to calculate the derivative of swish. This function works well in FC and conv layers, but some layers, such as the LSTM and Attention layers, were written on the assumption that only three parameters are needed to calculate the derivative, so the added swish activation is not available in those layers.
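For context, the reason swish needs the 4-parameter form is that, unlike sigmoid or tanh, its derivative cannot be written purely in terms of the layer output: swish'(x) = σ(x) + x·σ(x)·(1 − σ(x)) also depends on the input x. Below is a minimal standalone sketch of this (plain C++ with std::vector rather than nntrainer's Tensor class; the function name and parameter order only mirror the run_prime_fn description above and are not the actual API):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// sigmoid'(x) can be computed from the output alone: y * (1 - y).
// swish(x) = x * sigmoid(x), and its derivative also needs x itself:
//   swish'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
//             = y + sigmoid(x) * (1 - y),   with y = swish(x)
// so a prime function receiving only (output, incoming deriv, outgoing deriv)
// cannot compute it; the input must be passed as a fourth argument.

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// 4-parameter prime in the spirit of run_prime_fn:
// (input, output, incoming derivative) -> outgoing derivative
void swish_prime(const std::vector<double> &input,
                 const std::vector<double> &output,
                 const std::vector<double> &incoming,
                 std::vector<double> &outgoing) {
  outgoing.resize(input.size());
  for (size_t i = 0; i < input.size(); ++i) {
    double sig = sigmoid(input[i]);                     // needs the raw input
    double local = output[i] + sig * (1.0 - output[i]); // swish'(x)
    outgoing[i] = incoming[i] * local;                  // chain rule
  }
}

int main() {
  std::vector<double> x = {-1.0, 0.0, 2.0};
  std::vector<double> y(x.size()), dy = {1.0, 1.0, 1.0}, dx;
  for (size_t i = 0; i < x.size(); ++i)
    y[i] = x[i] * sigmoid(x[i]); // forward: swish
  swish_prime(x, y, dy, dx);     // backward: needs input, output, and incoming deriv
  for (double v : dx)
    std::printf("%f\n", v);
  return 0;
}
```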

Therefore, in order to support the swish activation function in the LSTM and Attention layers, the parts of those layers that call run_prime_fn need to be modified to use 4 parameters. At the moment this change only affects whether swish is supported in the LSTM and Attention layers, but it will also be essential if more activation functions that require 4 parameters are added in the future.
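To make the intended call-site change concrete, here is a hypothetical sketch. The ActiFunc interface, tensor type, and variable names are simplified assumptions for illustration only, not nntrainer's actual classes or overloads:

```cpp
#include <functional>
#include <vector>

using Tensor = std::vector<double>;

struct ActiFunc {
  // 3-parameter prime: (output, incoming deriv, outgoing deriv). Enough for
  // activations such as sigmoid/tanh whose derivative only needs the output.
  std::function<void(const Tensor &, const Tensor &, Tensor &)> run_prime_fn3;
  // 4-parameter prime as added in #2110: (input, output, incoming deriv,
  // outgoing deriv). Required for swish, whose derivative needs the input.
  std::function<void(const Tensor &, const Tensor &, const Tensor &, Tensor &)>
    run_prime_fn4;
};

// LSTM-style backward step, showing the call before and after the change.
void lstm_backward(ActiFunc &acti, const Tensor &pre_act, const Tensor &h,
                   const Tensor &dh, Tensor &d_pre_act) {
  // Before: only three arguments were passed, so swish could not be used here.
  // acti.run_prime_fn3(h, dh, d_pre_act);

  // After: the pre-activation input is passed as well, so 4-parameter
  // activations (swish today, others later) also work in LSTM/Attention.
  acti.run_prime_fn4(pre_act, h, dh, d_pre_act);
}
```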

taos-ci commented 1 year ago

:octocat: cibot: Thank you for posting issue #2118. The person in charge will reply soon.