This PR contains the new formulation by Lucas Alric for the mixture-based activation functions, i.e. mixture-ReLU, mixture-Sigmoid, and mixture-Tanh. The new formulations are simpler and, most importantly, they remove the need for omega_tol to guard against numerical issues from division by zero.
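For context (this is background, not the PR's exact derivation, which is in mRELU_Goulet_2022.pdf): the mixture-ReLU output moments for a Gaussian input follow the standard rectified-Gaussian results, which can be written without dividing by any probability mass. A minimal Python sketch, cross-checked against MC sampling; the function name is mine, not cuTAGI's:

```python
import math
import random

def relu_gaussian_moments(mu, sigma):
    """Closed-form mean/variance of y = max(0, z) for z ~ N(mu, sigma^2).

    Standard rectified-Gaussian moments. Note that no term requires
    dividing by a (possibly tiny) probability mass, which is the kind
    of expression that previously needed an omega_tol guard.
    """
    alpha = mu / sigma
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)  # N(alpha; 0, 1)
    cdf = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))             # Phi(alpha)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    return mean, second_moment - mean * mean

# Monte Carlo comparison, in the spirit of the MC check in the PDF
random.seed(0)
mu, sigma = 0.3, 1.2
mean, var = relu_gaussian_moments(mu, sigma)
n = 200_000
samples = [max(0.0, random.gauss(mu, sigma)) for _ in range(n)]
mc_mean = sum(samples) / n
mc_var = sum((s - mc_mean) ** 2 for s in samples) / n
print(abs(mean - mc_mean), abs(var - mc_var))  # both should be small
```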
Changes Made
I have completed the implementation and tested the new formulation for all the mixture-based activation functions in all four files: activation.cpp, activation_fun_cpu.cpp, activation_fun.cu, and activation_cuda.cu.
I have removed the omega_tol parameter.
I have found and corrected a bug in the existing MixtureSigmoid(), where ma = ma/2 should have been done as a separate step. As a result, the number of epochs in the LSTM example test_lstm.py needs to be reduced to avoid NaNs.
I have updated the unit tests, all of which were minimally affected by the new formulation.
I have tested all activation functions on both CPU and GPU using test.py.
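The MixtureSigmoid() ordering bug described above belongs to a common class. The sketch below is purely illustrative and not the actual cuTAGI code: if ma is halved in place before a later expression that still expects the unscaled mean, that expression silently uses the wrong value, which is why the halving must be a separate, final step.

```python
def moments_fused_halving(m2, ma):
    """Buggy ordering (illustrative): halving folded in too early."""
    ma = ma / 2.0          # in-place rescale happens first...
    Sa = m2 - ma * ma      # ...so the variance uses the halved mean
    return ma, Sa

def moments_separate_halving(m2, ma):
    """Fixed ordering (illustrative): variance first, halving last."""
    Sa = m2 - ma * ma      # variance formed from the original mean
    ma = ma / 2.0          # ma = ma/2 applied as a separate step
    return ma, Sa

print(moments_fused_halving(1.0, 0.6))     # variance inflated (~0.91)
print(moments_separate_halving(1.0, 0.6))  # correct variance (~0.64)
```

An inflated output variance of this kind compounds over layers and time steps, which is consistent with the NaNs appearing only after many LSTM epochs.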
Note for Reviewers
You can test the new activation functions through either test.py or test_lstm.py.
The mathematical formulation implemented, as well as the comparison with MC sampling, is presented in the following file:
mRELU_Goulet_2022.pdf