I started implementing the new formulation for the mixture-based activation functions and realized that there is an issue with MixtureRelu, MixtureTanh, and MixtureSigmoid when running pytagi_v1 on CUDA.
If I run the test.py example on CPU, all the activation functions work properly. When running on CUDA, MixtureRelu, MixtureTanh, and MixtureSigmoid do not learn. If I run the same test on CUDA with the standard ReLU, it works properly.
If I run the C++ example cfg_mnist_2fc.txt, all mixture-based activations work well on CUDA.
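For reference, here is a minimal NumPy sketch of the closed-form moments that a mixture-based ReLU layer should reproduce, assuming MixtureRelu follows the standard rectified-Gaussian (mixture of a point mass at 0 and a truncated Gaussian) formulation. It could serve as a ground truth when diffing the CPU and CUDA layer outputs for the same input moments; the function name and test values below are just illustrative.

```python
# Reference sketch (assumption: MixtureRelu follows the standard
# rectified-Gaussian / mixture formulation). For a Gaussian input
# a ~ N(mu, sigma^2), the output z = max(0, a) is a mixture of a point
# mass at 0 and a truncated Gaussian, with the closed-form moments below.
import numpy as np
from scipy.stats import norm


def mixture_relu_moments(mu: np.ndarray, sigma: np.ndarray):
    """Closed-form mean/variance of z = max(0, a) for a ~ N(mu, sigma^2)."""
    alpha = mu / sigma
    cdf = norm.cdf(alpha)   # P(a > 0): weight of the truncated component
    pdf = norm.pdf(alpha)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu**2 + sigma**2) * cdf + mu * sigma * pdf
    var = second_moment - mean**2
    return mean, var


# Example inputs to compare against the layer outputs on CPU vs CUDA.
mu = np.array([-1.0, 0.0, 1.0], dtype=np.float32)
sigma = np.array([0.5, 1.0, 2.0], dtype=np.float32)
print(mixture_relu_moments(mu, sigma))
```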
@lhnguyen102 Any insights about what could cause this issue?