Question regarding implementation of softmax in NLBlock

rononrun commented 4 years ago

Hi psychopa4, appreciate the great job you've done here!

I have a question about your implementation of the softmax in NLBlock. It appears that you compute softmax manually instead of using tf.nn.softmax (which is commented out). Is there a reason for this?

if nltype<=1:
    # f_softmax = tf.nn.softmax(f, -1)
    f = tf.exp(f)
    f_softmax = f/tf.reduce_sum(f,axis=-1,keepdims=True)

In some cases during training, I have seen f go to inf because tf.exp overflows on a large input (values of around 300 or more in my case; in float32, exp already overflows for inputs above roughly 88.7). Dividing inf by the inf sum then produces NaN, whereas tf.nn.softmax handles this properly.
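For illustration, here is a minimal NumPy sketch of the overflow and of one common way to keep a manual softmax finite: subtract the row-wise maximum before exponentiating, which leaves the result mathematically unchanged. The function names naive_softmax and stable_softmax are hypothetical and not taken from the PFNL code; in TensorFlow the analogous shift would be f - tf.reduce_max(f, axis=-1, keepdims=True). This only addresses numerical stability and makes no claim about memory usage.

```python
import numpy as np


def naive_softmax(f):
    # Mirrors the manual computation quoted above: exponentiate, then normalize.
    e = np.exp(f)
    return e / np.sum(e, axis=-1, keepdims=True)


def stable_softmax(f):
    # Shift by the row-wise max so the largest exponent is exp(0) = 1.
    # Softmax is invariant to adding a constant along the softmax axis.
    shifted = f - np.max(f, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)


# float32 input with one large logit, similar to the case described above.
f = np.array([[1.0, 2.0, 300.0]], dtype=np.float32)

# Silence the expected overflow / invalid-value warnings for the naive version.
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(f)

stable = stable_softmax(f)

print("naive: ", naive)   # contains NaN because inf / inf is undefined
print("stable:", stable)  # finite, with all of the mass on the largest logit
```

For small inputs the two functions agree to within floating-point tolerance, so the shift can be dropped into the manual computation without changing its results.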

Thanks!

psychopa4 commented 4 years ago

Using tf.nn.softmax directly may cause the GPU to run out of memory when testing on large inputs, and I found that computing softmax manually solves this problem. You can use tf.nn.softmax directly if you don't run into this problem.

rononrun commented 4 years ago

Hi psychopa4, I see. Ok got it! Thank you very much for the clarification sir.
