# our very own softmax
import numpy as np

def output2probs(output):
    # weights[0] / weights[1] are the final dense layer's matrix and bias (globals)
    output = np.dot(output, weights[0]) + weights[1]
    output -= output.max()      # subtract the max for numerical stability
    output = np.exp(output)
    output /= output.sum()      # normalize so the values sum to 1
    return output
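For context, here is a minimal sketch of how I am calling it; the shapes and the global weights = [W, b] pair are my own stand-ins, not from the original code:

import numpy as np

# Stand-in for the model's final dense layer parameters (my assumption):
# weights[0] is the hidden-to-vocabulary matrix, weights[1] is the bias.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 5)), np.zeros(5)]

hidden = rng.standard_normal(8)   # e.g. the network's last hidden state
probs = output2probs(hidden)      # calls the function defined above
print(probs, probs.sum())         # non-negative values that sum to 1.0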
I tried an example:
x = np.array([0.5,.3,.2])
x -= x.max() #array([ 0. , -0.2, -0.3])
x = np.exp(x) #array([ 1. , 0.81873075, 0.74081822])
x /= x.sum() #array([ 0.39069383, 0.31987306, 0.28943311])
It seems to smooth out big gaps between the probabilities. Why do we want this? Is this done before we sample? Why don't we simply take the top-k most probable words provided by the predict_proba function?
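To make the question concrete, here is a small sketch of the two alternatives I mean; the vocabulary and probabilities below are made up just for illustration:

import numpy as np

rng = np.random.default_rng(0)
vocab = np.array(["the", "cat", "sat", "on", "mat"])   # toy vocabulary
probs = np.array([0.39, 0.32, 0.29, 0.00, 0.00])       # e.g. output of output2probs

# Option A: greedily pick the single most probable word (top-1).
greedy_word = vocab[np.argmax(probs)]

# Option B: sample a word in proportion to the softmax probabilities,
# so less probable words still get chosen some of the time.
sampled_word = vocab[rng.choice(len(vocab), p=probs)]

print(greedy_word, sampled_word)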