Open xt2357 opened 5 years ago
Subtracting max_y from every y before exponentiating makes softmax much less likely to overflow, without changing its output.
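A minimal sketch of the idea, using only the Python standard library. The function names `softmax_naive` and `softmax_stable` are hypothetical and chosen for illustration; they are not taken from the repository's code. The shift is safe because exp(y_i - max_y) / sum_j exp(y_j - max_y) equals exp(y_i) / sum_j exp(y_j): the common factor exp(-max_y) cancels. After the shift the largest exponent is 0, so no term exceeds exp(0) = 1.

```python
import math


def softmax_naive(y):
    # Exponentiates the raw values; math.exp raises OverflowError
    # once an input is larger than roughly 709.
    exps = [math.exp(v) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(y):
    # Shift every value by the maximum so the largest exponent is 0.
    # Each term is then at most exp(0) = 1, so exp cannot overflow.
    max_y = max(y)
    exps = [math.exp(v - max_y) for v in y]
    total = sum(exps)  # at least 1.0, so the division is safe
    return [e / total for e in exps]


if __name__ == "__main__":
    big = [1000.0, 1000.0]
    print(softmax_stable(big))
    try:
        softmax_naive(big)
    except OverflowError as err:
        print("naive version failed:", err)
```

For moderate inputs the two versions should agree up to floating-point rounding; they differ only in whether large inputs can be handled at all.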