Open zvookin opened 2 months ago
Per discussion, add an operator implementing an efficient two-pass softmax algorithm. This is a draft, as it is hoped the numerics can be improved.
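For readers unfamiliar with the term, here is a minimal sketch of one common two-pass formulation (the "online" running-max variant), assuming finite floating-point inputs. It is not necessarily the scheme this operator uses; the function name `softmax_two_pass` is hypothetical and chosen only for illustration.

```python
import math


def softmax_two_pass(xs):
    """Illustrative two-pass softmax over a list of finite floats."""
    # Pass 1: track the running maximum m and the running sum s of
    # exp(x - m) together, rescaling s whenever the maximum grows.
    m = -math.inf
    s = 0.0
    for x in xs:
        if x > m:
            # Re-express the accumulated sum relative to the new maximum,
            # then add this element's own term, exp(x - x) = 1.
            s = s * math.exp(m - x) + 1.0
            m = x
        else:
            s += math.exp(x - m)

    # Pass 2: normalize each element against the final max and sum.
    return [math.exp(x - m) / s for x in xs]


if __name__ == "__main__":
    print(softmax_two_pass([1.0, 2.0, 3.0]))
```

A naive softmax needs three passes (max, sum, normalize); folding the max and sum into a single pass saves one read of the input. The rescaling step in pass 1 is also one place where rounding behavior can differ from the three-pass version.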
Can you clarify a bit? It's not obvious to me what the deficiencies are (just curious)