nodchip / Stockfish

UCI chess engine
http://www.stockfishchess.com/
GNU General Public License v3.0

Training Formula #71

Closed jjoshua2 closed 4 years ago

jjoshua2 commented 4 years ago

return sigmoid(value / PawnValueEg / 4.0 * log(10.0));

This function does not seem to correspond to actual Stockfish games: by my calculations it maps 900cp to roughly a 90% winrate, whereas SF pretty much always wins once it hits +500, or else someone is filing a bug ticket.
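To make the numbers in this comment easy to check, here is a minimal Python sketch that evaluates the quoted formula at a few scores. The value `PAWN_VALUE_EG = 206` is an assumption, inferred from the constant 357.859 quoted later in this thread (4 * 206 / ln 10), and the score is treated the way the comment treats it, as the raw `value` passed to the function.

```python
import math

# Assumption: PawnValueEg = 206, inferred from the constant 357.859
# quoted later in the thread (4 * 206 / ln(10) is about 357.86).
PAWN_VALUE_EG = 206


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def winning_percentage(value: float) -> float:
    """Mirror of the quoted line:
    sigmoid(value / PawnValueEg / 4.0 * log(10.0))."""
    return sigmoid(value / PAWN_VALUE_EG / 4.0 * math.log(10.0))


if __name__ == "__main__":
    for score in (0, 500, 900):
        print(score, round(winning_percentage(score), 3))
```

Under that assumption the formula gives roughly 0.80 at +500 and roughly 0.93 at +900, which matches the "900cp is about 90%" estimate above.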

Also it would be nice to have a way to use actual winrate in training data like from lc0 or once SF and similar NNUE engines have a winrate output option. It should be much more accurate to use a real winrate instead of guessing CP conversions and back again.

nodchip commented 4 years ago

Thank you for the suggestion.

About the first topic, I will add an option to specify the coefficient in the formula.

About the second topic, it is difficult to use the actual winrate, because a formula needs to be differentiable to calculate the gradient, and the formula to calculate the actual winrate is not differentiable, if I understand correctly. I will check the formula again.

jjoshua2 commented 4 years ago

I was under the impression we could make a deep_winning_percentage function that basically just returned the winrate from training data, and leave the winning_percentage function alone for the shallow computations.

jjoshua2 commented 4 years ago

So for now, if I have winrate data, do I need to convert it back into cp with the inverse of this function? Using Wolfram Alpha, that seems to be cp = -357.859 log(-1 + 1/winrate), where log is the natural logarithm.
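The inverse above can be checked with a short round-trip sketch. It assumes `PawnValueEg = 206`, which is the value that reproduces the constant 357.859; the function names are illustrative and not taken from the trainer.

```python
import math

PAWN_VALUE_EG = 206  # assumption: the value that reproduces 357.859
SCALE = 4.0 * PAWN_VALUE_EG / math.log(10.0)  # about 357.86


def winrate_from_score(value: float) -> float:
    """Forward formula from the issue, written as sigmoid(value / SCALE)."""
    return 1.0 / (1.0 + math.exp(-value / SCALE))


def score_from_winrate(winrate: float) -> float:
    """Inverse: value = -SCALE * ln(1 / winrate - 1)."""
    if not 0.0 < winrate < 1.0:
        # Winrates of exactly 0 or 1 map to infinite scores.
        raise ValueError("winrate must be strictly between 0 and 1")
    return -SCALE * math.log(1.0 / winrate - 1.0)


if __name__ == "__main__":
    print(round(SCALE, 3))
    print(round(score_from_winrate(winrate_from_score(500.0)), 6))
```

Note that winrates of exactly 0 or 1 have no finite score, so real winrate data would need clamping before any such conversion.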

nodchip commented 4 years ago

We don't have to convert winrate data back into cp, but we do need to derive the expression for the gradient. We could derive it by differentiating the loss function, whose definition is written in this comment: (https://github.com/official-stockfish/Stockfish/issues/2823#issuecomment-665613727)

How can we derive the expression for the gradient? I am thinking about it.

nodchip commented 4 years ago

As discussed on Discord, we don't have to think about the derivative of the deep winning percentage, because it does not appear in the derivative of the loss function. We could implement a formula to calculate the internal winrate from various winrate scales. We could also add options to specify the min value and the max value of a winrate.
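A minimal sketch of why the teacher ("deep") winrate needs no derivative. It assumes a cross-entropy loss between the teacher winrate p and the shallow winrate q = sigmoid(value / SCALE); the loss actually used is the one defined in the linked comment, which is not reproduced here. Under this assumption the gradient with respect to the shallow value is (q - p) / SCALE, and p enters only as a constant.

```python
import math

# Assumption: PawnValueEg = 206, as implied by the constant 357.859 above.
SCALE = 4.0 * 206 / math.log(10.0)


def winning_percentage(value: float) -> float:
    """Shallow winrate q from the trainer's sigmoid formula."""
    return 1.0 / (1.0 + math.exp(-value / SCALE))


def loss(shallow_value: float, teacher_winrate: float) -> float:
    """Assumed cross-entropy loss; the teacher winrate p is a plain constant."""
    q = winning_percentage(shallow_value)
    p = teacher_winrate
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


def analytic_grad(shallow_value: float, teacher_winrate: float) -> float:
    """dL/dvalue = (q - p) / SCALE; no derivative of p is involved."""
    return (winning_percentage(shallow_value) - teacher_winrate) / SCALE


def numeric_grad(shallow_value: float, teacher_winrate: float,
                 eps: float = 1e-3) -> float:
    """Central finite difference, used only to check the analytic form."""
    return (loss(shallow_value + eps, teacher_winrate)
            - loss(shallow_value - eps, teacher_winrate)) / (2.0 * eps)


if __name__ == "__main__":
    # The teacher winrate could come straight from data, e.g. 0.9.
    print(analytic_grad(150.0, 0.9), numeric_grad(150.0, 0.9))
```

Because p is only read and never differentiated, it can come from any source, including a recorded winrate, as long as it is expressed on the same 0 to 1 scale as q.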

We will also add options to specify the min value and the max value used to scale a teacher score (or winrate) in the convert_bin command.
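The min/max scaling mentioned here could look like the following sketch. The function name and parameters are hypothetical and are not the actual convert_bin options; it only illustrates a linear rescaling of an external winrate scale onto the internal 0 to 1 range, with clamping assumed for out-of-range inputs.

```python
def scale_to_internal_winrate(value: float, min_value: float,
                              max_value: float) -> float:
    """Linearly map value from [min_value, max_value] onto [0, 1].

    Hypothetical helper: the names do not come from the trainer.
    Out-of-range inputs are clamped, which is an assumption.
    """
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    clamped = min(max(value, min_value), max_value)
    return (clamped - min_value) / (max_value - min_value)


if __name__ == "__main__":
    # A score on a [-1, 1] scale and a percentage on a [0, 100] scale.
    print(scale_to_internal_winrate(0.5, -1.0, 1.0))
    print(scale_to_internal_winrate(90.0, 0.0, 100.0))
```

With a [-1, 1] input scale, 0.5 maps to 0.75; with a [0, 100] scale, 90 maps to 0.9.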

sf-x commented 4 years ago

> return sigmoid(value / PawnValueEg / 4.0 * log(10.0));
>
> This function does not seem to correspond to actual Stockfish games, since by my calculations this has 900cp like a 90% winrate when SF pretty much always wins once it hits +500, or else someone is filing a bug ticket.

"SF winrate output option" is nothing but a "CP conversion".

nodchip commented 4 years ago

This was released as https://github.com/nodchip/Stockfish/releases/tag/stockfish-nnue-2020-08-30. I will close this issue.