Tuning evaluation function

Andyloris commented 2 years ago

Hi, I recently coded a small chess engine that uses minimax and an evaluation function, and I realised that I needed to tune the values of the evaluation function to gain some Elo. It occurred to me that the same could be done with this bot. Have you ever tried this?

pmariglia commented 2 years ago

Great question.

I've explored evaluation function tuning a bit on a private branch, and these are my thoughts:

Tuning via self-play is extremely slow. A game without randomness like chess can need up to 100,000 self-play games of tuning for an improvement to be noted. When you account for the randomness of a game like Pokemon it's almost certainly going to be much more than that. Playing this many games, even super fast games with the bot against itself, is just not going to happen in any reasonable amount of time.
(This one isn't backed by any data; it's more of an intuition, and I may very well be wrong.) A static evaluation function just isn't enough to properly play competitive Pokemon. The value of something like a Pokemon's HP is dynamic and can change depending on the team matchup. In my opinion, an evaluation function would need much more logic than the one in this project to properly capture what is important in a Pokemon battle.
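To put a rough number on the first point, here is a back-of-the-envelope sketch of how many games it takes to tell a small improvement apart from noise. It assumes win/loss results only (no draws), independent games, the standard logistic Elo formula, and a normal approximation to the binomial; the function names are made up for illustration and are not part of this project.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Games needed before the win-rate edge exceeds z standard errors.

    Assumption: each game is an independent win/loss coin flip, so the
    standard error of the observed win rate is about 0.5 / sqrt(n).
    """
    edge = expected_score(elo_diff) - 0.5
    if edge <= 0:
        raise ValueError("elo_diff must be positive")
    return math.ceil((z * 0.5 / edge) ** 2)

# Print the estimate for a few improvement sizes.
for diff in (5, 10, 20, 50):
    print(diff, games_needed(diff))
```

The required game count grows roughly with the inverse square of the improvement, which is why small gains are so expensive to confirm, and a tuner has to repeat this kind of comparison many times.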

That being said I haven't spent too much time looking into this and I would certainly like to be wrong. If you have any ideas on how this can be approached I'd love to hear it.

What I've experimented with:

Texel's method might be feasible for this project.
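For anyone unfamiliar with it, the idea behind Texel's method can be sketched as follows: take a set of positions labelled with the final game result, squash the evaluation through a sigmoid, and nudge one parameter at a time, keeping a change only if the mean squared error drops. This is a minimal sketch, not the code from my branch. It assumes a linear evaluation over hand-made features, and every name in it (`evaluate`, `texel_tune`, `make_synthetic_positions`) is hypothetical.

```python
import math
import random

def sigmoid(score, k=1.0):
    """Map an evaluation score to a predicted result in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * score))

def evaluate(weights, features):
    """Assumed linear evaluation: weighted sum of position features."""
    return sum(w * f for w, f in zip(weights, features))

def mean_squared_error(weights, positions, k=1.0):
    """Average squared gap between game result and predicted result."""
    total = 0.0
    for features, result in positions:
        total += (result - sigmoid(evaluate(weights, features), k)) ** 2
    return total / len(positions)

def texel_tune(weights, positions, step=0.1, max_passes=50, k=1.0):
    """Coordinate-wise local search: try +step / -step on each weight."""
    best = list(weights)
    best_error = mean_squared_error(best, positions, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                error = mean_squared_error(candidate, positions, k)
                if error < best_error:
                    best, best_error = candidate, error
                    improved = True
                    break  # keep this change, move to the next weight
        if not improved:
            break  # no single-step change helps any more
    return best, best_error

def make_synthetic_positions(n, true_weights, seed=0):
    """Fake data for the demo: results sampled from a known evaluation."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n):
        features = [rng.uniform(-2.0, 2.0) for _ in true_weights]
        win_prob = sigmoid(evaluate(true_weights, features))
        result = 1.0 if rng.random() < win_prob else 0.0
        positions.append((features, result))
    return positions

positions = make_synthetic_positions(200, [1.0, -0.5])
tuned, error = texel_tune([0.0, 0.0], positions)
print(tuned, error)
```

The appeal is that the labelled positions are generated once and then reused for every tuning pass, so no new games are needed while the parameters are being adjusted.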

Andyloris commented 2 years ago

If you're using Texel's tuning, you would need to store every detail of a "position" in a text file: both sides' Pokemon, their moves, PP, items, EVs, IVs, stats, and the terrain information. That is doable in principle, but it would take a lot of space if you wanted to generate a LOT of positions, and it would take a lot of time, so I don't think it's practical.
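As a rough illustration of the storage cost, here is a sketch that writes one position as a single JSON line and scales the size up to a whole dataset. The field names and values are invented for the example and are not the bot's actual state format.

```python
import json

STAT_NAMES = ("hp", "atk", "def", "spa", "spd", "spe")

def make_pokemon(name):
    """Build a placeholder Pokemon record (hypothetical schema)."""
    return {
        "name": name,
        "hp": 100,
        "max_hp": 100,
        "item": "leftovers",
        "moves": [{"id": "move%d" % i, "pp": 16} for i in range(4)],
        "evs": {stat: 0 for stat in STAT_NAMES},
        "ivs": {stat: 31 for stat in STAT_NAMES},
        "stats": {stat: 100 for stat in STAT_NAMES},
    }

def make_position(result):
    """Two teams of six, field conditions, and the final game result."""
    return {
        "user": [make_pokemon("user%d" % i) for i in range(6)],
        "opponent": [make_pokemon("opp%d" % i) for i in range(6)],
        "field": {"weather": None, "terrain": None},
        "result": result,
    }

def serialize(position):
    """Encode one position as a compact single-line JSON string."""
    return json.dumps(position, separators=(",", ":"))

def estimate_total_bytes(position, n_positions):
    """Size of n similar positions, one per line (plus the newline)."""
    line_bytes = len(serialize(position).encode("utf-8")) + 1
    return line_bytes * n_positions

position = make_position(1.0)
print(len(serialize(position)), "bytes per position")
print(estimate_total_bytes(position, 1_000_000), "bytes for a million positions")
```

A more compact encoding would shrink this further, so disk space is probably less of a problem than the time needed to play the games that produce the positions.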

I've tried Stockfish's tuning method, but so far it's a complete failure.

I think something like CLOP could work because it doesn't need to generate a lot of positions and actually works in other games where there is randomness (backgammon). Did you try CLOP?

pmariglia commented 2 years ago

> If you're using Texel's tuning, you would need to store every detail of a "position" in a text file: both sides' Pokemon, their moves, PP, items, EVs, IVs, stats, and the terrain information. That is doable in principle, but it would take a lot of space if you wanted to generate a LOT of positions, and it would take a lot of time, so I don't think it's practical.

Yup, that is something I was going to try. Running all of those battles would probably take several weeks just based on some rough math about how long battles can take. Then minimizing the error would probably take even longer.

> I've tried Stockfish's tuning method, but so far it's a complete failure.

My experience was identical :) Too slow, and the variables just took a random walk.

> I think something like CLOP could work because it doesn't need to generate a lot of positions and actually works in other games where there is randomness (backgammon). Did you try CLOP?

Haven't tried CLOP, mostly because I am not too familiar with the algorithm itself. I'd be open to exploring the idea with you (just send me a message on Discord; my handle is in the README), but my gut tells me that parameter optimization techniques just aren't going to give good results for Pokemon, especially ones that were developed for chess. Again, very happy to be wrong.

Andyloris commented 2 years ago

I'm not too familiar with CLOP, but it seems really easy to use. I think I'm going to try it with chess before using it with Pokemon. I am using this program to tune my engine.

Edit: I'm way too impatient to start with chess, so I'm going to start with Pokemon.

pmariglia / showdown

Tuning evaluation function #89