thieu1995 / mealpy

A Collection Of The State-of-the-art Metaheuristic Algorithms In Python (Metaheuristic/Optimizer/Nature-inspired/Biology)
https://mealpy.readthedocs.io
GNU General Public License v3.0
864 stars · 175 forks

Question: How can I calculate the population's fitness globally instead of individually? #157

Closed · xtagon closed this issue 3 months ago

xtagon commented 3 months ago

Hi,

I would like to use mealpy to solve game problems, where each agent/solution in a population is a game-playing policy evaluated by simulating multiplayer game instances. To determine fitness, all solutions in a generation would play a number of simulated games together (multiplayer) using a matchmaking or tournament process, and only after those games are completed would an individual's fitness be computed as its average win rate or another ranking function.

I am not sure how to set this up with mealpy, because the obj_func function is applied once per solution, not once per population. Is there already a way to do this?

Thanks!

thieu1995 commented 3 months ago

@xtagon,

Game problems often involve dynamic environments, especially fighting games. Metaheuristic algorithms, on the other hand, are designed for static environments. If you want to optimize for game problems, you may want to look at reinforcement learning or deep reinforcement learning algorithms, which are designed specifically for dynamic environments.

If you still want to use metaheuristic algorithms for your problem, please describe how you encode a solution. What does your optimization problem look like?

xtagon commented 3 months ago

The way I'm planning to encode a solution is as numeric values for tuning/optimizing a higher-level algorithm such as MCTS, CFR, or MaxN. The tree search is then used as an agent's policy when playing games, and the metaheuristic algorithm simply finds better hyperparameters for those algorithms, as well as for the evaluation functions used when terminal states cannot be reached in the time available.
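For illustration (these parameter names and ranges are made up, not from my actual project), a solution vector might decode like this:

```python
# Hypothetical decoding of a metaheuristic solution vector into
# tree-search hyper-parameters. Names and bounds are illustrative only.
def decode(solution):
    return {
        "exploration_constant": solution[0],       # e.g. UCT exploration term in [0.1, 5.0]
        "rollout_depth": int(round(solution[1])),  # max playout depth in [1, 200]
        "eval_weight": solution[2],                # weight of the static evaluation function
    }
```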

Deep reinforcement learning and neural nets are not a good fit for my purpose, because a neural network cannot be easily analyzed and reasoned about once trained, compared to more symbolic algorithms with only a few "magic" parameters.

thieu1995 commented 3 months ago

So, this is a hyper-parameter optimization problem. You are using metaheuristics to optimize parameters for other algorithms such as MCTS, CFR, etc. Can you code a custom problem for your task? Why do you need to calculate fitness globally?

xtagon commented 3 months ago

I can code a custom problem for my task, but it is not very efficient, because each simulated game involves 2 to 6 players and multiple games are required to converge to an average win rate. With the custom class implementation, you are given a single solution and must return its fitness, so a lot of duplicate processing is needed to run new games with new opponents for every solution in the population. In other words, each batch of simulated games already provides the data required to evaluate the fitness of every agent who participated in those games.

thieu1995 commented 3 months ago

First, the custom Problem class does not duplicate anything, and it does not hold any solutions. It is a class that represents your optimization problem: it holds the lower and upper bounds of the decision variables (in your case, the hyper-parameters of algorithms like MCTS), and it calculates the fitness value of any solution passed to the objective function. Since it defines the search space, it should be able to determine the fitness value of any solution in that space.
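For example, a minimal problem definition might look like the following sketch (the objective key is fit_func in recent mealpy releases and obj_func in older ones; simulate_games is a placeholder for your own self-play code):

```python
from mealpy.swarm_based import PSO

def fitness_function(solution):
    # solution: array of hyper-parameters within [lb, ub].
    # simulate_games is a hypothetical helper: run self-play with these
    # hyper-parameters and return the average win rate to be maximized.
    return simulate_games(solution)

problem = {
    "fit_func": fitness_function,  # named "obj_func" in older mealpy releases
    "lb": [0.1, 1, 0.0],           # lower bounds of the hyper-parameters
    "ub": [5.0, 200, 1.0],         # upper bounds
    "minmax": "max",               # maximize the win rate
}

model = PSO.OriginalPSO(epoch=100, pop_size=50)
best_position, best_fitness = model.solve(problem)
```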

Second, metaheuristic algorithms do not need to understand the specifics of your problem. You simply provide an objective function that returns a fitness value, and solutions are updated based on changes in that fitness value.

I think there is a misunderstanding between a search agent (solution) in your gaming problem and a search agent (solution) in the metaheuristic sense. Both are based on swarm intelligence (population-based), but they serve different purposes. A solution in metaheuristics is a complete solution to an optimization problem; it is not a player, a chess state, or a chess move (for example). As I mentioned earlier, metaheuristics are designed for static environments, not dynamic ones. In other words, they are perturbative search algorithms, not constructive search algorithms. You might want to consider other search methods, like A*, DFS, or BFS, for your problem.

xtagon commented 3 months ago

I realize it's an unusual choice for game solving, but it works. It's just that it would be more efficient to evaluate the objective function in batches, because the self-play evaluations involve multiple individuals from the population at once. It sounds like mealpy doesn't have a way to run something like self-play across multiple solutions in the population before evaluating the objective function. I was hoping it did.

Here is a (simplified) example:

The current method:

  1. Generate random solutions
  2. For each solution, evaluate its fitness (in my case, this spawns self-play games and averages the win rate for the primary player; the other players' win rates do not become any sort of tracked fitness, because the Problem class only evaluates one solution in the population at a time)
  3. Evolve, and repeat

The method I was hoping for:

  1. Generate random solutions
  2. For all solutions in the current generation's population, call a custom function which, in my case, would simulate self-play for ALL solutions using matchmaking and tournaments, since many solutions would be in the same games as others
  3. Aggregate the results into rankings, and use the rankings as the individual objective values (that is, agents that rank higher in self-play correspond to higher fitness for their respective metaheuristic solutions)
  4. Evolve, and repeat

In both methods the objective is essentially the same: maximize tournament ranking or win rate in multiplayer games. But the first method requires much more computation, because the API for defining an objective function assumes only one solution is being tested at a time.
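For concreteness, here is a rough sketch of the loop I have in mind, outside of mealpy's API (simulate_tournament, rank_to_fitness, and evolve are hypothetical placeholders for my own code):

```python
import numpy as np

def run(generations=100, pop_size=32, dims=3):
    rng = np.random.default_rng()
    # 1. generate random solutions
    population = rng.uniform(0.0, 1.0, size=(pop_size, dims))
    for _ in range(generations):
        # 2. one batch of self-play: all solutions play shared games
        #    via matchmaking/tournaments (hypothetical helper)
        results = simulate_tournament(population)
        # 3. aggregate game results into per-solution fitness values
        fitness = rank_to_fitness(results)  # higher ranking -> higher fitness
        # 4. evolve the population using those fitness values (hypothetical helper)
        population = evolve(population, fitness)
    return population
```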

I think I can try to fork the code to accomplish what I need. Thank you for discussing this with me!

xinlnix commented 2 months ago

Dear @xtagon, I have the same need as you. Have you solved it already, or do you have any ideas? Thanks for sharing.

xtagon commented 1 month ago

@xinlnix I ended up not using MEALPY for my project but it can probably be done if you fork it. Most other libraries also make the same assumption about the fitness function unfortunately.

xinlnix commented 1 month ago

> @xinlnix I ended up not using MEALPY for my project but it can probably be done if you fork it. Most other libraries also make the same assumption about the fitness function unfortunately.

Dear xtagon, thanks for your reply. I found that the fitness evaluation happens inside the evolve process of every method, so it may work to change that for loop in algorithms where an individual's update does not depend on the previous individual's fitness evaluation. If I make any progress, I will report back to you. BTW, can you tell me how you solved your problem?
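Hypothetically, a fork could replace the per-individual evaluation with a single batched call, something like the sketch below (the class, method, and attribute names here are illustrative, not mealpy's actual internals, so check the Optimizer class of the version you fork first):

```python
# Illustrative sketch of a batched-evaluation hook for a fork of mealpy.
# All names (evaluate_population, agent.solution, agent.fitness,
# simulate_tournament) are hypothetical: adapt them to the real
# Optimizer internals of the version you fork.
class BatchEvalMixin:
    def evaluate_population(self, pop):
        solutions = [agent.solution for agent in pop]
        # Run every self-play game for the whole generation in one batch
        # (simulate_tournament is a placeholder for your own code).
        fitnesses = simulate_tournament(solutions)
        for agent, fit in zip(pop, fitnesses):
            agent.fitness = fit  # write the batched result back per agent
        return pop
```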