magefree / mage

Magic Another Game Engine
http://xmage.today
MIT License

Documentation about developing an AI player? #10154

Open gcoter opened 1 year ago

gcoter commented 1 year ago

Hi there, I would like to develop a new AI player. Is there any documentation about this? I looked for it on the Wiki but could not find anything. Thanks in advance!

JayDi85 commented 1 year ago

Unfortunately, there is no detailed documentation on how the AI works. All you can do is take the current implementation and try to modify it.

XMage is a server-side app, with all game and rules logic on the server side only. So there are two possible points to inject an AI:

What can you do:

The faster way to start with your own AI code:

gcoter commented 1 year ago

Hi @JayDi85, thank you so much for this detailed answer! I will give it some thought and take a look at ComputerPlayer7, as it is apparently the current reference for AI implementation :slightly_smiling_face:

gcoter commented 1 year ago

Why is it called ComputerPlayer7? Is it like the 7th version? What about the other classes? Are they deprecated?

JayDi85 commented 1 year ago

Why is it called ComputerPlayer7? Is it like the 7th version? What about the other classes? Are they deprecated?

Yes, it's like the 7th version, but some older versions were removed from the code.

XMage uses the Player interface to work with a game's players. So:

Non AI players:

Base AI players:

Inner AI players:

Tests AI players:

Real AI players:

gcoter commented 1 year ago

Hi @JayDi85, thank you very much for the information, I think I understand now :slightly_smiling_face:

gcoter commented 1 year ago

Ideally, I would like to take a base class which implements the basic logic (which is necessary for the AI to work with the game) and override a subset of methods which are purely related to strategy. However, I struggle to identify this subset.

For instance, I understand that chooseMulligan can be used to change the strategy to keep or not the starting hand cards. choose on the other hand looks more generic and I struggle to understand its meaning.

In your opinion, what are the methods which are really related to strategy (i.e. their implementation offers degrees of freedom to change the behavior of the AI)? Which ones on the contrary should not be overridden (i.e. they are necessary for the class to work with the game)?

Is there some documentation about the different methods, like choose for instance (what is it supposed to do? what is it supposed to return?)?

Sorry I ask many questions, I find the project very interesting but I need some help to get started :sweat_smile:

JayDi85 commented 1 year ago

There are two different and important blocks of code/logic for the AI:

I don't know what you want to do with AI. My suggestions:

JayDi85 commented 1 year ago

All choose methods must set up targets in the Target param (all game objects have a unique identifier, e.g. UUID id, so it is used for targets/choices). The game engine checks it and doesn't allow incorrect data to be set. But there are possible bugs and rare use cases, so the AI is able to "hack" it and set data like "one creature attacks multiple targets", see example: #7434

JayDi85 commented 1 year ago

You can look at TestPlayer.java for the most important game methods (search for @Override). There are a few basic methods. You can use the same approach in a new AI -- if something goes wrong, then skip and call the super method:

(screenshot: the overridden methods in TestPlayer.java)
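As a generic illustration of that "try custom logic, fall back to super" pattern: the classes and method signatures below are invented stand-ins for illustration only, not the real mage Player API.

```java
import java.util.List;

// Hypothetical stand-in for a base AI player (e.g. ComputerPlayer7's role).
class BaseComputerPlayer {
    // Default (stable) decision logic: never mulligan.
    public boolean chooseMulligan(List<Integer> handManaValues) {
        return false;
    }
}

// Experimental subclass: override one strategy method, keep the rest.
class ExperimentalAiPlayer extends BaseComputerPlayer {
    @Override
    public boolean chooseMulligan(List<Integer> handManaValues) {
        try {
            // Custom strategy: mulligan hands whose average mana value is too high.
            double avg = handManaValues.stream()
                    .mapToInt(Integer::intValue)
                    .average()
                    .orElse(0);
            return avg > 3.5;
        } catch (RuntimeException e) {
            // If the experimental logic fails, fall back to the stable default.
            return super.chooseMulligan(handManaValues);
        }
    }
}
```

The point is that only the strategy method changes; everything required for the engine to function stays inherited from the base class.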

gcoter commented 1 year ago

Ok, thank you very much :slightly_smiling_face:

Here is some context about what I would like to do. I am wondering whether we could introduce Machine Learning to develop more powerful AIs for MTG. I work in this field and I am aware of several successes like AlphaGo (and its more advanced versions), AlphaStar, and some others which basically use Reinforcement Learning to improve upon classical MCTS. MTG is quite a challenging game, but I think that, at least in theory, we could use similar techniques.

In order to do that, I need to define 3 things:

  1. The state of the game (i.e. all the information that the AI knows at a given stage in the game)
  2. The action space (i.e. all the possible actions at a given stage in the game)
  3. A reward system (i.e. a measure of how well the AI is doing, it could simply be something like "-1" if it lost the game and "1" if it won or something more complex)

Going straight for a complete RL-based AI is very ambitious; a more realistic strategy is to take one existing AI implementation and define sub-problems to solve. For instance, it could be learning a score for the pickCard method as you proposed. I think I would like to start with something "simple" like this.

I also have one technical requirement: I must be able to run thousands of simulated games in order to collect data and train the AI. Most state-of-the-art RL algorithms are based on self-play (the AI plays against itself and learns from it). Do you think it would be possible to do something like this? For instance, would it be possible to write code which simulates thousands of games of ComputerPlayer7 playing against itself and records all the states and actions in a file?

If it was possible, then I could use this data for training and I could create a subclass of ComputerPlayer7 which calls the trained model in pickCard instead of using the scores of RateCard.java for instance.

And then, if all of this is a success, we could extend this to other sub-problems, creating additional models :slightly_smiling_face:
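The data-collection loop described above can be sketched in miniature. This is not mage code: the "game" here is a random placeholder, and the class and record names are invented; in practice each episode would be a real simulated game between two ComputerPlayer7 instances, with the terminal win/loss reward (+1/-1) back-filled into every recorded transition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SelfPlayCollector {
    // One recorded transition: an encoded state, the chosen action, the final reward.
    record Transition(String state, String action, int reward) {}

    static List<Transition> playOneGame(Random rng) {
        List<Transition> log = new ArrayList<>();
        // Placeholder "game": a few random decisions, then a win or a loss.
        for (int turn = 1; turn <= 3; turn++) {
            String state = "turn=" + turn;
            String action = rng.nextBoolean() ? "attack" : "pass";
            log.add(new Transition(state, action, 0)); // reward unknown until game end
        }
        int finalReward = rng.nextBoolean() ? 1 : -1; // +1 win, -1 loss
        // Back-fill the terminal reward into every transition of the episode.
        List<Transition> labeled = new ArrayList<>();
        for (Transition tr : log) {
            labeled.add(new Transition(tr.state(), tr.action(), finalReward));
        }
        return labeled;
    }

    static List<Transition> collect(int games, long seed) {
        Random rng = new Random(seed);
        List<Transition> dataset = new ArrayList<>();
        for (int g = 0; g < games; g++) {
            dataset.addAll(playOneGame(rng));
        }
        return dataset;
    }
}
```

The collected dataset could then be serialized to a file and used offline to train a model for a sub-problem like pickCard scoring.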

JayDi85 commented 1 year ago
  1. The state of the game (i.e. all the information that the AI knows at a given stage in the game)

Already has it:

  2. The action space (i.e. all the possible actions at a given stage in the game)

Already has it:

  3. A reward system (i.e. a measure of how well the AI is doing, it could simply be something like "-1" if it lost the game and "1" if it won or something more complex)

Already has it:

I must be able to run thousands of simulated games in order to collect data and train the AI. Most of the state-of-the-art RL algorithms are based on self-play (the AI plays against itself and learn from it)

Already has it. See LoadTest for examples (e.g. test_TwoAIPlayGame_Multiple):

Do you think it would be possible to do something like this? For instance, would it be possible to write a code which simulates thousands of games of ComputerPlayer7 playing against itself and record all the states and actions in a file

You can inject your code to collect that data; see the client-side example with GameView and appendJsonEvent. The same goes for a server-side AI (but with other places to inject, like boolean priorityPlay()).

There is a very important thing about GameState and GameView objects -- you can't apply some commands and transform one version of a GameState into another. The game engine can make multiple changes to the GameState in one game cycle. See the discussion here: #9619. One average game can generate 30-80 MB of GameView data (99% of the data is unchanged between game cycles, so it could be optimized someday by diff logic instead of full snapshots).

XMage uses thousands of unit tests to run and test real games. So if you need a server-side AI and its data, games can be simulated without a real server. If you don't need client-side AI/data, then I recommend running your games through the test framework instead of a real server.
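The "diff logic instead of full snapshots" idea mentioned above can be sketched generically: store only the keys whose values changed since the previous snapshot. This is not mage code; the flat string-map representation is a deliberate simplification (a real GameView is a nested object graph, and removed keys would also need handling).

```java
import java.util.HashMap;
import java.util.Map;

class SnapshotDiff {
    // Return only the entries of `curr` whose values differ from `prev`.
    // Note: entries deleted from `curr` are not reported in this sketch.
    static Map<String, String> diff(Map<String, String> prev, Map<String, String> curr) {
        Map<String, String> changed = new HashMap<>();
        for (Map.Entry<String, String> e : curr.entrySet()) {
            if (!e.getValue().equals(prev.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }
}
```

If 99% of the state really is unchanged between game cycles, logging such diffs instead of full snapshots would cut the 30-80 MB per game down by roughly the same factor.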

gcoter commented 1 year ago

Indeed, everything seems to be already there, thanks again :slightly_smiling_face: I guess I will start by running test_TwoAIPlayGame_Multiple to see how it works and then adapt it to collect data.

I tried running test_TwoAIPlayGame_One, I get an error:

ERROR 2023-04-04 21:33:18,554 monitor_6139: Unknown callback: SHOW_USERMESSAGE, [Join Table, You (ai_1) have an invalid deck for the selected Freeform Commander Format. 

Commander: Sideboard must contain only the commander(s) and up to 1 companion
Deck: Must contain 100 cards: has 42 cards
Farewell: Too many: 2
Kairi, the Swirling Sky: Too many: 2
Lion Sash: Too many: 2
Satoru Umezawa: Too many: 2

Select a deck that is appropriate for the selected format and try again!] =>[ThreadPool(1)-1] LoadCallbackClient.processCallback 

I understand that it has a problem with the randomly created deck. How can I solve this? I can create a separate issue if you prefer.

JayDi85 commented 1 year ago

LoadTest improved in a9f1e15168a575d641cecdcdbf15f9c3224e5c9f and 81d9c099fb8ec071d09c00e76b424dfab0bdd6f9. Now it works fine.

Commander: Sideboard must contain only the commander(s) and up to 1 companion

Yes, it was wrong deck generation. It should be a rare/impossible error in auto-generated decks now. I recommend using simple deck colors like "RG" (red and green are good colors for AI decks, because blue/black cards can be too complicated to use).

Look at the start of the file; it has some default settings:

BTW there are two different load tests:

jeffwadsworth commented 2 months ago

Just placing this combat combination code here for anyone who wishes to use it. I have used this for a few years on my local copy and it works fine... but going anywhere over 4 attackers on 4 blockers (1,717 combinations) will have to wait for quantum computers to be realized in 1000 years, if ever. It would be beautiful if the code supported multiple cores/threads, as that would speed up the process. As an aside, something like Claude 3.5 can evaluate the game state pretty well without all this work, though it has insane compute to work with.

This would be called via something like: List<List<Permanent>> combinations = CombinationGenerator.generateAllCombinations(attackers, blockers);

```java
// Note: this fragment needs java.util imports (ArrayList, Arrays, Collections,
// HashMap, HashSet, List, Map, Set) and mage's Permanent class.
public static class CombinationGenerator {

    private final Map<String, List<List<Permanent>>> memo;
    private final List<Permanent> attackers;
    private final List<Permanent> blockers;
    private final List<List<Permanent>> combinations; // Store the combinations

    public CombinationGenerator(List<Permanent> attackers, List<Permanent> blockers) {
        this.memo = new HashMap<>();
        this.attackers = attackers;
        this.blockers = blockers;
        this.combinations = new ArrayList<>();  // Initialize the list for combinations
    }

    public List<List<Permanent>> generateCombinations() {
        List<List<Permanent>> blockerSubsets = generateBlockerSubsets();
        generateCombinations(attackers, blockerSubsets, 0, new ArrayList<>(), new HashSet<>());

        // Include combinations with individual attackers
        for (int i = 0; i < attackers.size(); i++) {
            List<Permanent> singleAttacker = Arrays.asList(attackers.get(i));
            generateCombinations(singleAttacker, blockerSubsets, 0, new ArrayList<>(), new HashSet<>());
        }

        return combinations;
    }

    private List<List<Permanent>> generateBlockerSubsets() {
        List<List<Permanent>> subsets = new ArrayList<>();
        generateBlockerSubsets(0, new ArrayList<>(), subsets);
        return subsets;
    }

    private void generateBlockerSubsets(int idx, List<Permanent> currentSubset, List<List<Permanent>> subsets) {
        if (idx == blockers.size()) {
            subsets.add(new ArrayList<>(currentSubset));
            return;
        }
        generateBlockerSubsets(idx + 1, currentSubset, subsets);

        currentSubset.add(blockers.get(idx));
        generateBlockerSubsets(idx + 1, currentSubset, subsets);
        currentSubset.remove(currentSubset.size() - 1);
    }

    private void generateCombinations(List<Permanent> attackers, List<List<Permanent>> blockerSubsets, int current, List<List<Permanent>> result, Set<Permanent> usedBlockers) {
        if (current == attackers.size()) {
            addCombinationToList(attackers, result);
            return;
        }

        for (List<Permanent> subset : blockerSubsets) {
            if (Collections.disjoint(subset, usedBlockers)) {
                String key = generateKey(current, subset);
                if (!memo.containsKey(key)) {
                    List<List<Permanent>> permutations = generatePermutations(new ArrayList<>(subset));
                    memo.put(key, permutations);
                }

                for (List<Permanent> permutation : memo.get(key)) {
                    result.add(permutation);
                    usedBlockers.addAll(permutation);
                    generateCombinations(attackers, blockerSubsets, current + 1, result, usedBlockers);
                    usedBlockers.removeAll(permutation);
                    result.remove(result.size() - 1);
                }
            }
        }
    }

    private List<List<Permanent>> generatePermutations(List<Permanent> subset) {
        List<List<Permanent>> permutations = new ArrayList<>();
        generatePermutations(subset, 0, permutations);
        return permutations;
    }

    private void generatePermutations(List<Permanent> arr, int index, List<List<Permanent>> result) {
        if (index >= arr.size() - 1) {
            result.add(new ArrayList<>(arr));
            return;
        }

        for (int i = index; i < arr.size(); i++) {
            Collections.swap(arr, index, i);
            generatePermutations(arr, index + 1, result);
            Collections.swap(arr, index, i);
        }
    }

    private String generateKey(int current, List<Permanent> subset) {
        return current + "-" + subset.toString();
    }

    private void addCombinationToList(List<Permanent> attackers, List<List<Permanent>> result) {
        List<Permanent> combination = new ArrayList<>();
        for (int i = 0; i < attackers.size(); i++) {
            combination.add(attackers.get(i));
            combination.addAll(result.get(i));
        }
        combinations.add(combination);
    }

    public static List<List<Permanent>> generateAllCombinations(List<Permanent> attackers, List<Permanent> blockers) {
        CombinationGenerator generator = new CombinationGenerator(attackers, blockers);
        return generator.generateCombinations();
    }
}
```
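To see why this search blows up so quickly, here is a tiny self-contained counter (not mage code). It counts only the simplest case -- each blocker independently blocks one attacker or nobody -- and even that undercount grows exponentially; the generator above additionally enumerates the damage-order permutations within each block, so the real search space is larger still.

```java
// Simplified lower bound on the block-assignment search space:
// each blocker has (attackers + 1) choices (one attacker, or no block),
// so the total is (attackers + 1) ^ blockers.
class BlockAssignmentCount {
    static long count(int attackers, int blockers) {
        long total = 1;
        for (int i = 0; i < blockers; i++) {
            total *= (attackers + 1);
        }
        return total;
    }
}
```

For 4 attackers and 4 blockers this already gives 5^4 = 625 basic assignments before ordering; doubling both sides multiplies the count by tens of thousands, which is why parallelizing or pruning (rather than faster hardware) is the realistic way forward.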