jonthysell / Mzinga

Open-source software to play the board game Hive.
MIT License

Instructions for training #106

Open akaAMAZING opened 3 years ago

akaAMAZING commented 3 years ago

Hi there,

As an avid Hive player and enthusiast I love and support what you are doing here. I've always wondered how AlphaZero would approach Hive ^^

I have played a few games against the Mzinga AI and it is rather weak at the moment. I was wondering whether you could break down how the training works, and which commands and procedure to use to make the AI a little stronger. I have a 16-core processor and would love to train new engines for this.

I have tried to make it do what I want it to via the cmd but unfortunately I am not savvy enough to get it to work.

jonthysell commented 2 years ago

I haven't gotten around to writing up good instructions for using MzingaTrainer, but here's what I sent the last person who asked me:

When I was first trying to find the best numbers, I generated a folder of various profiles and used the lifecycle command with the following script:

rm %~dp0trainer.log
MzingaTrainer.exe lifecycle -pp "%~dp0profiles" -lg 3 -lb 1 -ckc 50 -msp false -mpc 10 -mminm -1.0 -mmaxm 10000.0 -tmt 00:00:05 -btl 00:30:00 -pgc 100 -bbtl 12:00:00 -bsp true -mcb 8 -fpc false -gt "Base" | tee -a %~dp0trainer.log

Lifecycle makes the profiles battle each other for a cycle, then the "weakest" profiles are culled from the herd and the "strongest" are allowed to "mate", generating new profiles with numbers mixed from their parents. Mechanisms are in place to make sure profiles have a provisional "childhood" period where they can't be culled and can't mate until they've fought sufficient battles.
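
The cull-and-mate cycle described above can be illustrated with a small Python sketch. This is not MzingaTrainer's code: the Profile class, the rating field, the PROVISIONAL_BATTLES threshold, and the rule that a child takes each number from one parent at random are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

# Assumed threshold: battles a profile must fight before it can be culled or mate.
PROVISIONAL_BATTLES = 3


@dataclass
class Profile:
    name: str
    weights: list            # the "numbers" a profile evaluates positions with
    rating: float = 1000.0   # hypothetical strength score, updated by battles
    battles: int = 0


def is_mature(profile):
    """A profile leaves its provisional 'childhood' after enough battles."""
    return profile.battles >= PROVISIONAL_BATTLES


def mate(a, b, rng):
    """Assumed mixing rule: each number is taken from one parent at random."""
    weights = [rng.choice(pair) for pair in zip(a.weights, b.weights)]
    return Profile(name=f"{a.name}x{b.name}", weights=weights)


def lifecycle_step(pool, cull_count, rng):
    """Cull the weakest mature profiles, then mate the strongest survivors."""
    # Only mature profiles are ranked; provisional ones are left alone.
    mature = sorted((p for p in pool if is_mature(p)), key=lambda p: p.rating)
    culled_ids = {id(p) for p in mature[:cull_count]}
    survivors = [p for p in pool if id(p) not in culled_ids]
    # Strongest first, excluding anything that was just culled.
    breeders = [p for p in reversed(mature) if id(p) not in culled_ids]
    # Pair neighbours in the ranking; make at most one child per culled profile.
    children = [
        mate(breeders[i], breeders[i + 1], rng)
        for i in range(0, len(breeders) - 1, 2)
    ][:len(culled_ids)]
    return survivors + children
```

Calling lifecycle_step(pool, 1, random.Random(0)) on a pool that has just finished a round of battles drops the lowest-rated mature profile and appends one child of the two highest-rated ones. The child starts with zero battles, so it is provisional in the next cycle.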

It was a slow process, so eventually I switched to using autotrain. I exported the latest built-in numbers from MzingaTrainer using the exportai command, then tweaked the name, id, and filename of each (since autotraining overwrites the numbers, and ids are based on the version number). That way each could be considered a new profile (and not conflict with a normal exportai file). Then I wrote a script to train each of these profiles (exportai generates one per game type) against its own game type:

@echo off
setlocal

for /L %%N in () do (
    call :at %~dp000080000-000a-0000-0000-000000000000.xml "Base"
    call :at %~dp000080001-000a-0000-0000-000000000000.xml "Base+M"
    call :at %~dp000080002-000a-0000-0000-000000000000.xml "Base+L"
    call :at %~dp000080003-000a-0000-0000-000000000000.xml "Base+ML"
    call :at %~dp000080004-000a-0000-0000-000000000000.xml "Base+P"
    call :at %~dp000080005-000a-0000-0000-000000000000.xml "Base+MP"
    call :at %~dp000080006-000a-0000-0000-000000000000.xml "Base+LP"
    call :at %~dp000080007-000a-0000-0000-000000000000.xml "Base+MLP"
    timeout 30
)

goto :eof

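:at
rem %1 = profile path, %2 = game type (already quoted by the caller)
echo Starting %2
rem Back up the profile first: autotrain overwrites its numbers
copy /y %1 %1.bak
rem %~dp0 already ends with a backslash, so none is added before the exe name
%~dp0MzingaTrainer.exe autotrain -tpp %1 -tmt 00:00:05 -btl 00:25:00 -mb 8 -mht 3 -tts 64 -gt %2
exit /b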

goto :eof

endlocal
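
For anyone less comfortable with batch files, the same loop could be driven from Python. The following is a rough sketch, not something shipped with Mzinga: the function names and the PROFILES table are hypothetical, while the file names and every autotrain flag are copied unchanged from the batch script above.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Game types and profile file names copied from the batch script.
PROFILES = {
    "Base": "00080000-000a-0000-0000-000000000000.xml",
    "Base+M": "00080001-000a-0000-0000-000000000000.xml",
    "Base+L": "00080002-000a-0000-0000-000000000000.xml",
    "Base+ML": "00080003-000a-0000-0000-000000000000.xml",
    "Base+P": "00080004-000a-0000-0000-000000000000.xml",
    "Base+MP": "00080005-000a-0000-0000-000000000000.xml",
    "Base+LP": "00080006-000a-0000-0000-000000000000.xml",
    "Base+MLP": "00080007-000a-0000-0000-000000000000.xml",
}


def autotrain_command(trainer, profile, game_type):
    """Build the same autotrain command line the batch script runs."""
    return [
        str(trainer), "autotrain",
        "-tpp", str(profile),
        "-tmt", "00:00:05",
        "-btl", "00:25:00",
        "-mb", "8",
        "-mht", "3",
        "-tts", "64",
        "-gt", game_type,
    ]


def train_once(folder, run=subprocess.run):
    """Back up and autotrain every profile once, one game type at a time."""
    trainer = folder / "MzingaTrainer.exe"
    for game_type, filename in PROFILES.items():
        profile = folder / filename
        print(f"Starting {game_type}")
        # Keep a .bak copy, because autotrain overwrites the profile.
        shutil.copyfile(profile, profile.with_name(profile.name + ".bak"))
        run(autotrain_command(trainer, profile, game_type), check=False)


def main(folder):
    """Repeat forever with a 30-second pause, like the batch loop."""
    while True:
        train_once(Path(folder))
        time.sleep(30)
```

Calling main() with the folder that holds MzingaTrainer.exe and the eight profile files would start the loop. It runs until interrupted, just as the batch version does.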

Then, after weeks of letting these run, I would test the new profiles using the battleroyale command against a pool of profiles that includes the exportai profiles from previous versions of Mzinga. At the end of the battle royale (run once per game type) I would use the mergetop command to take the top profile from each game type and generate one config file I could copy into Mzinga for the next release.

@echo off
setlocal

set OD=/path/where/autotrained/profiles/are
set LOG=%~dp0brawl.log

set MT=%~dp0MzingaTrainer.exe

set BACKUPS=%~dp0autotrained
set PROFILES=%~dp0profiles

rm %LOG%

rem for /L %%N in () do (
    copy /Y %OD%\*.bak "%BACKUPS%\" | tee -a %LOG%
    copy /Y %OD%\*.xml "%PROFILES%\" | tee -a %LOG%
    %MT% battleroyale -pp "%PROFILES%" -tmt 00:00:05 -btl 00:05:00 -bbtl 1.00:00:00 -bsp true -mcb 6 -fpc true -agt true | tee -a %LOG%
    %MT% mergetop -pp "%PROFILES%" | tee -a %LOG%
rem )

endlocal

You can see my script uses a command called "tee", which lets you direct the output from the trainer to both the console and a log file simultaneously. That way I could open the log file and review how things were going without having to scroll the limited console window, since these operations can take days or weeks depending on your parameters.
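
Note that cmd.exe does not include tee by default, so it has to come from a separately installed set of Unix-style tools. If none is available, the behaviour is easy to approximate. The following is a minimal Python sketch of what tee -a does, with a hypothetical function name and script name; it is not part of Mzinga.

```python
import sys


def tee(lines, log_path, out=sys.stdout):
    """Copy each line to `out` and append it to the file at `log_path`.

    Returns the number of lines copied.
    """
    count = 0
    # Mode "a" appends, matching the -a flag used in the scripts above.
    with open(log_path, "a", encoding="utf-8") as log:
        for line in lines:
            out.write(line)
            out.flush()   # show progress immediately on the console
            log.write(line)
            log.flush()   # keep the log readable while training still runs
            count += 1
    return count
```

Saved as a hypothetical tee.py with a final line tee(sys.stdin, sys.argv[1]), it could stand in for tee -a by piping the trainer's output into python tee.py trainer.log.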