penta2019 / mahjong_server

Game server for Japanese riichi mahjong
MIT License

A discussion on the information that mjai relays to the AI. #5

Closed celie1 closed 3 years ago

celie1 commented 3 years ago

Hi, I noticed that the akochan AI's decision-making varies even though I use the same pt setting in every game.

I then confirmed this observation by double-checking the game logs with akoreviewer (https://github.com/Equim-chan/akochan-reviewer), using the same pt distribution. Sometimes I get 99% total accuracy, but most of the time the ako AI makes a few weird decisions and total accuracy drops to 88~92%.

At first I suspected differences in system.exe, so I tried to reproduce the result with the identical version of system.exe from akoreviewer. The problem still persisted.

[Image: weird] An example of a weird decision made by the ako AI in a game, which the identical ako AI used by the reviewer, with a similar pt value, evaluates as the worst possible decision.

I hate to bother you with this issue since I'm not sure what exactly caused this problem, but I'm at my wits' end here. Thank you for your time.

penta2019 commented 3 years ago

I also see the AI sometimes make decisions that look weird at first glance. But the ones I found were reasonable considering the score situation. The choice with the highest expected pt is not always the best one, especially if you have a big score lead. In the example you presented, you had enough score (high pt) to keep your rank, and your hand was very far from complete. The AI may have judged that it was better to pretend to be near complete, encouraging the other players to rush cheap hands, than to try to complete its own hand.

celie1 commented 3 years ago

The choice with the highest expected pt should be the best one in ako's case, since that value already accounts for the expected pt after dealing in (放銃): it combines the probability of dealing in with the probability of the tile passing.

In the example, akoreviewer evaluated 5sr as extremely dangerous, with a 25.3% deal-in rate (放銃率), which translates to roughly a 1 in 4 chance of dealing in. If the tile passes, the expected pt is 84.14pt. But since it's a dora, the loss from dealing in is larger, so the expected pt after dealing in is lower, at 76.52pt. In total, its overall expected pt is 82.21, the lowest of the options.
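The 82.21 figure can be checked by hand as a probability-weighted mix of the two outcomes. The sketch below only reproduces that arithmetic from the three numbers quoted above; the function name `expected_pt` is made up for illustration and is not part of akochan or akoreviewer.

```python
def expected_pt(deal_in_rate, pt_if_deal_in, pt_if_pass):
    """Weight the two outcomes of a discard by their probabilities."""
    return deal_in_rate * pt_if_deal_in + (1.0 - deal_in_rate) * pt_if_pass


# Numbers quoted from the akoreviewer output for discarding 5sr.
ev_5sr = expected_pt(deal_in_rate=0.253, pt_if_deal_in=76.52, pt_if_pass=84.14)
print(f"{ev_5sr:.2f}")  # prints 82.21
```

So the displayed total already includes the deal-in risk, which is why comparing the totals across candidate discards should be enough.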

The loss would far outweigh the gain if one pretended to be near complete and accidentally dealt in to another player in the process. The ako AI also knows the tactic of deliberately dealing in (差し込み), but since the expected pt after dealing in is lower, at 76.52pt, deliberately dealing in to another player would not make any sense here.

To offer my perspective as a 6-dan (6段) player on Tenhou: most if not all of the AI decisions that deviate from akoreviewer's suggestions are undoubtedly wrong. I estimate that akoreviewer's evaluation strength is on par with a player ranked 7段 R2000+ or 魂天, while the current AI setup will at most reach 4段 R1800+ or 雀豪2.

penta2019 commented 3 years ago

OK. I agree with your points. I'm now looking into the reviewer's source code to find out what is actually causing the difference. I think it will take a while to identify the cause and fix the problem.

celie1 commented 3 years ago

Thank you for looking into this issue.

On a side note, I noticed that the "tactics.json" used by akoreviewer is interchangeable with "setup_mjai.json" in the akochan AI, but akoreviewer's "tactics.json" contains some additional settings. It might not be relevant to the problem, but I think it's worth mentioning.

penta2019 commented 3 years ago

That seems to be almost the answer. Have you tried it? Passing the path to "tactics.json" as a 3rd argument is also possible.

$ akochan mjai_client 11601 ../akochan-reviewer/tactics.json

celie1 commented 3 years ago

Yes, I tried it for 40+ matches a few weeks ago. Besides passing the path, I also tried copying all the lines of "tactics.json" directly over "setup_mjai.json". Unfortunately, with both methods the ako AI just seems to ignore the additional lines in "tactics.json" and works as usual.

Further inspection is needed, but for now I believe those additional lines are meant for akoreviewer.exe. So perhaps, by finding out exactly which settings those lines affect, we can try to implement them directly in the ako AI.

penta2019 commented 3 years ago

No, that is not true. The reviewer does nothing special with tactics. It just passes the path to "tactics.json" to system.exe, as it shows with the -v (verbose) option. [Image: Screenshot_20210709_034029] The 3rd option, "2", is the player id, and the 2nd option, "pipe_detailed", means input from stdin and output to stdout with detailed information.

The only property the reviewer reads from "tactics.json" is "jun_pt", and it is used only for display in the browser.
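Since "jun_pt" is the one setting both sides are expected to share, a quick way to rule out a pt mismatch is to compare that key across the two config files. The following is only a sketch: the helper name `same_jun_pt` and the sample values are hypothetical, and it assumes nothing beyond both files being JSON objects with a top-level "jun_pt" key.

```python
import json


def same_jun_pt(config_a_text, config_b_text):
    """Return True if both JSON configs carry an identical "jun_pt" value."""
    jun_pt_a = json.loads(config_a_text).get("jun_pt")
    jun_pt_b = json.loads(config_b_text).get("jun_pt")
    return jun_pt_a is not None and jun_pt_a == jun_pt_b


# Hypothetical file contents, for illustration only.
tactics_text = '{"jun_pt": [90, 45, 0, -135], "other_setting": 1}'
setup_text = '{"jun_pt": [90, 45, 0, -135]}'
print(same_jun_pt(tactics_text, setup_text))  # prints True
```

In practice the two strings would come from reading "tactics.json" and "setup_mjai.json" from disk.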

penta2019 commented 3 years ago

I found a big difference that may be the cause of the problem. MjaiEndpoint uses mjai protocol version 1 internally, while the reviewer uses mjai protocol version 3. In version 3, the "scores" field has been moved from the "Hora" message to the "StartKyoku" message. I think that is the better design.
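To illustrate the kind of translation this implies, here is a rough sketch of an adapter that carries the latest known scores forward and attaches them to each round-start message. It is not the actual fix in MjaiEndpoint. It assumes messages are plain dicts with a lower-case "type" field ("start_kyoku", "hora") and a "scores" list, and the function name and the 25000 starting scores are chosen only for the example.

```python
def move_scores_to_start_kyoku(messages, initial_scores):
    """Yield copies of messages with "scores" attached to start_kyoku.

    Any other message carrying "scores" (for example a win message)
    updates the running tally and has the field removed.
    """
    scores = list(initial_scores)
    for msg in messages:
        out = dict(msg)  # shallow copy, the input message is left untouched
        if out.get("type") == "start_kyoku":
            out.setdefault("scores", list(scores))
        elif "scores" in out:
            scores = list(out.pop("scores"))
        yield out


# Hypothetical, heavily trimmed message stream.
old_style = [
    {"type": "start_kyoku", "kyoku": 1},
    {"type": "hora", "actor": 0, "scores": [33000, 25000, 17000, 25000]},
    {"type": "start_kyoku", "kyoku": 2},
]
for converted in move_scores_to_start_kyoku(old_style, [25000] * 4):
    print(converted)
```

With this shape the AI sees the standings at the start of every round, instead of having to remember them from the previous win message.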

celie1 commented 3 years ago

You are right. Earlier I concluded that the additional lines had no effect on the ako AI because it still made some decisions that deviated from akoreviewer's evaluation.

celie1 commented 3 years ago

Hi, I tried the fix for #5. It performs brilliantly, with a perfect score, in East-only matches (東風戦), but it still makes questionable decisions in Hanchan matches (半荘戦).

Perhaps mjai doesn't properly relay the match type, East-only (東風戦) vs. Hanchan (半荘戦), to the ako AI? The strategy has to change depending on the match type.