yuufyu / wahaha

AI for three-player mahjong
MIT License

wahaha

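An AI for three-player mahjong (sanma).
It is built on BERT, a natural language model.
The model is first pre-trained with a Masked Language Model objective and then trained as a Policy Value Network.
No reinforcement learning is used.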

Environment

Mjai protocol

This program is intended to be used over an mjai protocol variant that supports three-player mahjong rules.
Please use the fork below.

To support the nukidora (North) rule used by Tenhou and Mahjong Soul, the fork introduces a "type":"nukidora" message.

(<-) Server to Client, (->) Client to Server

<-  {"type":"tsumo","actor":0,"pai":"C"}
->  {"type":"nukidora","actor":0,"pai":"N"}
<-  {"type":"nukidora","actor":0,"pai":"N"}
->  {"type":"none"}
<-  {"type":"tsumo","actor":0,"pai":"E"}
->  {"type":"dahai","actor":0,"pai":"E","tsumogiri":true}
<-  {"type":"dahai","actor":0,"pai":"E","tsumogiri":true}
->  {"type":"none"}
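The exchange above can be reproduced with a small client loop. The following is a minimal sketch, not the repository's client: `ToyClient` and its fixed rule (always declare nukidora when holding a North tile, otherwise discard the drawn tile) are hypothetical and exist only to show the message shapes.

```python
import json


class ToyClient:
    """Hypothetical client that answers mjai events with a fixed rule."""

    def __init__(self, player_id, hand):
        self.player_id = player_id
        self.hand = list(hand)  # tiles held before the next draw

    def respond(self, line):
        """Take one JSON line from the server and return the JSON reply."""
        event = json.loads(line)
        reply = {"type": "none"}  # default: nothing to do (e.g. echoed events)
        if event["type"] == "tsumo" and event["actor"] == self.player_id:
            self.hand.append(event["pai"])
            if "N" in self.hand:
                # Assumed toy rule: always set a held North tile aside.
                self.hand.remove("N")
                reply = {"type": "nukidora", "actor": self.player_id, "pai": "N"}
            else:
                # Otherwise discard the tile that was just drawn.
                self.hand.remove(event["pai"])
                reply = {"type": "dahai", "actor": self.player_id,
                         "pai": event["pai"], "tsumogiri": True}
        return json.dumps(reply, separators=(",", ":"))


client = ToyClient(player_id=0, hand=["N"])
print(client.respond('{"type":"tsumo","actor":0,"pai":"C"}'))
```

After the server echoes the nukidora event, the client answers with `{"type":"none"}` and waits for the replacement draw, which arrives as an ordinary `tsumo` event.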

Performance

The model was trained for 7 days on a Google Colaboratory TPU.

Encoder specification

Train data

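| Class | Token | Count | Offset | Range | Multiply | positional embedding | Note |
|---|---|---|---|---|---|---|---|
| Special | [PAD] | 1 | 0 | [0...0] | * | - | - |
| | [CLS] | 1 | 1 | [1...1] | 1 | - | - |
| | [SEP] | 1 | 2 | [2...2] | 1 | - | - |
| | [EOS] | 1 | 3 | [3...3] | 1 | - | - |
| | [MASK] | 1 | 4 | [4...4] | 0..1 | - | - |
| | [UNK] | 1 | 5 | [5...5] | 0 | - | - |
| Category | style | 2 | 6 | [6...7] | 1 | - | tonpuu (East-only) [0], hanchan (East-South) [1] |
| | player_id (absolute) | 3 | 8 | [8...10] | 1 | - | East seat [0], South seat [1], West seat [2] |
| | bakaze | 3 | 11 | [11...13] | 1 | - | East round [0], South round [1], West round [2] |
| | kyoku | 3 | 14 | [14...16] | 1 | - | [0,1,2] |
| | honba | 4 | 17 | [17...20] | 1 | - | min(honba, 4) |
| | kyotaku | 3 | 21 | [21...23] | 1 | - | min(kyotaku, 3) |
| Numeric | delta_score (self - kamicha, the player to the left) | 97 | 24 | [24...120] | 1 | - | clip((delta_score/1000) + 48, 0, 96) |
| | delta_score (self - shimocha, the player to the right) | 97 | 121 | [121...217] | 1 | - | clip((delta_score/1000) + 48, 0, 96) |
| | num_pipais | 12 | 218 | [218...229] | 1 | - | clip(num_pipais, N) |
| Pai | dora_markers | 37 | 230 | [230...266] | 1..5 | - | tile37, multiply = 1..5 |
| | tehai | 37 | 267 | [267...303] | 1..14 | - | tile136 (discardable hand tiles, excluding melded tiles; includes the drawn tile) |
| | tsumo (drawn tile) | 37 | 304 | [304...340] | 0..1 | - | tile37 (tile drawn by the immediately preceding tsumo; empty after dahai) |
| possible | can_dahai | 1 | 341 | [341...341] | 0..1 | - | |
| | can_reach | 1 | 342 | [342...342] | 0..1 | - | |
| | can_hora | 1 | 343 | [343...343] | 0..1 | - | |
| | can_ryukyoku | 1 | 344 | [344...344] | 0..1 | - | |
| | can_pon | 1 | 345 | [345...345] | 0..1 | - | |
| | can_daiminkan | 1 | 346 | [346...346] | 0..1 | - | |
| | can_ankan | 1 | 347 | [347...347] | 0..1 | - | |
| | can_kakan | 1 | 348 | [348...348] | 0..1 | - | |
| Player0 (relative) | (player0)dahai | 74 | 349 | [349...422] | * | | tile37 * 2 (tsumogiri = False [0..36], tsumogiri = True [37..73]) |
| | reach | 1 | 423 | [423...423] | * | | - |
| | pon | 37 | 424 | [424...460] | * | | tile37 |
| | daiminkan | 34 | 461 | [461...494] | * | | tile34 |
| | ankan | 34 | 495 | [495...528] | * | | tile34 |
| | kakan | 34 | 529 | [529...562] | * | | tile34 |
| | nukidora | 1 | 563 | [563...563] | * | | - |
| Player1 (relative) | dahai | 74 | 564 | [564...637] | * | | (same as Player0) |
| | reach | 1 | 638 | [638...638] | * | | |
| | pon | 37 | 639 | [639...675] | * | | |
| | daiminkan | 34 | 676 | [676...709] | * | | |
| | ankan | 34 | 710 | [710...743] | * | | |
| | kakan | 34 | 744 | [744...777] | * | | |
| | nukidora | 1 | 778 | [778...778] | * | | |
| Player2 (relative) | dahai | 74 | 779 | [779...852] | * | | (same as Player0) |
| | reach | 1 | 853 | [853...853] | * | | |
| | pon | 37 | 854 | [854...890] | * | | |
| | daiminkan | 34 | 891 | [891...924] | * | | |
| | ankan | 34 | 925 | [925...958] | * | | |
| | kakan | 34 | 959 | [959...992] | * | | |
| | nukidora | 1 | 993 | [993...993] | * | | |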
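The offsets in the token layout above follow a regular pattern: each relative player owns a block of 215 token ids (74 + 1 + 37 + 34 + 34 + 34 + 1) starting at 349, and the score difference is bucketed per 1000 points. The sketch below shows how token ids could be computed from that layout. The function names are hypothetical, and floor division for `delta_score/1000` is an assumption, since the rounding rule is not stated.

```python
PLAYER0_OFFSET = 349       # first token id of the Player0 block
PLAYER_BLOCK_SIZE = 215    # 74 + 1 + 37 + 34 + 34 + 34 + 1

# Offset of each event type inside one player block, derived from the layout.
EVENT_OFFSETS = {
    "dahai": 0, "reach": 74, "pon": 75, "daiminkan": 112,
    "ankan": 146, "kakan": 180, "nukidora": 214,
}


def delta_score_token(delta_score, offset):
    """Map a score difference to a token id; offset is 24 or 121."""
    bucket = delta_score // 1000 + 48   # assumption: floor division
    return offset + max(0, min(96, bucket))


def player_event_token(relative_player, event, index=0):
    """Token id for an event of relative player 0, 1 or 2."""
    base = PLAYER0_OFFSET + relative_player * PLAYER_BLOCK_SIZE
    return base + EVENT_OFFSETS[event] + index


def dahai_token(relative_player, tile37, tsumogiri):
    """Discards use tile37 ids 0..36, shifted by 37 when tsumogiri is True."""
    return player_event_token(relative_player, "dahai",
                              tile37 + (37 if tsumogiri else 0))


print(delta_score_token(0, 24))               # equal scores land mid-range
print(player_event_token(2, "nukidora"))      # last token id in the layout
```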

Label

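| Class | Token | Count | Offset | Range | Note |
|---|---|---|---|---|---|
| Actual action | dahai | 37 | 0 | [0...36] | |
| | reach | 1 | 37 | [37...37] | - |
| | pon | 1 | 38 | [38...38] | |
| | daiminkan | 1 | 39 | [39...39] | |
| | ankan | 1 | 40 | [40...40] | tile34 |
| | kakan | 1 | 41 | [41...41] | tile34 |
| | nukidora | 1 | 42 | [42...42] | - |
| | hora | 1 | 43 | [43...43] | |
| | ryukyoku | 1 | 44 | [44...44] | - |
| | none (skip) | 1 | 45 | [45...45] | - |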
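The label layout gives 46 classes in total: 37 discard classes followed by one class per non-discard action. A minimal sketch of that mapping follows; `action_label` is a hypothetical helper, not a function from the repository.

```python
NUM_LABELS = 46  # ids 0..45

# Non-discard actions each occupy a single label id.
ACTION_LABELS = {
    "reach": 37, "pon": 38, "daiminkan": 39, "ankan": 40, "kakan": 41,
    "nukidora": 42, "hora": 43, "ryukyoku": 44, "none": 45,
}


def action_label(action_type, tile37=None):
    """Return the label id for an action; dahai needs a tile37 id 0..36."""
    if action_type == "dahai":
        if tile37 is None or not 0 <= tile37 < 37:
            raise ValueError("dahai requires a tile37 id in 0..36")
        return tile37
    return ACTION_LABELS[action_type]


print(action_label("dahai", tile37=5))
print(action_label("nukidora"))
```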

Data Augmentation

The following transformations are applied as data augmentation.