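An AI for three-player mahjong (sanma).
It is built on BERT, a natural language model.
The model is first pre-trained with a Masked Language Model objective and then trained as a Policy Value Network.
No reinforcement learning is performed.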
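This program is intended to be used over the mjai protocol, extended to support three-player mahjong rules.
Please use the fork below.
To support the nukidora (North tile) rule adopted by Tenhou and Mahjong Soul, the message `"type":"nukidora"` has been introduced.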
(<-) Server to Client, (->) Client to Server
<- {"type":"tsumo","actor":0,"pai":"C"}
-> {"type":"nukidora","actor":0,"pai":"N"}
<- {"type":"nukidora","actor":0,"pai":"N"}
-> {"type":"none"}
<- {"type":"tsumo","actor":0,"pai":"E"}
-> {"type":"dahai","actor":0,"pai":"E","tsumogiri":true}
<- {"type":"dahai","actor":0,"pai":"E","tsumogiri":true}
-> {"type":"none"}
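
The exchange above can be sketched as a small client-side responder. This is only an illustration of the message flow, not the project's actual client: the function names `respond` and `make_message` and the `declare_nukidora` flag are hypothetical, and the decision of when to declare nukidora (which the model would make) is reduced to a boolean here.

```python
import json


def make_message(payload: dict) -> str:
    """Serialize a message as compact one-line JSON, as in the log above."""
    return json.dumps(payload, separators=(",", ":"))


def respond(event_line: str, my_id: int, declare_nukidora: bool) -> str:
    """Return the client reply for one server event.

    declare_nukidora is a hypothetical stand-in for the model's decision.
    """
    event = json.loads(event_line)
    if event["type"] == "tsumo" and event["actor"] == my_id:
        if declare_nukidora:
            # Set the North tile aside as nukidora instead of discarding.
            return make_message({"type": "nukidora", "actor": my_id, "pai": "N"})
        # Simplest possible discard: throw away the tile just drawn.
        return make_message(
            {"type": "dahai", "actor": my_id, "pai": event["pai"], "tsumogiri": True}
        )
    # Echoes of our own actions and other events are acknowledged with "none".
    return make_message({"type": "none"})
```

After the server echoes the nukidora message, the client answers `{"type":"none"}` and then receives a new `tsumo` event, exactly as in the log.
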
The model was trained for 7 days on a Google Colaboratory TPU.
Training data
Test data
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_accuracy': 0.7972599864006042, 'test_loss': 0.5289502143859863}
--------------------------------------------------------------------------------
0 0 0 0.0
1 924221 1008436 0.7901532670392568
2 0 0 0.0
3 0 0 0.0
4 0 0 0.0
5 0 0 0.0
6 0 0 0.0
7 0 0 0.0
8 0 0 0.0
9 923944 981816 0.8000052963080658
10 31774 30244 0.7062888506811268
11 591399 570972 0.7586834380670155
12 399405 372718 0.7584608202447963
13 271825 247356 0.7592215268681576
14 232758 195329 0.7641824818639321
15 186057 183558 0.7017781845520217
16 231263 231702 0.7009175578976444
17 273533 261344 0.7425615281008938
18 399271 422965 0.7114654876881066
19 593032 579594 0.7522921217265879
20 32571 28184 0.7395685495316492
21 597560 558024 0.7723359568764068
22 406378 383433 0.755300143701768
23 277395 247779 0.7668648271241711
24 235896 210529 0.745768991445359
25 189512 164245 0.7473286858047429
26 236268 202835 0.75676781620529
27 276652 269120 0.7344307372175981
28 403260 372343 0.7646739699685505
29 599142 570094 0.7645616336954958
30 775968 834380 0.7506711570267743
31 797026 806371 0.7792666154908845
32 851573 893626 0.7757674910980656
33 22125 14497 0.730082085948817
34 747669 708730 0.7996246807670058
35 746071 759753 0.7723444329933544
36 750832 823257 0.7449678532949007
37 339661 367860 0.7343581797422932
38 427234 424245 0.8522433970936605
39 3071 923 0.6728060671722643
40 16318 20094 0.6654722802826715
41 31115 34469 0.7157155705126346
42 1178109 1210985 0.9567269619359448
43 390583 393297 0.992595926233864
44 13047 15005 0.8504498500499833
45 1115650 1119056 0.9390620308545774
| Class | Token | Count | Offset | Range | Multiply | Positional embedding | Note |
|---|---|---|---|---|---|---|---|
| Special | [PAD] | 1 | 0 | [0...0] | * | - | - |
| | [CLS] | 1 | 1 | [1...1] | 1 | - | - |
| | [SEP] | 1 | 2 | [2...2] | 1 | - | - |
| | [EOS] | 1 | 3 | [3...3] | 1 | - | - |
| | [MASK] | 1 | 4 | [4...4] | 0..1 | - | - |
| | [UNK] | 1 | 5 | [5...5] | 0 | - | - |
| Category | style | 2 | 6 | [6...7] | 1 | - | tonpuu (East-only game) [0], hanchan [1] |
| | player_id (absolute) | 3 | 8 | [8...10] | 1 | - | East seat [0], South seat [1], West seat [2] |
| | bakaze | 3 | 11 | [11...13] | 1 | - | East round [0], South round [1], West round [2] |
| | kyoku | 3 | 14 | [14...16] | 1 | - | [0,1,2] |
| | honba | 4 | 17 | [17...20] | 1 | - | min(honba, 4) |
| | kyotaku | 3 | 21 | [21...23] | 1 | - | min(kyotaku, 3) |
| Numeric | delta_score (self - kamicha) | 97 | 24 | [24...120] | 1 | - | clip((delta_score/1000) + 48, 0, 96) |
| | delta_score (self - shimocha) | 97 | 121 | [121...217] | 1 | - | clip((delta_score/1000) + 48, 0, 96) |
| | num_pipais | 12 | 218 | [218...229] | 1 | - | clip(num_pipais, N) |
| Pai | dora_markers | 37 | 230 | [230...266] | 1..5 | - | tile37 multiply=1..5 |
| | tehai | 37 | 267 | [267...303] | 1..14 | - | tile136 (discardable hand tiles, excluding melded tiles; the drawn tile is included) |
| | tsumo (drawn tile) | 37 | 304 | [304...340] | 0..1 | - | tile37 (the tile drawn by the most recent tsumo; empty after dahai) |
| possible | can_dahai | 1 | 341 | [341...341] | 0..1 | - | |
| | can_reach | 1 | 342 | [342...342] | 0..1 | - | |
| | can_hora | 1 | 343 | [343...343] | 0..1 | - | |
| | can_ryukyoku | 1 | 344 | [344...344] | 0..1 | - | |
| | can_pon | 1 | 345 | [345...345] | 0..1 | - | |
| | can_daiminkan | 1 | 346 | [346...346] | 0..1 | - | |
| | can_ankan | 1 | 347 | [347...347] | 0..1 | - | |
| | can_kakan | 1 | 348 | [348...348] | 0..1 | - | |
| Player0 (relative) | (player0) dahai | 74 | 349 | [349...422] | * | ✔ | tile37 * 2 (tsumogiri = False [0..36], tsumogiri = True [37..73]) |
| | reach | 1 | 423 | [423...423] | * | ✔ | - |
| | pon | 37 | 424 | [424...460] | * | ✔ | tile37 |
| | daiminkan | 34 | 461 | [461...494] | * | ✔ | tile34 |
| | ankan | 34 | 495 | [495...528] | * | ✔ | tile34 |
| | kakan | 34 | 529 | [529...562] | * | ✔ | tile34 |
| | nukidora | 1 | 563 | [563...563] | * | ✔ | - |
| Player1 (relative) | dahai | 74 | 564 | [564...637] | * | ✔ | (same as Player0) |
| | reach | 1 | 638 | [638...638] | * | ✔ | |
| | pon | 37 | 639 | [639...675] | * | ✔ | |
| | daiminkan | 34 | 676 | [676...709] | * | ✔ | |
| | ankan | 34 | 710 | [710...743] | * | ✔ | |
| | kakan | 34 | 744 | [744...777] | * | ✔ | |
| | nukidora | 1 | 778 | [778...778] | * | ✔ | |
| Player2 (relative) | dahai | 74 | 779 | [779...852] | * | ✔ | (same as Player0) |
| | reach | 1 | 853 | [853...853] | * | ✔ | |
| | pon | 37 | 854 | [854...890] | * | ✔ | |
| | daiminkan | 34 | 891 | [891...924] | * | ✔ | |
| | ankan | 34 | 925 | [925...958] | * | ✔ | |
| | kakan | 34 | 959 | [959...992] | * | ✔ | |
| | nukidora | 1 | 993 | [993...993] | * | ✔ | |
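
The offsets in the table above can be turned into token ids with simple arithmetic. The following is a minimal sketch, not the project's actual tokenizer: the function and constant names are hypothetical, and using floor division for `delta_score/1000` is an assumption, since the table does not say how the quotient is rounded.

```python
DELTA_KAMICHA_OFFSET = 24    # delta_score (self - kamicha), range [24...120]
DELTA_SHIMOCHA_OFFSET = 121  # delta_score (self - shimocha), range [121...217]
PLAYER0_OFFSET = 349         # first token of the Player0 block
PLAYER_BLOCK_SIZE = 215      # 74 + 1 + 37 + 34 + 34 + 34 + 1 tokens per player

# (start within the player block, number of tokens), taken from the table.
PLAYER_EVENT_LAYOUT = {
    "dahai": (0, 74),
    "reach": (74, 1),
    "pon": (75, 37),
    "daiminkan": (112, 34),
    "ankan": (146, 34),
    "kakan": (180, 34),
    "nukidora": (214, 1),
}


def delta_score_token(delta_score: int, offset: int) -> int:
    """Map a score difference in points to a token id.

    Implements clip((delta_score/1000) + 48, 0, 96); floor division is assumed.
    """
    bucket = min(max(delta_score // 1000 + 48, 0), 96)
    return offset + bucket


def player_event_token(relative_player: int, kind: str, index: int = 0) -> int:
    """Token id of an event of the given kind for relative player 0, 1, or 2."""
    if not 0 <= relative_player <= 2:
        raise ValueError("relative_player must be 0, 1, or 2")
    start, count = PLAYER_EVENT_LAYOUT[kind]
    if not 0 <= index < count:
        raise ValueError(f"index out of range for {kind}")
    return PLAYER0_OFFSET + PLAYER_BLOCK_SIZE * relative_player + start + index


def dahai_token(relative_player: int, tile37: int, tsumogiri: bool) -> int:
    """Discard token: tile37 in [0..36], shifted by 37 when tsumogiri is True."""
    if not 0 <= tile37 <= 36:
        raise ValueError("tile37 must be in [0..36]")
    return player_event_token(relative_player, "dahai", tile37 + (37 if tsumogiri else 0))
```

With these definitions, a score difference of 0 maps to token 72, and the nukidora event of relative player 2 maps to token 993, the last id in the table.
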
| Class | Token | Count | Offset | Range | - | Note |
|---|---|---|---|---|---|---|
| Actual action | dahai | 37 | 0 | [0...36] | - | |
| | reach | 1 | 37 | [37...37] | - | - |
| | pon | 1 | 38 | [38...38] | - | |
| | daiminkan | 1 | 39 | [39...39] | - | |
| | ankan | 1 | 40 | [40...40] | - | tile34 |
| | kakan | 1 | 41 | [41...41] | - | tile34 |
| | nukidora | 1 | 42 | [42...42] | - | - |
| | hora | 1 | 43 | [43...43] | - | |
| | ryukyoku | 1 | 44 | [44...44] | - | - |
| | none(skip) | 1 | 45 | [45...45] | - | - |
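
As an illustration of how the 46 output classes in the table above map back to actions, here is a minimal decoding sketch. The names `ACTION_RANGES` and `decode_action` are hypothetical, and treating the dahai index as a tile37 id is an assumption based on the class count of 37.

```python
from typing import Optional, Tuple

# (action name, first index, last index), copied from the table above.
ACTION_RANGES = [
    ("dahai", 0, 36),
    ("reach", 37, 37),
    ("pon", 38, 38),
    ("daiminkan", 39, 39),
    ("ankan", 40, 40),
    ("kakan", 41, 41),
    ("nukidora", 42, 42),
    ("hora", 43, 43),
    ("ryukyoku", 44, 44),
    ("none", 45, 45),
]


def decode_action(index: int) -> Tuple[str, Optional[int]]:
    """Return (action name, tile index) for an output class index.

    The tile index is only set for dahai (assumed to be a tile37 id);
    single-class actions return None in its place.
    """
    for name, first, last in ACTION_RANGES:
        if first <= index <= last:
            tile = index - first if last > first else None
            return name, tile
    raise ValueError(f"action index out of range: {index}")
```

For example, `decode_action(42)` returns `("nukidora", None)`, which a client would then translate into the `"type":"nukidora"` message.
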
The following transformations are applied as augmentation.