maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"
https://www.manning.com/books/deep-learning-and-the-game-of-go

ZeroAgent becomes weaker after learning #70

Closed y-kkky closed 4 years ago

y-kkky commented 4 years ago

Here are the steps I took to start training a ZeroAgent for a 9x9 board:

First, I put my model definition in a separate file:

    from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                              Flatten, Input)
    from keras.models import Model

    from dlgo import zero


    def zero_model(board_size):
        encoder = zero.ZeroEncoder(board_size)
        board_input = Input(shape=encoder.shape(), name='board_input')

        # Shared convolutional body
        pb = board_input
        for i in range(4):
            pb = Conv2D(64, (3, 3), padding='same',
                        data_format='channels_first')(pb)
            pb = BatchNormalization(axis=1)(pb)
            pb = Activation('relu')(pb)

        # Policy head: a probability for every possible move
        policy_conv = Conv2D(2, (1, 1), data_format='channels_first')(pb)
        policy_batch = BatchNormalization(axis=1)(policy_conv)
        policy_relu = Activation('relu')(policy_batch)
        policy_flat = Flatten()(policy_relu)
        policy_output = Dense(encoder.num_moves(), activation='softmax')(policy_flat)

        # Value head: a single score in [-1, 1] for the current position
        value_conv = Conv2D(1, (1, 1), data_format='channels_first')(pb)
        value_batch = BatchNormalization(axis=1)(value_conv)
        value_relu = Activation('relu')(value_batch)
        value_flat = Flatten()(value_relu)
        value_hidden = Dense(256, activation='relu')(value_flat)
        value_output = Dense(1, activation='tanh')(value_hidden)

        model = Model(
            inputs=[board_input],
            outputs=[policy_output, value_output])
        return model

Then, I initialized my ZeroAgent this way:

    import h5py

    from dlgo import zero
    # zero_model is the function defined in the model file above

    encoder = zero.ZeroEncoder(9)
    model = zero_model(9)
    agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
    with h5py.File('original_zero.h5', 'w') as outf:
        agent.serialize(outf)
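As a quick sanity check before any training (not part of the pipeline, just to confirm the pieces fit together), the fresh agent can be asked for a move on an empty 9x9 board, using the standard dlgo board API from the book:

    from dlgo import goboard_fast as goboard

    # The untrained agent should still return a legal move after 50 MCTS rounds.
    game = goboard.GameState.new_game(9)
    first_move = agent.select_move(game)
    print(first_move)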

I created play_train_eval_zero.py following the example of the existing play_train_eval.py scripts: https://pastebin.com/HUHnYWBX

Example configuration:

    --agent original_zero.h5 --num-workers 6 --games-per-batch 500 --board-size 9 --games-eval 60
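The core loop of play_train_eval_zero.py is essentially the same as in the other play_train_eval scripts. Here is a condensed sketch of one self-play/training batch, assuming the chapter 14 API (zero.ZeroExperienceCollector, zero.combine_experience, ZeroAgent.train); the learning rate and batch size at the end are placeholder values, not a recommendation:

    from dlgo import scoring, zero
    from dlgo import goboard_fast as goboard
    from dlgo.gotypes import Player

    def simulate_game(black_agent, white_agent, board_size=9):
        # Play one self-play game to the end and score it.
        game = goboard.GameState.new_game(board_size)
        agents = {Player.black: black_agent, Player.white: white_agent}
        while not game.is_over():
            next_move = agents[game.next_player].select_move(game)
            game = game.apply_move(next_move)
        return scoring.compute_game_result(game)

    # Two agents share the same model (built as shown earlier) so both
    # colors generate experience.
    black_agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
    white_agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
    black_collector = zero.ZeroExperienceCollector()
    white_collector = zero.ZeroExperienceCollector()
    black_agent.set_collector(black_collector)
    white_agent.set_collector(white_collector)

    games_per_batch = 500  # matches --games-per-batch above
    for _ in range(games_per_batch):
        black_collector.begin_episode()
        white_collector.begin_episode()
        game_result = simulate_game(black_agent, white_agent)
        if game_result.winner == Player.black:
            black_collector.complete_episode(1)
            white_collector.complete_episode(-1)
        else:
            black_collector.complete_episode(-1)
            white_collector.complete_episode(1)

    # Train on the collected experience; evaluation against the reference
    # agent happens after this step.
    exp = zero.combine_experience([black_collector, white_collector])
    black_agent.train(exp, 0.01, 2048)  # placeholder learning rate and batch size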

And what I see is that my bot degrades as training goes on:

    Reference: original_zero.h5
    Learning_agent: original_zero.h5
    Total games so far 0
    Won 33 / 60 games (0.550)
    New reference is agent_00000500.hdf5
    Reference: agent_00000500.hdf5
    Learning_agent: agent_cur.hdf5
    Total games so far 500
    Won 25 / 60 games (0.417)
    Reference: agent_00000500.hdf5
    Learning_agent: agent_cur.hdf5
    Total games so far 1000
    Won 10 / 60 games (0.167)
    Reference: agent_00000500.hdf5
    Learning_agent: agent_cur.hdf5
    Total games so far 1500
    Won 11 / 60 games (0.183)

I tried different configurations, and this pattern is common to all of them; I have no idea what causes it.

I had some assumptions:

  1. That somehow I copy the wrong file after training, so my reference becomes stronger while the learning agent stays the same. This is not the case: I added an extra step to save the files after each round whenever the learning agent lost the evaluation, and compared the hashes of the model files (see the sketch after this list). They were the same for the reference agent, as they should be, so this is not the cause.
  2. I have also checked manually that the encoder and the experience collector work correctly; they look fine. UPD: I found out that I'm using komi 7.5 for the 9x9 board, but maybe that is ok.
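For what it's worth, the hash comparison mentioned in point 1 needs nothing beyond the standard library; a sketch, with the file names taken from the evaluation output above:

    import hashlib

    def file_sha256(path):
        # Hash a model snapshot so two files can be compared byte for byte.
        digest = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                digest.update(chunk)
        return digest.hexdigest()

    print(file_sha256('agent_00000500.hdf5'))
    print(file_sha256('agent_cur.hdf5'))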

My last assumption is that I run too few simulations per move, but I'm afraid to check this on a paid GPU server, because a bot that becomes weaker with every round is a very strange pattern.

y-kkky commented 4 years ago

Which parameters should I tune for real training? As I understand it, there are four parameters to play with:

  1. model
  2. encoder
  3. games per batch
  4. rounds per move

How do I decide which of them to sacrifice?
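For context, this is where each of these knobs enters my current setup; the values are just the ones I have been using so far:

    # 1. model: network depth and width, defined in zero_model() above
    model = zero_model(9)
    # 2. encoder: the board representation, fixed here by the board size
    encoder = zero.ZeroEncoder(9)
    # 4. rounds per move: MCTS simulations per move, set on the agent
    agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
    # 3. games per batch: self-play games collected before each training
    #    step, passed to the training script as --games-per-batch 500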

computer-idol commented 4 years ago

Are you training the AlphaGo agent described in chapter 13? If you have completed training, can you share the model files?

y-kkky commented 4 years ago

@computer-idol AlphaGo Zero from chapter 14