Closed y-kkky closed 4 years ago
Which parameters should I tune for real training? As I understand it, there are 4 parameters to play with:
How do I decide which one to sacrifice?
Are you training the AlphaGo bot described in chapter 13? If it's completed, can you share the model files?
@computer-idol It's AlphaGo Zero from chapter 14.
Here are the steps I took to start training a ZeroAgent for a 9x9 board:
First, I put my model in a separate file:
Model

```python
from keras.layers import Activation, BatchNormalization, Conv2D, Dense, Flatten, Input
from keras.models import Model

from dlgo import zero


def zero_model(board_size):
    encoder = zero.ZeroEncoder(board_size)

    board_input = Input(shape=encoder.shape(), name='board_input')
    pb = board_input
    for i in range(4):
        pb = Conv2D(64, (3, 3), padding='same',
                    data_format='channels_first')(pb)
        pb = BatchNormalization(axis=1)(pb)
        pb = Activation('relu')(pb)

    policy_conv = Conv2D(2, (1, 1), data_format='channels_first')(pb)
    policy_batch = BatchNormalization(axis=1)(policy_conv)
    policy_relu = Activation('relu')(policy_batch)
    policy_flat = Flatten()(policy_relu)
    policy_output = Dense(encoder.num_moves(), activation='softmax')(policy_flat)

    value_conv = Conv2D(1, (1, 1), data_format='channels_first')(pb)
    value_batch = BatchNormalization(axis=1)(value_conv)
    value_relu = Activation('relu')(value_batch)
    value_flat = Flatten()(value_relu)
    value_hidden = Dense(256, activation='relu')(value_flat)
    value_output = Dense(1, activation='tanh')(value_hidden)

    model = Model(inputs=[board_input],
                  outputs=[policy_output, value_output])
    return model
```

Then, I initialized my ZeroAgent this way:
I created play_train_eval_zero.py following the example of the other play_train_eval.py scripts: https://pastebin.com/HUHnYWBX
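For reference, the overall shape of such a script is a repeated self-play → train → evaluate → promote cycle. Below is a self-contained sketch of that cycle only, not the pastebin code: `simulate_game`, `train_on`, the strength numbers, and the promotion threshold are all invented for illustration.

```python
import random

random.seed(0)

def simulate_game(learner_strength, reference_strength):
    """Stub: return True if the learner wins one game."""
    p = learner_strength / (learner_strength + reference_strength)
    return random.random() < p

def train_on(strength, num_games):
    """Stub: pretend training on num_games self-play games helps a little."""
    return strength * 1.05

def evaluate(learner, reference, num_games=60):
    """Play num_games evaluation games and count the learner's wins."""
    return sum(simulate_game(learner, reference) for _ in range(num_games))

learner = reference = 1.0
for batch in range(3):
    learner = train_on(learner, num_games=500)   # self-play + training step
    wins = evaluate(learner, reference)
    if wins > 33:                                # promotion threshold, ~55%
        reference = learner                      # learner becomes the new reference
    print('batch', batch, 'won', wins, '/ 60')
```

The key design point is the gate: the learner only replaces the reference agent after winning a fixed share of evaluation games, otherwise it keeps training against the old reference.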
Example configuration:
--agent original_zero.h5 --num-workers 6 --games-per-batch 500 --board-size 9 --games-eval 60
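One thing worth noting about `--games-eval 60`: with only 60 evaluation games, the win count is quite noisy. A quick back-of-envelope calculation (plain Python, not from the repo):

```python
import math

# If two agents are equally strong (p = 0.5), the number of wins in 60
# games is binomially distributed with standard deviation sqrt(n*p*(1-p)).
n, p = 60, 0.5
std = math.sqrt(n * p * (1 - p))
print(round(std, 2))        # ~3.87 games, i.e. ~6.5 percentage points

# A 33/60 (55%) result is a z-score of only ~0.77 above a coin flip.
print(round((33 - 30) / std, 2))
```

So a promotion decision based on 33/60 wins is well within one standard deviation of pure chance; in the log below, `original_zero.h5` even "beat" an identical copy of itself 33/60.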
And what I see is my bot degrading during training:
```
Reference: original_zero.h5 Learning_agent: original_zero.h5
Total games so far 0
Won 33 / 60 games (0.550)
New reference is agent_00000500.hdf5
Reference: agent_00000500.hdf5 Learning_agent: agent_cur.hdf5
Total games so far 500
Won 25 / 60 games (0.417)
Reference: agent_00000500.hdf5 Learning_agent: agent_cur.hdf5
Total games so far 1000
Won 10 / 60 games (0.167)
Reference: agent_00000500.hdf5 Learning_agent: agent_cur.hdf5
Total games so far 1500
Won 11 / 60 games (0.183)
```

I tried different configurations, and this pattern shows up in all of them; I have no idea what causes it.
I had some assumptions:
The last assumption is that I run a very small number of simulations per move, but I'm afraid to test it on a paid GPU server, because it seems very strange that the bot gets weaker with each round.
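For a sense of scale on that last assumption: a 9x9 root position has up to 82 children (81 points plus pass), and AlphaGo Zero uses root visit counts as the policy training target. A small sketch of how thinly a small simulation budget spreads (the random child selection is a stand-in for real tree search, but the budget arithmetic is the same):

```python
import random

random.seed(1)

NUM_CHILDREN = 82  # 81 board points + pass on 9x9

for rounds_per_move in (10, 100, 1600):
    visits = [0] * NUM_CHILDREN
    for _ in range(rounds_per_move):
        # Stand-in for guided tree search: pick any child.
        visits[random.randrange(NUM_CHILDREN)] += 1
    unvisited = visits.count(0)
    print(rounds_per_move, 'rounds ->', unvisited, 'of 82 children never visited')
```

Whatever the selection policy, 10 simulations can visit at most 10 of 82 children, so with a tiny budget the visit-count distribution used as the policy target is mostly zeros and reflects little beyond the prior's noise.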