moonbings / connect6_rl

Connect6 AI based on reinforcement learning
MIT License

Questions related to state generation #3

Open EnJiang opened 4 years ago

EnJiang commented 4 years ago

Hi!

The code for generating the state is as follows:

    def pre_processing(self, state, player, stone):
        if player == self.black:
            state1 = (state == self.black).astype(int)[:, :, np.newaxis]
            state2 = (state == self.white).astype(int)[:, :, np.newaxis]
        else:
            state1 = (state == self.white).astype(int)[:, :, np.newaxis]
            state2 = (state == self.black).astype(int)[:, :, np.newaxis]
        if stone == 2:
            state3 = np.ones(self.board_size)[:, :, np.newaxis]
            state4 = np.zeros(self.board_size)[:, :, np.newaxis]
        else:
            state3 = np.zeros(self.board_size)[:, :, np.newaxis]
            state4 = np.ones(self.board_size)[:, :, np.newaxis]
        state5 = np.ones(self.board_size)[:, :, np.newaxis]

        prep_state = np.concatenate((state1, state2, state3, state4, state5), axis=-1)
        return prep_state

I am a little confused about the purpose of state3, state4, and state5. If state3 and state4 are used to indicate the current player, I think one layer (ones for black, zeros for white) would be sufficient. Also, state5 seems redundant. Did I miss something?

P.S. I wonder whether putting the bound map and the threat map in state4 and state5 would help training?

EnJiang commented 4 years ago

I did miss something: state3 and state4 indicate how many stones the current player still has to place this turn (2 remaining or 1 remaining). But I am still confused about why two layers are used for that. My other questions still stand.

moonbings commented 4 years ago

Hi EnJiang!

state3 and state4 encode the number of stones remaining in the current turn (2 remaining or 1 remaining). In fact, there was no particular reason to use two layers for this.
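Since the two layers carry the same information, a single layer would do. The following is only a minimal sketch of such a variant, not the repository's code: the function name `encode_state`, the values `black=1` and `white=2`, and the use of `float32` are assumptions made for illustration.

```python
import numpy as np

def encode_state(board, player, stones_left, black=1, white=2):
    """Hypothetical 4-plane variant of pre_processing (illustration only)."""
    # Assumes player is either black or white.
    opponent = white if player == black else black
    own = (board == player).astype(np.float32)    # current player's stones
    opp = (board == opponent).astype(np.float32)  # opponent's stones
    # One plane is enough: all ones when 2 stones remain, all zeros when 1 remains.
    fill = 1.0 if stones_left == 2 else 0.0
    remaining = np.full(board.shape, fill, dtype=np.float32)
    # Constant plane of ones, playing the role of state5.
    ones = np.ones(board.shape, dtype=np.float32)
    return np.stack((own, opp, remaining, ones), axis=-1)

board = np.zeros((19, 19), dtype=int)
board[9, 9] = 1  # a single black stone in the centre
state = encode_state(board, player=2, stones_left=2)
print(state.shape)
```

Here the board has one black stone and white is to move, so the stone appears in the opponent plane (index 1) and the remaining-stones plane is all ones.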

state5, however, may be meaningful. This layer of ones marks the area of the board. When it passes through a convolutional layer with zero padding, positions near the edge see padded zeros next to the ones, so the model can tell where the board boundary is.
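The effect can be shown with a small NumPy sketch. It assumes a 5x5 board, a 3x3 kernel of ones, and a naive helper named `conv2d_same_zero_pad`; these are illustrative choices, not part of the repository.

```python
import numpy as np

def conv2d_same_zero_pad(plane, kernel):
    """Naive 2D cross-correlation with zero padding and same-size output."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(plane, ((ph, ph), (pw, pw)), mode="constant", constant_values=0)
    out = np.zeros(plane.shape, dtype=float)
    for i in range(plane.shape[0]):
        for j in range(plane.shape[1]):
            # Sum of the window under the kernel, centred on (i, j).
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

board_size = (5, 5)               # small hypothetical board
ones_plane = np.ones(board_size)  # same role as state5
kernel = np.ones((3, 3))          # illustrative kernel, not a learned one

response = conv2d_same_zero_pad(ones_plane, kernel)
print(response)
```

With these assumptions the response is 9 in the interior, 6 along the edges, and 4 at the corners, so the output depends on the distance to the boundary. A plane of zeros would give 0 everywhere and carry no such signal.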