mokemokechicken / reversi-alpha-zero

Reversi reinforcement learning by AlphaGo Zero methods.
MIT License
678 stars, 170 forks

About the optimizer? #27

Open wjllance opened 6 years ago

wjllance commented 6 years ago
  1. I found that the optimizer only loads data at the beginning. Will it reload new play data as training progresses?
  2. I hope more logging can be made available, such as the loss at each step.

wjllance commented 6 years ago

It would be better to split the log into separate files, haha.

mokemokechicken commented 6 years ago

Hi @wjllance

  1. I found that the optimizer only loads data at the beginning. Will it reload new play data as training progresses?

The optimizer reloads new play data here.
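As a rough illustration of what reloading during training can look like, here is a minimal sketch. It is not the repository's code: the class name `PlayDataBuffer`, the `play_*.json` file pattern, and the JSON-list file format are all assumptions made for the example. The idea is only that the trainer remembers which files it has already read and picks up any new ones each time it checks.

```python
import json
from pathlib import Path


class PlayDataBuffer:
    """Hypothetical buffer that picks up play-data files written after startup."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)
        self.loaded_files = set()  # names of files that were already read
        self.samples = []          # every training sample loaded so far

    def reload(self):
        """Load files not seen before and return how many new files were read."""
        new_files = sorted(
            path
            for path in self.data_dir.glob("play_*.json")  # assumed naming scheme
            if path.name not in self.loaded_files
        )
        for path in new_files:
            with path.open() as f:
                # Each file is assumed to hold a JSON list of samples.
                self.samples.extend(json.load(f))
            self.loaded_files.add(path.name)
        return len(new_files)
```

Calling `reload()` between training epochs would then bring in whatever self-play has written since the last call, without re-reading files that are already in memory.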

  2. I hope more logging can be made available, such as the loss at each step. It would be better to split the log into separate files.

Surely. I think so too :)

wjllance commented 6 years ago

So when the trainer picks a batch, it picks from all the old data, without ignoring the very earliest play data, right? If we can produce more self-play data, maybe it would be better to select from only the most recent data. Does that make sense?

mokemokechicken commented 6 years ago

So when the trainer picks a batch, it picks from all the old data, without ignoring the very earliest play data, right?

Right. The trainer picks from all of the data.

If we can produce more self-play data, maybe it would be better to select from only the most recent data. Does that make sense?

The old data is removed by self-play here.

config.play_data.max_file_num determines how many old data files are kept.
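To make the effect of that setting concrete, here is a small sketch of this kind of pruning. It is an illustration under assumptions, not the project's implementation: the function name `remove_old_play_data` and the `play_*.json` naming scheme are hypothetical, and file names are assumed to sort from oldest to newest. Only the `max_file_num` limit comes from the config option mentioned above.

```python
from pathlib import Path


def remove_old_play_data(data_dir, max_file_num):
    """Delete the oldest play-data files so that at most max_file_num remain.

    Returns the names of the deleted files, oldest first.
    """
    # Assumption: file names sort in age order (e.g. they embed a timestamp).
    files = sorted(Path(data_dir).glob("play_*.json"))
    excess = max(len(files) - max_file_num, 0)
    removed = []
    for path in files[:excess]:
        path.unlink()
        removed.append(path.name)
    return removed
```

With pruning like this, a trainer that samples from every file on disk is effectively sampling from a sliding window of the most recent games, and the window size is controlled by the file limit.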

wjllance commented 6 years ago

oh thx, you're so nice~