nestedsoftware / blog_comments

Comments for https://nestedsoftware.com using utterances

Tic-Tac-Toe with a Neural Network #4

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

Tic-Tac-Toe with a Neural Network

Let’s train a Tic-Tac-Toe player with a neural network via reinforcement learning using PyTorch

https://nestedsoftware.com/2019/12/27/tic-tac-toe-with-a-neural-network-1fjn.206436.html

krishenm94 commented 3 years ago

Great work! I have a question though. In qneural.py line 156

        next_q_values = get_q_values(next_position, net_context.target_net)

during backpropagation, if we are indeed implementing an architecture faithful to DQN, shouldn't we use the policy_net's values here?

It seems like this is a modification in line with Double DQN, which is, incidentally, superior in terms of curbing overoptimism. Let me know if this was intentional. I'm trying to understand your code :).
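For reference, here is a minimal sketch of the two target computations being discussed (PyTorch; the nets and batch below are hypothetical stand-ins, not the article's actual code):

    import torch
    import torch.nn as nn

    # Hypothetical stand-ins: each net maps a flattened 3x3 board (9 cells)
    # to a vector of 9 Q-values.
    policy_net = nn.Linear(9, 9)
    target_net = nn.Linear(9, 9)
    next_position = torch.randn(32, 9)  # a made-up batch of next states

    with torch.no_grad():
        # Standard DQN: the target net both selects and evaluates the best next action.
        dqn_target = target_net(next_position).max(dim=1).values

        # Double DQN: the online (policy) net selects the action,
        # and the target net evaluates it.
        best_actions = policy_net(next_position).argmax(dim=1, keepdim=True)
        ddqn_target = target_net(next_position).gather(1, best_actions).squeeze(1)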

nestedsoftware commented 3 years ago

Yes, the idea was to try implementing Double DQN, as mentioned in the DeepMind paper linked in the references. I seem to recall that when I was writing this code, I did try removing target_net, and it didn't make a substantial difference to the result. If you're doing further study in this area, it would be interesting to investigate when this type of thing is more likely to result in the 'divergent' behaviour referred to in the paper.

krishenm94 commented 3 years ago

I would suspect that's because the time delay between the target and online nets in your implementation is only one game. If we increase the delay, we might see performance differences.
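A sketch of what a longer delay might look like (a hypothetical training loop, not the article's code; TARGET_SYNC_GAMES and the stand-in nets are assumptions):

    import torch.nn as nn

    policy_net = nn.Linear(9, 9)   # hypothetical stand-ins for the two nets
    target_net = nn.Linear(9, 9)

    NUM_GAMES = 2_000_000
    TARGET_SYNC_GAMES = 100        # assumed: sync every 100 games instead of every game

    for game_number in range(NUM_GAMES):
        # ... play one game and update policy_net as usual ...

        # Only periodically copy the online weights into the target net,
        # so the bootstrap targets stay fixed for a stretch of games.
        if game_number % TARGET_SYNC_GAMES == 0:
            target_net.load_state_dict(policy_net.state_dict())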

bhouston commented 2 weeks ago

Great article!

I think your training set of 2M runs is too large, though. There are fewer than 9! (= 362,880) possible unique games of tic-tac-toe (i.e. 9 squares that get filled in one at a time), so you most likely just trained the network to completely fit the training data.

It may be that this cannot be avoided, given the simplicity of tic-tac-toe and its fully deterministic outcomes?
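As a quick sanity check on that bound, a small brute-force enumeration (plain Python, with 'X' assumed to move first) counts every distinct move sequence that ends in a win or a full board; the total is well below 9! because many games finish before all nine squares are filled:

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def count_games(board, player):
        """Count distinct move sequences from this position to a finished game."""
        if winner(board) is not None or all(cell is not None for cell in board):
            return 1
        other = 'O' if player == 'X' else 'X'
        return sum(count_games(board[:i] + [player] + board[i + 1:], other)
                   for i in range(9) if board[i] is None)

    print(count_games([None] * 9, 'X'))  # 255,168 -- far fewer than 9! = 362,880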

nestedsoftware commented 2 weeks ago

That's an excellent point. I haven't worked with this project in a long time, but it would be interesting to run through all possible games and compare the network's moves against those of a minimax player. As you say, it's rather difficult to avoid this problem for tic-tac-toe. My goal at the time was to teach myself how to write a basic reinforcement-learning agent using a neural network, and tic-tac-toe seemed like a nice, simple case study for that: it didn't require much compute, and it was fairly easy to work with compared to a less trivial problem domain.
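For what it's worth, a sketch of the minimax side of such a comparison (plain Python, no dependencies; hooking the trained network up to it is left out, since that part depends on the project's own helpers):

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]

    def winner(board):
        for a, b, c in WIN_LINES:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def minimax(board, player):
        """Return (score, best_move) for `player`: +1 = win, 0 = draw, -1 = loss."""
        w = winner(board)
        if w is not None:
            return (1 if w == player else -1), None
        moves = [i for i in range(9) if board[i] is None]
        if not moves:
            return 0, None
        other = 'O' if player == 'X' else 'X'
        best_score, best_move = -2, None
        for move in moves:
            board[move] = player
            score = -minimax(board, other)[0]
            board[move] = None
            if score > best_score:
                best_score, best_move = score, move
        return best_score, best_move

    # Example: evaluating the empty board for 'X'.
    print(minimax([None] * 9, 'X'))  # value 0, i.e. a draw with best play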