philsupertramp / chess


Develop deep reinforcement model [Epic] #3

Open philsupertramp opened 3 years ago

philsupertramp commented 3 years ago

https://en.wikipedia.org/wiki/Deep_reinforcement_learning

philsupertramp commented 3 years ago

Either this whole topic is still above my head, or it's fairly simple. From what I understand now, we need to provide a certain API for the training algorithm to do two things:

  1. Try out several moves at once
  2. Provide a scoring function for a game state

A game state is the current board state, i.e. a list of pieces and their positions.
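
Concretely, such an API might look like the following sketch (all class and method names here are hypothetical; nothing like this exists in the repo yet):

```python
from typing import List, Tuple

State = List[List[int]]           # 8x8 board; the encoding is discussed below
Move = Tuple[int, int, int, int]  # (from_row, from_col, to_row, to_col)

class ChessEnv:
    """Hypothetical interface the training algorithm would program against."""

    def legal_moves(self, state: State) -> List[Move]:
        """All moves available in `state`."""
        raise NotImplementedError

    def try_moves(self, state: State, moves: List[Move]) -> List[State]:
        """1. try out several moves at once: the successor state of each."""
        raise NotImplementedError

    def score(self, state: State) -> float:
        """2. scoring function for a game state, e.g. material balance."""
        raise NotImplementedError
```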

input

So I'd pass an 8x8 array to the model, containing 0 for empty squares, 1-6 for white and 8-14 for black pieces. Alternatively, we could precalculate the maximum number of potential moves in a turn and provide those as input instead of the 64 tiles.
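
A minimal sketch of that first encoding, assuming the 0 / 1-6 / 8-14 scheme from above (note that six codes per colour suffice, so black actually fits into 8-13; the ordering of piece types is arbitrary):

```python
import numpy as np

# piece codes as proposed above: 0 = empty, 1-6 = white, 8-13 = black
EMPTY = 0
W_PAWN, W_KNIGHT, W_BISHOP, W_ROOK, W_QUEEN, W_KING = range(1, 7)
B_PAWN, B_KNIGHT, B_BISHOP, B_ROOK, B_QUEEN, B_KING = range(8, 14)

def initial_board() -> np.ndarray:
    """8x8 integer array describing the starting position."""
    board = np.zeros((8, 8), dtype=np.int8)
    back_rank = [W_ROOK, W_KNIGHT, W_BISHOP, W_QUEEN,
                 W_KING, W_BISHOP, W_KNIGHT, W_ROOK]
    board[0] = back_rank
    board[1] = W_PAWN
    board[6] = B_PAWN
    board[7] = [p + 7 for p in back_rank]  # mirror white's back rank for black
    return board
```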

But this raises the question:

I guess we need to try that out, and research the topic a bit more. The nature of chess especially is a bit f'd up in this respect; Go/Checkers, on the other hand, are simpler due to their limited rules, since all the "pieces" are equal. TODOs:

  1. Look into Q-learning, a specific form of deep reinforcement learning.
  2. Figure out the proper terminology and introduce it to the codebase/project.
  3. Extract the notebook content and build a module in the ml directory.

output

Preferred would be to get the best possible move, so I would suggest using a regular topology and returning a 4x1 vector containing start and end coordinates, if possible. Alternatively, we could output the index of the chosen move.
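
To make the two variants concrete, a hypothetical decoding step for each (`raw` stands in for whatever the network emits; neither function exists in the repo):

```python
import numpy as np

def decode_coordinate_output(raw: np.ndarray) -> tuple:
    """Variant 1: a 4x1 vector (from_row, from_col, to_row, to_col).
    Raw network outputs are rounded and clipped onto the 0-7 board range."""
    coords = np.clip(np.rint(raw), 0, 7).astype(int)
    return tuple(coords)

def decode_move_index(raw: np.ndarray, legal_moves: list) -> tuple:
    """Variant 2: one score per (padded) legal move; pick the argmax.
    The output dimension has to cover the maximum number of legal moves."""
    idx = int(np.argmax(raw[: len(legal_moves)]))
    return legal_moves[idx]
```

The index variant has the nice property that, restricted to the legal-move list, the network can never produce an illegal move.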

Note: please answer the questions raised above to document the decisions for later.

philsupertramp commented 3 years ago

The file https://github.com/philsupertramp/chess/blob/dev/ml/reinforced-deep-learning-openai.ipynb was created while watching a tutorial, "deep reinforcement learning in 20 min". The video itself didn't really go beyond a bare description of the parameters. Horrible, if you ask me. So we will need to walk through it again, write comments, and describe our findings. I have a vague understanding of what it does, but no idea yet how to use it with a library/API other than gym as the input source.

philsupertramp commented 3 years ago

Great resources:

philsupertramp commented 3 years ago

Check out ml/example. It implements a simple deep Q-learning algorithm, an agent class, as well as an environment. The agent is able to learn to move towards a moving food source while avoiding a moving enemy. It's quite interesting to compare the training times of the initial Q-learning implementation with the deep Q-learning variant: whereas the deep Q variant requires little memory to train and can even be limited to remembering only the last N actions, its training time grew exponentially. The Q variant, on the other hand, uses a fixed amount of memory, implemented as a lookup table that maps each (finite) state to an action.

It would be interesting to fully understand when to use the Q and when to use the deep Q variant.
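
For later reference, the core difference in a nutshell (a rough sketch of the trade-off, not the actual code from ml/example):

```python
import random
from collections import defaultdict, deque

# Tabular Q-learning: fixed memory, one table entry per (state, action) pair.
# Only feasible while the state space is small and finite.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard Bellman update on the lookup table."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Deep Q-learning: the table is replaced by a network, and experience is kept
# in a bounded replay buffer ("remember the last N actions").
replay_buffer = deque(maxlen=10_000)

def remember(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=32):
    """Random minibatch the network trains on."""
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
```

Roughly: tabular Q works when every state fits in a table; chess does not, which is why the deep variant exists despite its longer training time.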

philsupertramp commented 3 years ago

Alright, seems like ml/main.py is somewhat working. Now we need to feed some better data in there; I'd suggest moving black randomly for now, then training on these random moves.
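
A sketch of that data generation, assuming the python-chess package for board handling and move generation (our own board class would work just as well; for now both sides move randomly):

```python
import random
import chess

def random_game(max_plies: int = 100) -> list:
    """Play random moves and record (position, move) pairs for training."""
    board = chess.Board()
    history = []
    while not board.is_game_over() and len(history) < max_plies:
        move = random.choice(list(board.legal_moves))
        history.append((board.fen(), move.uci()))
        board.push(move)
    return history
```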

philipp-zettl commented 3 years ago

Why not just train a NN to predict the value of the current state?

philsupertramp commented 2 years ago

Because it's harder than anticipated to figure out how to actually obtain the data, as well as how to use it. This is still a huge unknown for me.
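
For what it's worth, one common way to obtain such data would be to label every state of a random self-play game with the final outcome (a Monte Carlo value target). A hypothetical sketch, again assuming python-chess:

```python
import random
import chess

def value_samples(n_games: int = 100, max_plies: int = 200) -> list:
    """(fen, outcome) pairs; outcome: +1 white win, -1 black win, 0 otherwise."""
    samples = []
    for _ in range(n_games):
        board = chess.Board()
        fens = []
        while not board.is_game_over() and len(fens) < max_plies:
            fens.append(board.fen())
            board.push(random.choice(list(board.legal_moves)))
        outcome = {"1-0": 1.0, "0-1": -1.0}.get(board.result(), 0.0)
        samples.extend((fen, outcome) for fen in fens)
    return samples
```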