I have no remarks on the hard-coded rules or the evolved strategy.
I appreciate the minimax implementation.
The reinforcement learning strategy is well written (I really like that the possible states are generated at runtime instead of being fully enumerated), but encoding only the state probably doesn't give good results. I made the same mistake; later, by moving to a score associated with the (state, action) pair, I was able to achieve good results.
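To make the (state, action) idea concrete, here is a minimal sketch of tabular Q-learning. It is not your code and not your game: the toy task (reach exactly `TARGET` by adding 1 or 2 to a counter), the names `step`, `train`, and `greedy`, and all hyperparameter values are assumptions I made up for illustration. The table is keyed by the pair and filled lazily, so it keeps your idea of generating states at runtime.

```python
import random
from collections import defaultdict

TARGET = 5        # hypothetical toy task: reach exactly 5 starting from 0
ACTIONS = (1, 2)  # add 1 or add 2 to the counter


def step(state, action):
    """Return (next_state, reward, done) for the toy task."""
    nxt = state + action
    if nxt == TARGET:
        return nxt, 1.0, True    # landed exactly on the target
    if nxt > TARGET:
        return nxt, -1.0, True   # overshot the target
    return nxt, 0.0, False


def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # Scores are stored per (state, action) pair, not per state,
    # and entries are created only when a pair is first looked up.
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice over the actions of the current state.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Standard Q-learning update for the visited pair.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy(q, state):
    """Pick the action with the highest learned score in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

With a score per pair, the agent can tell a good move from a bad one in the same state (here, adding 1 versus adding 2 when the counter is at 4), which a value attached only to the state cannot express directly.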
It was also a good idea to do some hyperparameter tuning and to show the analysis you made.