Description

This PR mainly adds more game rules:

Flying General
- Any move that results in a flying general (the two generals facing each other on the same file with no pieces in between) is rejected by the environment, and the agent is penalized for making such an illegal move.
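The following is a minimal sketch of the flying-general check, for illustration only. It assumes a hypothetical board representation (a 10 x 9 list of lists with None for empty points) and hypothetical piece codes "K"/"k" for the red and black generals. The environment's actual data structures may differ. The move is applied to a copy of the board, and it counts as illegal if the two generals end up on the same file with nothing between them.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: 10 rows x 9 files, None marks an empty point.
Board = List[List[Optional[str]]]
Square = Tuple[int, int]

RED_GENERAL = "K"    # hypothetical piece code
BLACK_GENERAL = "k"  # hypothetical piece code


def find_piece(board: Board, piece: str) -> Optional[Square]:
    """Return the (row, file) of the first matching piece, or None."""
    for r, row in enumerate(board):
        for c, p in enumerate(row):
            if p == piece:
                return r, c
    return None


def generals_face_each_other(board: Board) -> bool:
    """True if both generals share a file with no pieces between them."""
    red = find_piece(board, RED_GENERAL)
    black = find_piece(board, BLACK_GENERAL)
    if red is None or black is None:
        return False
    (r1, c1), (r2, c2) = red, black
    if c1 != c2:
        return False
    lo, hi = sorted((r1, r2))
    return all(board[r][c1] is None for r in range(lo + 1, hi))


def is_flying_general_move(board: Board, src: Square, dst: Square) -> bool:
    """Apply the move to a copy of the board and test for a flying general."""
    new_board = [row[:] for row in board]  # leave the caller's board untouched
    new_board[dst[0]][dst[1]] = new_board[src[0]][src[1]]
    new_board[src[0]][src[1]] = None
    return generals_face_each_other(new_board)
```

For example, with the generals at (9, 4) and (0, 4) and a red chariot at (5, 4) between them, moving the chariot sideways to (5, 0) returns True (illegal), while moving it along the file to (6, 4) returns False.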
Perpetual Check
- An agent cannot give the same check ("jiang") more than THREE times in a row. On the 4th consecutive identical check, the player loses for violating this rule.
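The perpetual-check rule can be sketched as a per-player streak counter. What makes two checks "the same" is not spelled out above, so this sketch takes an opaque, hashable check signature supplied by the caller (for example, an identifier of the checking piece). That signature, the PerpetualCheckTracker class, and the reset-on-non-check behavior are assumptions for illustration, not necessarily how the environment implements it.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_SAME_CHECKS = 3  # a 4th identical consecutive check violates the rule


@dataclass
class PerpetualCheckTracker:
    """Hypothetical helper tracking consecutive identical checks per player."""
    last_check: Dict[str, Optional[Hashable]] = field(default_factory=dict)
    streak: Dict[str, int] = field(default_factory=dict)

    def record(self, player: str, check: Optional[Hashable]) -> bool:
        """Record one move by `player`.

        `check` is None for a non-checking move, otherwise a signature of the
        check. Returns True if this move is the 4th identical check in a row.
        """
        if check is None:
            # A quiet move breaks the streak.
            self.last_check[player] = None
            self.streak[player] = 0
            return False
        if self.last_check.get(player) == check:
            self.streak[player] = self.streak.get(player, 0) + 1
        else:
            # A different check starts a new streak.
            self.last_check[player] = check
            self.streak[player] = 1
        return self.streak[player] > MAX_SAME_CHECKS
```

A fourth identical consecutive check returns True, which the environment would treat as a loss; a non-checking move or a different check resets the streak, and each player is tracked independently.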
Jiang Reward
- Rewards the player for making jiang (check) moves. Note that the environment does not warn the player who is in jiang.
- You will notice that I have changed one of the existing tests, test_env_step_reward. Because of the new jiang reward, I had to modify the moves in that test so that they only capture without giving jiang.
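To illustrate why test_env_step_reward had to change, here is a hypothetical reward function in which the jiang bonus is simply added on top of the capture reward. The piece values, JIANG_REWARD, and step_reward are all made up for this sketch; the actual numbers and function names in the environment may differ. Detecting whether a move gives check is left to the caller as a boolean.

```python
from typing import Dict, Optional

# Hypothetical values, for illustration only.
PIECE_VALUES: Dict[str, float] = {
    "soldier": 1.0,
    "advisor": 2.0,
    "elephant": 2.0,
    "horse": 4.0,
    "cannon": 4.5,
    "chariot": 9.0,
}
JIANG_REWARD = 0.5


def step_reward(captured: Optional[str], gives_check: bool) -> float:
    """Reward for one legal move: capture value plus a bonus for giving jiang."""
    reward = PIECE_VALUES.get(captured, 0.0) if captured is not None else 0.0
    if gives_check:
        reward += JIANG_REWARD
    return reward
```

With these made-up values, step_reward("horse", False) is 4.0 but step_reward("horse", True) is 4.5, so a test that expects capture-only rewards must use moves that do not also give jiang.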

I have NOT added stalemate checking because this feels like something the agent should learn as it gets smarter. For example, suppose an agent manages to stalemate the opponent but is not smart enough to capture the opponent's general on its next move, and instead breaks the stalemate. I feel the agent should be responsible for this, but I am also not sure whether it is the environment's job to check for such things on the agents' behalf. Since we did not include features such as detecting and announcing checks ("jiangs") or terminating the episode when the agent makes no meaningful progress, I think we can leave out stalemate checking for now. Let me know what you all think; I can always add it if there are good reasons to do so.

Type of change

CI (I added some basic unit tests for these new changes)