koaning / sushigo

An OpenAI-like environment for the Sushi Go card game.
MIT License

calculating rewards #7

Open koaning opened 7 years ago

koaning commented 7 years ago

The interesting thing about this game is that the true score is only known once a round has ended.

Take the maki cards or the pudding cards: only at the end of a round do you know how many points they are worth.

At the moment there is a huge bug in the way scores are calculated. If at turn 2 you have the most maki, you get all the maki points, even though you should only get them at the end of the round. I can fix this ... but we need to agree on what reward I send back to you. Do we ignore maki/pudding cards for intermediate steps?

koaning commented 7 years ago

@RobRomijnders I just discovered this bug. The end scores should still be correct but the intermediate rewards are simply not correct.

RobRomijnders commented 7 years ago

Yes, I would only include the points that are definite.

koaning commented 7 years ago

Then the reward over time could look something like

0,0,5,6,6,6,6,6,18

Seems fine by me.
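
As a rough illustration of the "only definite points" idea, here is a minimal sketch. It is not the code in this repository: `reward_trace`, `definite_gains` and `end_of_round_bonus` are hypothetical names, and the per-turn numbers are simply chosen so that the output matches the sequence above. It assumes the reward is reported as a running total and that anything that depends on the final table (maki, pudding) is added on the last turn only.

```python
from itertools import accumulate
from typing import List


def reward_trace(definite_gains: List[int], end_of_round_bonus: int) -> List[int]:
    """Return the running reward after each turn of one round.

    definite_gains[i] holds the points that became certain on turn i.
    Points that can only be settled once the round is over (assumed here
    to be maki and pudding) are passed in as end_of_round_bonus and are
    added on the final turn only.
    """
    if not definite_gains:
        return []
    # Running total of the points that are already certain.
    trace = list(accumulate(definite_gains))
    # Deferred points only show up once the round has ended.
    trace[-1] += end_of_round_bonus
    return trace


# Illustrative numbers only, picked to reproduce the sequence in this thread.
gains = [0, 0, 5, 1, 0, 0, 0, 0, 0]
print(reward_trace(gains, end_of_round_bonus=12))
# prints [0, 0, 5, 6, 6, 6, 6, 6, 18]
```

With this shape, a player that has the most maki on turn 2 sees no change in reward until the round is actually over.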

koaning commented 7 years ago

@RobRomijnders

I am in favour of having a single reward function given from the game. Are we fine with this definition of reward and having that be our single reward signal?

I am cool with players interpreting the reward differently but any deviating reward would need to be handled by the player object.

RobRomijnders commented 7 years ago

I agree that we should strive for simplicity. But an algorithm optimized for the highest points isn't necessarily the winning algorithm (as you pointed out yourself last Thursday).

So somewhere and somehow, the player must be informed who won the game.

koaning commented 7 years ago

Having the highest points in a round does not indicate a win, but having the highest points at the end of all rounds does indicate a clear winner. We could still add a win flag, though it would merely check "is the game over?" and "did this player have the most points?"
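
A win flag of that kind could be sketched roughly as follows. This is a hypothetical helper, not something that exists in the repository: `win_flags`, `total_points` and `game_over` are made-up names, and treating every tied top scorer as a winner is an assumption made only to keep the sketch short.

```python
from typing import Dict


def win_flags(total_points: Dict[str, int], game_over: bool) -> Dict[str, bool]:
    """Return a per-player win flag.

    The flag answers two questions: is the game over, and did this
    player have the most points? Before the game is over nobody is
    flagged. Ties are flagged for every tied player (an assumption).
    """
    if not game_over or not total_points:
        return {name: False for name in total_points}
    best = max(total_points.values())
    return {name: points == best for name, points in total_points.items()}


scores = {"A": 70, "B": 82, "C": 64}
print(win_flags(scores, game_over=False))
# prints {'A': False, 'B': False, 'C': False}
print(win_flags(scores, game_over=True))
# prints {'A': False, 'B': True, 'C': False}
```

The game would compute this once from all players' totals and hand each player its own flag next to the reward, so the reward itself stays a single number.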

RobRomijnders commented 7 years ago

Yes, but a player's own point count doesn't tell them whether they won. If player A receives reward=70, they cannot know whether that is the highest.