When testing a sampled action with step, it always returns reward -10

richardhuo commented 9 months ago

    import os
    import time

    env = XiangQiEnv()
    done = False
    env.reset()

    action = env.action_space.sample()
    print('action', action)
    obs, reward, done, info = env.step(action)
    print(obs, 'reward=', reward, 'done=', done)

    -------output-----------
    action 14364
    this is the step
    [[ -9  -7  -5  -3  -1  -2  -4  -6  -8]
     [  0   0   0   0   0   0   0   0   0]
     [  0 -11   0   0   0   0   0 -10   0]
     [-16   0 -15   0 -14   0 -13   0 -12]
     [  0   0   0   0   0   0   0   0   0]
     [  0   0   0   0   0   0   0   0   0]
     [ 12   0  13   0  14   0  15   0  16]
     [  0  10   0   0   0   0   0  11   0]
     [  0   0   0   0   0   0   0   0   0]
     [  8   6   4   2   1   3   5   7   9]] reward= -10.0 done= False

richardhuo commented 9 months ago

Tried game_mode.py, and it looks perfect. Thanks.

hojoungjang commented 9 months ago

Sounds good. Yes, game_mode.py should give you a good sense of what example game-play iterations look like. The library is not under active development at the moment, but we still really appreciate people playing with it and filing potential bugs for visibility and tracking. Thanks!
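
The board printed in the original report is still the starting position, which would be consistent with the uniformly sampled action being rejected as an illegal move, although nothing in this thread confirms that. A minimal sketch of restricting sampling to legal actions follows. It assumes the caller can obtain a 0/1 mask with one entry per action. The `legal_mask` argument and the `sample_legal_action` helper are hypothetical names used for illustration and are not taken from the gym-xiangqi API.

```python
import random


def sample_legal_action(legal_mask, rng=random):
    """Pick uniformly among the indices whose mask entry is truthy.

    `legal_mask` is a hypothetical 0/1 sequence with one entry per action.
    It is an assumption made for this sketch, not a gym-xiangqi attribute.
    """
    legal = [i for i, ok in enumerate(legal_mask) if ok]
    if not legal:
        raise ValueError("no legal actions available")
    return rng.choice(legal)


# Toy mask: 10 possible actions, of which only 2 and 7 are legal.
toy_mask = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
action = sample_legal_action(toy_mask)
print("action", action)
```

With a mask like this, the chosen action is always one of the legal indices, whereas sampling from the whole action space would pick an illegal index most of the time when legal moves are a small fraction of it.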

tanliyon / gym-xiangqi

When testing a sampled action with step, it always returns reward -10 #133