Closed laktionov closed 1 year ago
Hi, thank you for your question.
The main reason for switching is an exception raised when trying to use the old versions of the MuJoCo envs:
DeprecatedEnv: Environment version v0 for Ant is deprecated. Please use Ant-v4 instead.
Speaking about the possible issues: for `HalfCheetah-v4`, I haven't noted any differences in reward or observation design (https://gymnasium.farama.org/environments/mujoco/half_cheetah/). `Ant-v4` differs from `Ant-v0` at least in the reward design (`contact_cost` is excluded in `Ant-v4`, see https://gymnasium.farama.org/environments/mujoco/ant/). Probably we should adjust the reward threshold.

I'm somewhat confused as I don't see any thresholds in the notebook -- which thresholds are you referring to?
Sorry, I meant these thresholds to fully complete the assignments:
In ppo.ipynb
In one million interactions it should be possible to achieve a total raw reward of about 1500
In hw-continuous-control_pytorch.ipynb
Your goal is to reach an average reward of at least 1000 during evaluation after training in this Ant environment (since this is a new homework task, this threshold might be updated; at minimum, check that your ant has learned to walk in the rendered simulation)
I'm not sure what the reward was on v0 with TD3/SAC -- however, it's probably fine to submit the notebooks as-is if reward>1000 correlates with behavior that is much better than random.
Is reward>1000 indicative of the agent performing well?
I've checked the SAC agent, which achieves a reward of 1063; it seems to perform well based on the video recording.
Thanks!
done

- Changed `done` to `terminated or truncated` to iterate over the env: `EnvRunner` uses `done` equal to `terminated or truncated`, since the next state comes from the next episode.
- Replaced `pybullet-gym` with `gymnasium[mujoco]`.
- `assert 0 < np.mean(is_dones) < 0.1` in hw-continuous-control_pytorch.ipynb, since `is_done` only equals `terminated` now.
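The `terminated`/`truncated` split matters for bootstrapping: only true termination should zero out the next state's value, while a time-limit truncation did not reach a terminal state, so the next state's value is still a valid estimate of future return. A minimal sketch of that idea (`td_target` is a hypothetical helper, not the notebook's code):

```python
import numpy as np

def td_target(rewards, next_values, terminated, gamma=0.99):
    """One-step TD targets: r + gamma * V(s') * (1 - terminated).

    `terminated` (not `terminated or truncated`) plays the role of
    `is_done` here: truncation by the time limit is not a terminal
    state, so the bootstrap term must be kept.
    """
    is_done = np.asarray(terminated, dtype=float)
    return np.asarray(rewards) + gamma * (1.0 - is_done) * np.asarray(next_values)

# terminated transition: no bootstrap, target is just the reward
print(td_target(np.array([1.0]), np.array([2.0]), np.array([1.0])))  # [1.]
# truncated-but-not-terminated transition: still bootstraps
print(td_target(np.array([1.0]), np.array([2.0]), np.array([0.0])))  # [2.98]
```

This is also why `np.mean(is_dones)` drops once `is_done` means `terminated` only: truncated episode ends no longer count toward it.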