ashok-arora opened this issue 2 weeks ago
Hi Ashok,
This is a somewhat complicated issue. In older versions of gym, the `truncated` variable was stored inside `info` (see https://github.com/kevslinger/DTQN/blob/79ccf8b548a2f6263b770e051b42ced2932137ee/run.py#L371). The new gym version with 5 return values is a breaking change, and updating to it would mean one could no longer run DTQN on the original environments. At the same time, leaving the repo in its current state would mean you cannot run DTQN on "modern" (updated) environments.
With that in mind, there are a few ways we can proceed with this:
1.) Do nothing, and leave the repo in its current (outdated) state.
2.) Update the repo to account for the new version of gym, while keeping the `paper` branch in its original, old form. Then experiments could be run on the `main` branch with new environments, and on the `paper` branch with old environments.
3.) Create a new branch from the `main` and/or `paper` branch to accommodate the new gym environments, and let people switch between branches based on the version of their environment.
4.) Come up with a sort of gym wrapper to interface between the two styles of environments. We could either wrap the old environments so that they now return `obs, reward, terminated, truncated, info` instead of `obs, reward, done, info`, and then update the codebase accordingly, OR we could wrap the new environments so that they return `obs, reward, done, info` and put `truncated` inside the `info` variable, the same as the old method.

Option 4 is probably the best way. Let me know what you think. I will of course welcome pull requests towards this issue.
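A minimal sketch of the second variant of option 4 (wrapping a new-style env so it presents the old 4-tuple API). All names here are hypothetical, and this assumes the wrapped env follows the new 5-tuple `step` / `(obs, info)` `reset` convention:

```python
class OldStyleWrapper:
    """Hypothetical wrapper: presents a new-style (5-tuple) gym env
    through the old 4-tuple API, stashing `truncated` inside `info`."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        # New-style reset returns (obs, info); old-style returned just obs.
        obs, _info = self.env.reset(**kwargs)
        return obs

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        done = terminated or truncated
        if truncated and not terminated:
            # Mirror old gym's convention of flagging time-limit cutoffs.
            info["TimeLimit.truncated"] = True
        return obs, reward, done, info
```

The reverse direction (old env to new API) would do the opposite: read `TimeLimit.truncated` out of `info` to split `done` into `terminated` and `truncated`.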
Thank you for the detailed response.
Yeah, the fourth option seems like the most straightforward way, since splitting branches may lead to confusion for future users. To maintain compatibility, we can pass in an arg of maybe `--gym old` (returns `obs, reward, done, info`) or `--gym new` (returns `obs, reward, terminated, truncated, info`) that handles it accordingly, with `--gym old` being the default to maintain compatibility with the `paper` branch.
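Something like the following could express that flag (the flag spelling and help text are placeholders; the final names would be settled in the PR):

```python
import argparse

# Hypothetical flag; "old" stays the default so the paper branch
# keeps reproducing results on the original environments.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--gym",
    choices=["old", "new"],
    default="old",
    help="API style of the installed gym: 'old' = 4-tuple step, 'new' = 5-tuple step",
)

args = parser.parse_args([])  # no flag given: falls back to the default
```

The run script could then wrap the environment only when `args.gym == "new"`.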
On a sidenote, I am wondering how https://github.com/kevslinger/DTQN/blob/79ccf8b548a2f6263b770e051b42ced2932137ee/run.py#L371 is able to calculate the maximum time that should be used for the `truncated` part. Would it make more sense to define the time limit ourselves, depending upon the velocity and the distance between `heaven` and `hell`?
That sounds like a good idea, we can use the `gym-version` flag to determine whether we need to wrap the environment in a gym wrapper to shrink the returned tuple from 5 values to 4.
As for the `TimeLimit.truncated` part, that is a bit of old gym magic -- by supplying the `max_episode_steps` argument when adding an environment to the gym register, the backend wraps the environment in a `TimeLimitWrapper`, which then ends the episode and sets `TimeLimit.truncated: True` in the `info` returned by the `step` function. For example, https://github.com/kevslinger/DTQN/blob/79ccf8b548a2f6263b770e051b42ced2932137ee/envs/__init__.py#L47 means the episode will always end after 200 steps. I don't think we need to do anything to change that.
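For illustration, that behaviour can be approximated with a toy wrapper like this (a sketch of the idea only, not old gym's actual `TimeLimit` implementation):

```python
class ToyTimeLimit:
    """Sketch of what old gym's time-limit wrapping does: after
    max_episode_steps, force done=True and flag the truncation in info."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, **kwargs):
        self._elapsed_steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            # True only if the episode was cut short, rather than
            # ending naturally on this exact step.
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info
```

With `max_episode_steps=200`, an episode that never terminates on its own is forced to end on step 200 with `TimeLimit.truncated: True` in `info`.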
> That sounds like a good idea, we can use the `gym-version` flag to determine if we need to wrap the environment in a gym wrapper to shrink the size of the returned tuple from 5 arguments to 4.

Okay sure, I'll send in a PR with the `gym-version` flag.

> As for the `TimeLimit.truncated` part, that is a bit of old gym magic -- by supplying the `max_episode_steps` argument when adding an environment to the gym register, the backend wraps the environment in a `TimeLimitWrapper` which then ends the episode and sets `TimeLimit.truncated: True` in the `info` as part of the return of the `step` function. For example, https://github.com/kevslinger/DTQN/blob/79ccf8b548a2f6263b770e051b42ced2932137ee/envs/__init__.py#L47 means the episode will always end after 200 steps. I don't think we need to do anything to change that

And what is the basis for the choice of `200` for `max_episode_steps`, and similarly the values for the other environments?
In general, we want to give the agents the ability to successfully end an episode while exploring, so `max_episode_steps` needs to be large enough that this is possible. However, we don't want to allow episodes to go on forever, as this may fill the replay buffer with many useless experiences. This is not a value I spent a lot of time tuning.
Hey Kevin, I hope you are doing well. I noticed a small bug where the step function returns only `obs, reward, done, info` instead of `obs, reward, terminated, truncated, info`. I came across an article from Gymnasium that emphasised the need for both `terminated` and `truncated`. Can I help in updating the codebase?