@pender - Sorry for the slow reply; I need to change my settings so I receive notifications.
I'll look at your solution today. Even within the MuJoCo tasks there was inconsistency in how observations were returned; I think I even ran into mixed types within a single observation. As you can see, I had to do a lot of brute-force casting.
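This is a hypothetical example of the kind of defensive casting I mean, not the exact code in the repo -- whatever mix of types the environment returns, force it into a flat float vector:

```python
import numpy as np

# Example of an observation with mixed types (int, float, nested array).
obs = [1, 2.5, np.array([3.0])]

# Cast every piece to float64 and flatten into one 1-D vector.
obs = np.concatenate([np.asarray(x, dtype=np.float64).ravel() for x in obs])
print(obs)  # [1.  2.5 3. ]
```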
I think your solution was good, but I decided to use `np.squeeze` to remove the extra dimension. I'll push this to the master branch. I'm going to keep the `aigym_evaluation` branch frozen where it was when I ran all 10 MuJoCo environments. (The fix doesn't seem to cause a problem with the MuJoCo environments; they were just forgiving of the extra dimension.)
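For reference, this is roughly what I mean (a minimal sketch, using the example action values from your report below):

```python
import numpy as np

# The policy emits the action with a leading batch dimension, e.g. shape (1, 2).
action = np.array([[-0.70904064, -0.71731383]])
print(action.shape)  # (1, 2)

# np.squeeze drops the length-1 axis, leaving the flat (2,) action
# that a Box(2,) action space expects.
action = np.squeeze(action, axis=0)
print(action.shape)  # (2,)
```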
I'm glad you posted; I'm looking forward to trying the Roboschool environments. I'm curious how the simulation speed compares.
OpenAI just posted a short PPO paper and they use a different loss function. I'll probably give that a try soon.
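If it's the clipped surrogate objective from that paper, it would look something along these lines -- a sketch in TF 1.x, not code from this repo; `logp`, `logp_old`, and `advantages` are placeholder names, and 0.2 is the epsilon the paper suggests:

```python
import tensorflow as tf

clip_eps = 0.2  # epsilon from the PPO paper
logp = tf.placeholder(tf.float32, [None])        # log pi(a|s), current policy
logp_old = tf.placeholder(tf.float32, [None])    # log pi(a|s), old policy
advantages = tf.placeholder(tf.float32, [None])  # advantage estimates

# Probability ratio r_t(theta) = pi(a|s) / pi_old(a|s).
ratio = tf.exp(logp - logp_old)
clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps)

# Pessimistic (minimum) bound of the surrogate objective, negated into a loss.
loss = -tf.reduce_mean(tf.minimum(ratio * advantages, clipped * advantages))
```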
Hi! I love your repo (and your blog, and your suggestions for an ML intro curriculum of MOOCs) -- thank you!
Submitting this as an issue rather than a PR because I'm not sure if I fixed the issue in the best way.
I am having an issue trying to run `train.py` on a Roboschool environment. I added `import roboschool` to the top of `train.py` (which registers the Roboschool environments) and ran into an error.

I used some debug statements to determine that line 105 of `train.py` is calling `env.step(action)` when the value of `action` is `[[-0.70904064 -0.71731383]]` -- i.e., a list of shape [1, 2] rather than a one-dimensional list of length 2. The action space for the environment is `Box(2,)`, so I think it should just be a list of two floats.

I tried changing line 105 to `obs, reward, done, _ = env.step(action[0])` to eliminate the degenerate dimension, and it seems to work at that point.

I'm on Ubuntu 16.04.2 LTS, TF v1.2.1, gym v0.9.1, and a fresh install of roboschool as of 5 minutes ago.