Closed srussvoll closed 6 years ago
Hi @srussvoll, have you tested the compiled plugin with the example to verify it's functioning correctly?
I have tested that this works as expected:
python -m gymfc.controllers.iris_pid_eval --env-id=AttFC_GyroErr-MotorVel_M4_Ep-v0
That's good news that it was just a matter of recompiling. Now that I think about it more, supporting multiple versions of Gazebo isn't as trivial as I thought it would be. Could you test whether the plugin compiled for Gazebo 8 works in 9? If we are forced to maintain different binaries for different versions of Gazebo, how GymFC is installed is going to have to change. The options I see are: 1) have different versions of gymfc, 2) require the user to compile the plugin, or 3) write a script invoked during a pip install to compile the binary.
Options 2 and 3 would be a pain for others just trying to use the library, which leaves us with 1. Any chance you know how to achieve 1 in setup.py?
Regarding 2 and 3, I have seen many pip based projects that require something to be compiled. Regarding 1, you could run gzserver -v | grep "version 9" to check for version 9. Exit code 0 indicates version 9. This could be implemented to decide which .so-file to use.
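The same check could be done from Python instead of with grep. This is only a rough sketch: it assumes gzserver -v prints a line containing "version X.Y.Z" (which is what the grep above relies on), and parse_gazebo_major and installed_gazebo_major are hypothetical helper names, not part of GymFC.

```python
import re
import subprocess


def parse_gazebo_major(version_output):
    """Return the major version found in `gzserver -v` output, or None."""
    # Assumption: the output contains text such as "version 9.0.0".
    match = re.search(r"version\s+(\d+)", version_output)
    return int(match.group(1)) if match else None


def installed_gazebo_major():
    """Run `gzserver -v` and parse its output; None if gzserver is not on PATH."""
    try:
        result = subprocess.run(
            ["gzserver", "-v"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_gazebo_major(result.stdout)
```

Returning the major version as an integer, rather than testing only for 9, would let the same helper pick between any number of prebuilt .so files.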
There seems to be hard coded gazebo environment variables set in gazebo_env.py. These should instead be sourced from /usr/share/gazebo/setup.sh to make this work. I have sourced it in my environment, so it works even though the hard coded paths in gazebo_env.py are for Gazebo 8.
I do not have Gazebo 8 anymore, so it is not possible for me to test if the new .so works with Gazebo 8.
I have seen many pip based projects that require something to be compiled.
I have seen this too; however, the user then needs to make sure they have all the Gazebo dependencies for compilation. I can't remember, it's been a while: are all dependencies for compilation available after Gazebo is installed?
This could be implemented in __init__.py at runtime to decide which .so-file to use.
If you have all the precompiled .so files, the problem is that you'd have to go all the way up the chain and invoke the correct Gazebo world to load the corresponding .so file, which is in turn linked to an OpenAI gym environment. To bypass all the Gazebo SDF stuff, it might be possible at installation to move/copy the correct .so to a directory that will be loaded when Gazebo is started.
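A minimal sketch of the copy-at-installation idea, under stated assumptions: the prebuilt binaries are laid out as <prebuilt_dir>/gazebo-<major>/<plugin_name>, and install_plugin and libExamplePlugin.so are hypothetical names used only for illustration. Neither the layout nor the names come from GymFC.

```python
import os
import shutil


def install_plugin(major, prebuilt_dir, target_dir, plugin_name="libExamplePlugin.so"):
    """Copy the plugin built for Gazebo `major` into `target_dir`.

    Assumed (hypothetical) layout: <prebuilt_dir>/gazebo-<major>/<plugin_name>.
    Returns the path of the copied file.
    """
    source = os.path.join(prebuilt_dir, "gazebo-%d" % major, plugin_name)
    if not os.path.isfile(source):
        raise FileNotFoundError(
            "No prebuilt plugin for Gazebo %d at %s" % (major, source)
        )
    os.makedirs(target_dir, exist_ok=True)
    return shutil.copy2(source, target_dir)
```

Because the copied file always ends up under the same name in the same directory, the Gazebo world files would not need to know which version the binary was built for.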
There seems to be hard coded gazebo environment variables set in gazebo_env.py.
Yes, everything is hardcoded. As I understand it, you can't use the subprocess module to source a script into the running Python environment. This still needs to be addressed.
I have seen this too; however, the user then needs to make sure they have all the Gazebo dependencies for compilation. I can't remember, it's been a while: are all dependencies for compilation available after Gazebo is installed?
I completely removed Gazebo 8 and installed Gazebo 9 using the install script provided on Gazebo's homepage. This installed all compile dependencies automatically.
Yes, everything is hardcoded. As I understand it, you can't use the subprocess module to source a script into the running Python environment. This still needs to be addressed.
In that case the hard-coded paths need to be updated for now.
In that case the hard-coded paths need to be updated for now.
The paths shouldn't simply be updated, because we need to support all future versions of Gazebo. We either need to figure out whether there is a way to somehow do the source, or the fallback would be to add the hardcoded paths for Gazebo 9, so that when gymfc is invoked it checks which version is installed and sets the environment variables accordingly.
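The fallback could look roughly like the sketch below. The path patterns are assumptions modeled on a default Ubuntu package install and should be checked against /usr/share/gazebo/setup.sh on the target machine; gazebo_env_for and apply_gazebo_env are hypothetical names, not existing GymFC functions.

```python
import os


def gazebo_env_for(major):
    """Build Gazebo environment variables for a given major version.

    Assumption: paths follow the /usr/share/gazebo-<major> pattern used by
    the Ubuntu packages; other installs may differ.
    """
    share = "/usr/share/gazebo-%d" % major
    return {
        "GAZEBO_RESOURCE_PATH": share,
        "GAZEBO_MODEL_PATH": share + "/models",
        "GAZEBO_PLUGIN_PATH": "/usr/lib/x86_64-linux-gnu/gazebo-%d/plugins" % major,
    }


def apply_gazebo_env(major, environ=None):
    """Write the variables into `environ` (defaults to os.environ)."""
    environ = os.environ if environ is None else environ
    environ.update(gazebo_env_for(major))
    return environ
```

Deriving the paths from the version number, instead of keeping one hardcoded table per release, is one way to avoid touching the code for every future Gazebo version, provided the naming pattern holds.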
Yes, there are a few more changes to be done in order to support both Gazebo 8 and Gazebo 9. This PR only addresses Gazebo 9, so you might want to reject this PR in that case.
Yea, I can't merge it if it's not backwards compatible.
If you want to make the changes, let me know and I'll reopen it.
This code can be used to source environment variables:
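import os
import subprocess

def shell_source(self, script):
    # Source the script in a subshell, then dump the resulting environment.
    pipe = subprocess.Popen(". %s; env" % script, stdout=subprocess.PIPE, shell=True)
    output = pipe.communicate()[0]
    # Keep only KEY=VALUE lines so stray output cannot break the dict() call.
    env = dict(line.decode("utf-8").split("=", 1) for line in output.splitlines() if b"=" in line)
    os.environ.update(env)

self.shell_source("/usr/share/gazebo/setup.sh")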
This is awesome, thanks for looking into that. By the end of the month I plan to have all the GZ9 stuff integrated. BTW, one of our students has recently been having problems getting GZ8 running on the newest LTS Ubuntu release, 18.04. Did you run into anything like this? It seems it's been removed from some of the distributions, which is a good reason to do this upgrade soon.
Yes, I use Ubuntu 18.04 on the development machine, and it did not have Gazebo 8 in the stock apt repositories. It does however have Gazebo 9. I did an install from source to get Gazebo 8. It required a lot of dependencies which needed to be compiled from source too. The repositories had the latest versions of most of these libraries, but Gazebo 8 needed older versions, so it takes a lot of time to get it installed on Ubuntu 18.04. gzserver worked, however gzclient did not. It segfaulted before any logs were written, and a trace showed that the problem was with a libQtGui.so file. After a quick search on the internet, I couldn't find any simple solutions to this problem, so I thought it'd be easier to make GymFC work with Gazebo 9 rather than make Gazebo 8 work on Ubuntu 18.04.
It required a lot of dependencies which needed to be compiled from source too.
Yea, that's a huge pain; we don't want that. I'm still on 16.04, so once I get my systems upgraded, which should be soon, I'll have a platform to test on.
@srussvoll These changes are now integrated into master. Did you get it working on GZ9? I'm going to work on that next week.
This PR seemed to work with GZ9, yes. However as we talked about, training with baselines->ppo2 did not work out of the box. You sent me a code excerpt that you said should work with PPO1. When I tried running it, it crashed with errors from Tensorflow. I don't know whether it crashed due to the upgrade to GZ9, problems with the PPO1 code you provided, or a problem with Tensorflow.
I have an AMD GPU, which cannot run stock Tensorflow because its GPU support is built on CUDA. So I have set up ROCm and a Tensorflow implementation for ROCm instead. So far it has worked well with all other Tensorflow code that I have tried running.
So it is a bit difficult to say what the problem is without looking quite a bit deeper into this. Is the PPO1 code you sent supposed to just work right out of the box?
Yea, PPO1 worked as is (at least for GZ8). If the PID example does work then that is a good indication the upgrade to GZ9 is working, so it must be something up with training. When I make the switch to GZ9, hopefully next week, I'll be able to see if I can reproduce. Can you post the Tensorflow errors?
The PID example produced the expected results. I tried running the code you posted again, and it does not give any errors while training. It halts with errors when I try to --play it. After the first iteration, the EpRewMean is -51.9, and after iteration 278 (timestep 571k) it is at -23.8. Then at iteration 488 (timestep 1M) it is back at -50.8.
I think it is important that GymFC works out of the box with baselines, simply because baselines was created to provide a set of baseline implementations for the gym. It provides a simple way to compare your own algorithms to the baselines, and requires that the users can trust that the baselines integration with the gym actually works properly.
Currently the gym only utilizes one Gazebo server using only one core. With an i7 that means we could train 8 times faster (provided the GPU can keep up) if the Gazebo server was cloned to use all CPU cores. What do you think about this?
It halts with errors
What errors?
What do you think about this?
Anything to help speed up training would be awesome.
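One possible starting point for running several Gazebo servers side by side is to give each one its own master port through GAZEBO_MASTER_URI. The sketch below only builds the per-worker environments; it assumes each gzserver would then be launched with the matching environment and that each gym environment connects to its own port. worker_envs is a hypothetical helper, not part of GymFC.

```python
import os

DEFAULT_GAZEBO_PORT = 11345  # Gazebo's default master port


def worker_envs(num_workers=None, base_port=DEFAULT_GAZEBO_PORT, host="localhost"):
    """Return one environment dict per worker, each with a distinct master URI."""
    if num_workers is None:
        # Default to one Gazebo server per logical CPU core.
        num_workers = os.cpu_count() or 1
    envs = []
    for index in range(num_workers):
        env = dict(os.environ)  # copy, so the parent environment is untouched
        env["GAZEBO_MASTER_URI"] = "http://%s:%d" % (host, base_port + index)
        envs.append(env)
    return envs
```

Each dict could be passed as the env argument of subprocess.Popen when starting a server. Whether training actually scales with the core count would still depend on the GPU and the training code keeping up.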
I don't have the output available right now. I think they were related to some file or similar not being found. I have simply run this code directly: https://gist.github.com/wil3/4115a31c527afd4a7f8ecfab88fa4a24 I can post the errors when I have the output available.
(cherry picked from commit fe587fe19c724163564745066d8c64291a297050)