wil3 / gymfc

A universal flight control tuning framework
http://wfk.io/neuroflight/
MIT License

Build a docker image with gazebo and gymfc and run a demo #26

Closed MichaelManz closed 4 years ago

MichaelManz commented 5 years ago

Compile Mesa drivers to allow direct rendering from inside the Docker image. Successfully tested on macOS with Intel and Radeon graphics hardware, but it should also work on other hardware and operating systems.

This can be used as a starting point to build a Docker image for headless training.

ssdasgupta commented 5 years ago

When I run docker run -ti -e DISPLAY=127.0.0.1:0 gymfc:latest, I get this from Gazebo:

[Msg] Connected to gazebo master @ http://127.0.0.1:11926
[Msg] Publicized address: 172.17.0.2
[Err] [RenderEngine.cc:728] Can't open display: 127.0.0.1:0
[Wrn] [RenderEngine.cc:93] Unable to create X window. Rendering will be disabled
[Wrn] [RenderEngine.cc:293] Cannot initialize render engine since render path type is NONE. Ignore this warning ifrendering has been turned off on purpose.
[Dbg] [QuadcopterWorldPlugin.cpp:101] Binding on port 9865
[Dbg] [QuadcopterWorldPlugin.cpp:558] Quadcopter controller online detected.

Simulation Stats
-----------------
steps                  1001
packets_dropped        0
time_start_seconds     1559218739.7992752
time_lapse_hours       0.001133014957110087

I believe this is going as expected, and I do get the 'desired' and 'actual' values to plot. However, the moment matplotlib tries to access the backend, I get this error:

Killing process with ID= 21
Traceback (most recent call last):
  File "run_iris_pid.py", line 452, in <module>
    main(args.env_id, args.seed)
  File "run_iris_pid.py", line 436, in main
    plot_step_response(np.array(desireds), np.array(actuals), title=title)
  File "run_iris_pid.py", line 66, in plot_step_response
    f, ax = plt.subplots(num_subplots, sharex=True, sharey=False)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/pyplot.py", line 1203, in subplots
    fig = figure(**fig_kw)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/pyplot.py", line 539, in figure
    **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/backend_bases.py", line 3252, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/usr/local/lib/python3.6/dist-packages/matplotlib/backends/_backend_tk.py", line 946, in new_figure_manager_given_figure
    window = tk.Tk(className="matplotlib")
  File "/usr/lib/python3.6/tkinter/__init__.py", line 2023, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "127.0.0.1:0"

Is there a way to fix this? I am running on OS X.

MichaelManz commented 5 years ago

Hey @ssdasgupta ,

thanks for trying this out!

wil3 commented 5 years ago

Just a suggestion: I wouldn't spend much time on the current master branch. GymFC v0.2 is in the dev-px4-motor-model branch, which is a complete architectural change. Our team has been using it for the past month and it is stable enough to use. I'm still updating and writing docs, but it will probably be merged into master soon.

MichaelManz commented 5 years ago

I will rebase and rebuild this change on the new baseline.

MichaelManz commented 5 years ago

@wil3 The Dockerfile now works with the new architecture, please review. I am not sure whether the naming and structure fit into the new GymFC world; any hints or feedback are very welcome.

@ssdasgupta FYI, you can check if that works for you.

wil3 commented 5 years ago

Hey @MichaelManz this is looking pretty sweet.

@abhimanyu-jain do you think you could test this out after you finish the Singularity container? This may help you with that as well.

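Couple comments,

  • Looks like you're pulling DART from apt-get, have you not had any problems with this? We ended up having to compile DART from source (we were having issues with the versions provided by the distro repo) using v6.7.0.

  • There hasn't been any development or use of the solo model so this will need some attention if this is to be used as a demo. I'm guessing you have it at least popping up using the test script?

  • The problem I see is doing actual training or controller development using Docker. You'd have to include this all into the Docker image, correct? I guess the question is what the intent of the Docker image will be. Just run a test? Actually do R&D? This could provide a baseline for others to fork and include what they need if it's meant to just serve as a simple "hello world" type demo.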

Thanks again for the contribution!

MichaelManz commented 5 years ago

> Hey @MichaelManz this is looking pretty sweet.
>
> @abhimanyu-jain do you think you could test this out after you finish the Singularity container? This may help you with that as well.
>
> Couple comments,
>
> • Looks like you're pulling DART from apt-get, have you not had any problems with this? We ended up having to compile DART from source (we were having issues with the versions provided by the distro repo) using v6.7.0.

I followed the Gazebo documentation for compiling with DART support (http://gazebosim.org/tutorials?tut=install_from_source&cat=install), which says DART only needs to be compiled from source for special needs. How do I make sure DART works? I can at least see non-critical DART warnings on the console when starting the solo model, so I guess the DART code is at least being used.

> • There hasn't been any development or use of the solo model so this will need some attention if this is to be used as a demo. I'm guessing you have it at least popping up using the test script?

Yep, the solo model shows up. What kind of attention do you mean? I use sed to add a plugin configuration to the model; I would prefer to check this in to the repo directly if that makes sense. What model do you use for training? I can also include that one.

> • The problem I see is doing actual training or controller development using Docker. You'd have to include this all into the Docker image, correct? I guess the question is what the intent of the Docker image will be. Just run a test? Actually do R&D? This could provide a baseline for others to fork and include what they need if it's meant to just serve as a simple "hello world" type demo.

In the short term, I want to use this to train a model on my MacBook and maybe to reproduce your research results on another drone.

But yes, this can also help to provide an easy to use and consistent development infrastructure to get new contributors on board. R&D and training would then be done within the container. A lot of advantages come with Docker. If you guys would use this for your R&D on Ubuntu we would have a single point of code for the infrastructure and it would make sure that GymFC works (most likely) the same on every machine. And it is easy to deploy the image to a cloud provider to increase compute power, e.g. an AWS p3.8xlarge can speed up training with 4 GPUs a lot (maybe you want to increase the size of the neural network in the future).

But I can understand that it is also sometimes annoying to have to build the docker image on every change. However, there are some tricks to mitigate this and in practice this works quite well.

Docker images build on each other, so the full-blown version would probably have a base image with Gazebo/DART for headless training, another image on top of that with the Mesa drivers, and maybe another one for the demo. Forking would create duplicated code, which is harder to maintain.

> Thanks again for the contribution!

You're welcome, and thanks for your research in this domain!

wil3 commented 5 years ago

> How do I make sure DART works?

I believe we were getting a bunch of actual faults and errors, so it sounds like it's working, but you won't really know until you run the test_axis.py script. This script is interactive; will that play nice with Docker? A benefit of compiling DART from source is that we can control the environment. When we were having install problems I found they were updating their apt-get repo often, like multiple times a day. One day we pulled down a version and it worked; the next day it didn't...

> I would prefer to check this in with repo directly if that makes sense.

I can go ahead and do that. I need to calculate the center of thrust, unless you already have it, in which case go for it. I've been developing a new model for my thesis for a custom racing drone, and it will likely remain closed source. In the future it would be valuable to have an accurate model for an off-the-shelf drone. The solo is probably fine, but it's a bad model to train for precision attitude control because it's a clunky, unbalanced photography drone, so users would need to take that into consideration when developing their environment.

> If you guys would use this for your R&D on Ubuntu we would have a single point of code for the infrastructure and it would make sure that GymFC works (most likely) the same on every machine.

I definitely think it's valuable to provide the user with options for deployment such as Docker. However, our computing cluster at BU only supports Singularity, so we don't have a use case for it at the moment.

> But I can understand that it is also sometimes annoying to have to build the docker image on every change. However, there are some tricks to mitigate this and in practice this works quite well.

It may be easier to understand with a clearer picture of the new architecture. I'm working on finishing a preprint of the paper I can share with you so you can see how it might impact development with Docker.

wil3 commented 5 years ago

@MichaelManz I think it may be better to have a different script as the demo. Start sim doesn't send any control signals, so you can't verify that all the motor plugins are working. What about step sim, passing the values 1 1 1 1 at the command line for 100% thrust? Then at least you'll see the motors spin if everything is working.
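To illustrate the suggestion, here is a minimal sketch of how a demo script could accept four motor commands on the command line. It is not GymFC's actual step sim script: the function name is hypothetical, and the assumption that commands are normalized to the range 0 to 1 (with 1 meaning 100% thrust) is inferred from the comment above.

```python
import argparse


def parse_motor_commands(argv):
    """Parse four motor commands from the command line.

    Assumption: commands are normalized, 0.0 = off and 1.0 = 100% thrust.
    """
    parser = argparse.ArgumentParser(
        description="Step the simulator with fixed motor commands"
    )
    parser.add_argument(
        "motors", nargs=4, type=float, metavar="M",
        help="one value in [0, 1] per motor",
    )
    args = parser.parse_args(argv)
    for value in args.motors:
        if not 0.0 <= value <= 1.0:
            # parser.error prints a usage message and exits with status 2
            parser.error(f"motor command {value} is outside [0, 1]")
    return args.motors
```

Calling `parse_motor_commands(["1", "1", "1", "1"])` yields four commands of 1.0, which a demo could then send to the simulator on every step.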

MichaelManz commented 5 years ago

@wil3 New update: I pinned the versions to Gazebo 10.1 and DART 6.7.0 as discussed.

As you said, it was necessary to compile DART from source for these versions. I moved the solo model update to PR https://github.com/wil3/gymfc-digitaltwin-solo/pull/1. Until this PR is merged, the Dockerfile will clone from the PR's branch.

The demo's entry point is changed to step sim as suggested. The model's rotors spin if https://github.com/wil3/gymfc-aircraft-plugins/blob/master/src/gazebo_motor_model.cpp#L301 is uncommented. I could raise a PR for this as well...

wil3 commented 5 years ago

Awesome, thanks! So about the aircraft-plugin repo: it is a port of the PX4 SITL plugins, but from some testing I found their default motor response model (using a first-order filter) to be inaccurate. There is a discussion about it here: https://github.com/PX4/sitl_gazebo/issues/110#issuecomment-497716875.

The motors would not spin because the PID parameters have not been added to the SDF. By default it falls back to the first-order filter, which you enabled by removing the comment. For this repo (since I don't know the proper motor response) I need to add back the code that falls back to the filter if the PID isn't present.
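The fallback described above can be sketched in Python as follows. This is an illustration only, not the plugin's C++ code: the exponential form of the filter, the separate rise and fall time constants, and the parameter names `p`, `i`, `d` are all assumptions made for the example.

```python
import math


class FirstOrderFilter:
    """Discrete first-order lag: the output decays toward the target value."""

    def __init__(self, tau_up, tau_down, initial=0.0):
        self.tau_up = tau_up      # time constant (s) when speeding up
        self.tau_down = tau_down  # time constant (s) when slowing down
        self.state = initial

    def update(self, target, dt):
        tau = self.tau_up if target > self.state else self.tau_down
        alpha = math.exp(-dt / tau)
        self.state = alpha * self.state + (1.0 - alpha) * target
        return self.state


# Hypothetical parameter names; not the plugin's actual SDF schema.
PID_KEYS = ("p", "i", "d")


def select_motor_model(sdf_params):
    """Pick the PID model only if every gain is present, else fall back."""
    if all(key in sdf_params for key in PID_KEYS):
        return "pid"
    return "first_order_filter"
```

With a time constant equal to the step size, a unit step command reaches about 63% of its target after one update, which is the characteristic first-order response.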

wil3 commented 4 years ago

Hey @MichaelManz, sorry I dropped the ball on this. I'd like to get this merged. Since the solo model has not been validated, would you be able to make this container more generic? I'd like to have this container decoupled from the aircraft model. Then it can be used not only for a demo but also for testing and development.

As a first pass being able to mount a volume where the aircraft model exists and then doing a docker run with any of the test scripts would be awesome. Let me know if you have any bandwidth to work on this and I'll finish up the review of the PR.
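A minimal sketch of what such an invocation could look like, built from Python. The image name `gymfc:latest` comes from earlier in this thread; the mount point `/models` and the idea that the image's entry point takes the script name as its first argument are assumptions for illustration.

```python
def build_docker_run(model_dir, script, script_args=(), image="gymfc:latest",
                     display=None, mount_point="/models"):
    """Build a `docker run` argument list that mounts an aircraft model.

    Assumptions: the container expects the model under `mount_point`, and
    the image's entry point accepts the script to run as its argument.
    """
    cmd = ["docker", "run", "--rm", "-ti"]
    if display:
        # Forward the X display so the Gazebo GUI can render on the host
        cmd += ["-e", f"DISPLAY={display}"]
    # Bind-mount the host model directory into the container
    cmd += ["-v", f"{model_dir}:{mount_point}"]
    cmd += [image, script, *script_args]
    return cmd
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.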

MichaelManz commented 4 years ago

That does not sound too difficult. I'll try to find time in the next few days 👌

MichaelManz commented 4 years ago

I made the change as discussed. I left the plugin compile step in because the plugins need to be compiled on Ubuntu in order to work. I also changed the entry point so that any script can be called inside the Docker container.

wil3 commented 4 years ago

I just finished building the container. The script seems to boot up OK, but I'm unable to get the GUI to come up. I am running this on Ubuntu; I'm guessing you've only tested it on OS X? We may have to scope this PR to just an OS X example, and when I have time I can play with it more and possibly restructure the container, as I discussed before, to make this work with Linux. Thank you again for your contribution and your patience in getting this merged.

Edit: I re-read and saw you are testing this on Linux. Is there something else that is needed for the GUI to work?

MichaelManz commented 4 years ago

@wil3 this should work on Ubuntu without problems, though I did not test it. Are you running an X server on your host? Did you pass DISPLAY with your IP? And did you configure the X server to accept network connections from outside localhost? These are the most common pitfalls with this...
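Some of these checks can be automated from inside the container. The sketch below assumes the X server is reached over TCP, where display number N listens on port 6000 + N, and falls back to matplotlib's non-interactive Agg backend when the display cannot be reached. With default bridge networking, 127.0.0.1 inside a container refers to the container itself, so a setting like DISPLAY=127.0.0.1:0 would not reach an X server on the host. All function names here are hypothetical.

```python
import os
import socket


def parse_display(display):
    """Split a DISPLAY string such as 'host:0.0' into (host, tcp_port).

    host is None for local displays like ':0', which use a Unix socket.
    """
    host, _, rest = display.rpartition(":")
    number = int(rest.split(".")[0])  # drop the optional '.screen' suffix
    return (host or None, 6000 + number)


def display_reachable(display, timeout=1.0):
    """Return True if a TCP connection to the X server can be opened."""
    host, port = parse_display(display)
    if host is None:
        return True  # local Unix-socket display; not probed in this sketch
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_backend(display=None, probe=display_reachable):
    """Pick 'TkAgg' when the display is usable, else the headless 'Agg'."""
    if display is None:
        display = os.environ.get("DISPLAY", "")
    if display and probe(display):
        return "TkAgg"
    return "Agg"


def configure_matplotlib():
    """Select the backend; call this before importing matplotlib.pyplot."""
    import matplotlib
    matplotlib.use(choose_backend())
```

With the Agg backend selected, plots can still be written to disk with `savefig` instead of being shown in a window. A successful TCP connection does not prove the X server will authorize the client, so this probe only rules out the network-level pitfalls.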

MichaelManz commented 4 years ago

By the way, the documentation says that it is tested only on macOS.

wil3 commented 4 years ago

@all-contributors add MichaelManz for code, example, and infra

allcontributors[bot] commented 4 years ago

@wil3

I've put up a pull request to add @MichaelManz! :tada: