wil3 / gymfc

A universal flight control tuning framework
http://wfk.io/neuroflight/
MIT License

iPython #91

Closed jpark0315 closed 3 years ago

jpark0315 commented 3 years ago

Hey, great work! Thanks for releasing this.

I was just wondering if there is any way to test your custom environment (from the examples) in IPython? The script seemingly just exits when I run it in a terminal IPython session, and it doesn't run in Jupyter either. I'm not sure how else I could play around with the environment, other than reading the printed output in the terminal.
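For reference, this is roughly the interactive loop I'm trying to run (the import path and env id here are guesses based on the example scripts, and the examples may also need the env pointed at an aircraft model first):

```python
import gym
import gymfc_nf.envs  # assumption: importing the example package registers the envs

env = gym.make("gymfc_nf-step-v1")  # assumption: env id taken from the example scripts
obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()        # random motor commands, just to poke at the env
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```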

Also, do you plan on supporting macOS? Or am I missing some instructions that are already available?

Thanks!

jpark0315 commented 3 years ago

Also, I'm planning to do a project based on your work: adaptive PID tuning with RL for drone control.

I was able to get learning agents working on toy control environments in OpenAI Gym, and now I'm trying to scale up and solve more complex environments such as the ones in GymFC.

Eventually, I am planning to transfer the learned model from the simulator to a real drone.

I was planning to go through your thesis; it looks very informative. Are there any extra resources that you think would be useful? Thank you!

wil3 commented 3 years ago

Thanks! I've never tested in IPython, mainly because RL training takes hours, if not days. Someone contributed a Dockerfile for use with macOS, instructions are here, but I've never verified it so results may vary. My suggestion would be to just use a VirtualBox Ubuntu image; I use one for testing new features.

Using GymFC to find optimal PID gains is a great idea and practical. I'd first have a look at this recent issue https://github.com/wil3/gymfc/issues/90 which asks about optimal PID gains. If you eventually want to transfer to a real drone, it's important to think about which firmware that might be.

My suggestion would be to start simple. Use a simple reward function at first, for example the tracking error, and then (ideally) define the response characteristics you care about (such as damping ratio, steady-state error, etc.) and integrate those into the reward function.
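Something along these lines, purely as a sketch (the terms and weights are placeholders, not values from GymFC):

```python
import numpy as np

def reward(rate_error, delta_motor_command, w_error=1.0, w_smooth=0.1):
    """Sketch of a tracking-error reward with a smoothness penalty.

    rate_error: desired minus measured angular rates (per axis).
    delta_motor_command: change in motor outputs since the last step,
        penalized to discourage oscillatory control. Episode-level response
        characteristics (overshoot, settling time, steady-state error) could
        later be added as extra terms.
    """
    tracking_penalty = -w_error * np.sum(np.abs(rate_error))
    smoothness_penalty = -w_smooth * np.sum(np.abs(delta_motor_command))
    return tracking_penalty + smoothness_penalty
```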

However, I feel like genetic algorithms or something similar would be better suited for finding optimal PID gains when the gains are constant; RL is beneficial for continuous control tasks. It was years ago now, but I did experiment with an adaptive PID controller in which the gains were set by an NN trained using RL. The gains fluctuated too much and it didn't work out. At that point it ends up being simpler to just do neural control, since you already have the NN. Finding optimal static PID gains would still be cool because you wouldn't need custom firmware: you could just drop in the gains you find and use them.
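To be clear, the adaptive setup I mean is roughly the following, as an illustrative sketch rather than the code I actually used:

```python
class PID:
    """Single-axis PID whose gains can be overwritten at every step."""
    def __init__(self, kp=0.0, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID()

def control(policy_action, error, dt):
    # Adaptive variant: the RL policy's action is interpreted as the gain
    # vector at each step. If the policy output jumps around, the gains
    # (and therefore the control signal) jump with it -- the fluctuation
    # problem described above.
    pid.kp, pid.ki, pid.kd = policy_action
    return pid.step(error, dt)
```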

jpark0315 commented 3 years ago

Thanks for the reply! Correct me if I'm wrong, but I don't believe genetic algorithms work for adaptive, real-time tuning? Like you said, RL is a better fit for continuous control problems, and I want to extend the use of PID controllers to dynamically changing environments.

I'm running the 'PID tuned by RL' experiments but unfortunately am not seeing good results yet. Would you say this is an impossible task? It certainly is a challenging one, haha.

Do you think I need to redefine my MDP? I'm currently using the default one from your examples, except that I'm concatenating the states from multiple timesteps and the actions are real-time PID gains. I just assumed your default setting would be the best fit for my problem since it works best for Neuroflight.
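Concretely, my setup looks roughly like this (simplified sketch; the gain ranges and the PID-to-motor mixing are placeholders, not part of GymFC):

```python
import collections
import gym
import numpy as np

class StackAndGainWrapper(gym.Wrapper):
    """Stacks the last k observations and interprets actions as PID gains."""
    def __init__(self, env, k=4, gain_low=(0, 0, 0), gain_high=(2, 1, 0.1)):
        super().__init__(env)
        self.k = k
        self.frames = collections.deque(maxlen=k)
        self.gain_low = np.asarray(gain_low, dtype=np.float32)
        self.gain_high = np.asarray(gain_high, dtype=np.float32)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames)

    def step(self, action):
        # Action in [-1, 1] per gain, rescaled into the allowed gain range.
        gains = self.gain_low + (action + 1) / 2 * (self.gain_high - self.gain_low)
        motor_cmd = self._pid_mix(gains)
        obs, rew, done, info = self.env.step(motor_cmd)
        self.frames.append(obs)
        return np.concatenate(self.frames), rew, done, info

    def _pid_mix(self, gains):
        # Placeholder: run a PID with these gains on the current rate error
        # and mix the output into motor commands; details depend on the setup.
        raise NotImplementedError
```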

Sorry if I'm being a nuisance, I just really would like to make this work!

wil3 commented 3 years ago

GA for real-time tuning, no; for static gains, yes.
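For static gains the search loop is simple: propose gains, run an episode, keep the best. A toy sketch (the `run_episode` evaluation function is whatever you set up around GymFC, not something the library provides):

```python
import numpy as np

def tune_static_gains(run_episode, n_generations=20, pop_size=16, sigma=0.1):
    """Toy evolutionary search over a [kp, ki, kd] vector.

    run_episode(gains) -> total episode reward for a simulated flight
    flown with those fixed gains.
    """
    best = np.array([0.5, 0.1, 0.01])            # arbitrary starting guess
    best_score = run_episode(best)
    for _ in range(n_generations):
        candidates = best + sigma * np.random.randn(pop_size, 3)
        candidates = np.clip(candidates, 0.0, None)   # gains stay non-negative
        scores = [run_episode(c) for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best, best_score = candidates[i], scores[i]
    return best
```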

The greatest benefit you are going to get from using different PID gains and doing adaptive control is when your environment changes such that the gains should change to obtain better performance. That was not the objective for the gymfc_nf environment: the environment and aircraft are fixed throughout the training episode.

Most people want auto-tuners because it's a pain to tune for each different aircraft. One possible area to explore would be to change the quadcopter configuration (CG, motor properties, frame, weight, etc.) for each training episode. In any case, you need to identify something that will change in the environment.
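A sketch of what per-episode randomization could look like; how you push these values into the aircraft model/SDF is up to you (the `apply_to_model` step here is hypothetical):

```python
import numpy as np

def sample_aircraft_params(rng):
    """Randomize a few aircraft properties for each training episode."""
    return {
        "mass_kg": rng.uniform(0.5, 1.5),            # overall weight change
        "cg_offset_m": rng.uniform(-0.02, 0.02, 3),  # center-of-gravity shift
        "motor_gain": rng.uniform(0.8, 1.2),         # motor thrust scaling
    }

rng = np.random.default_rng(0)
for episode in range(1000):
    params = sample_aircraft_params(rng)
    # apply_to_model(params)  # hypothetical: write params into the aircraft SDF/model
    # ... run the training episode against the randomized aircraft ...
```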

The default reward from gymfc_nf would probably still be OK, but to simplify things I'd start with just the error penalty as the reward and iterate from there.