jjshoots / CrazyFlyt

Crazyflie UAV simulation based on the PyFlyt library
MIT License
17 stars 0 forks

Using Reinforcement Learning in CrazyFlyt #2

Closed RibhavOjha closed 8 months ago

RibhavOjha commented 8 months ago

Hi

I wanted to implement the quadx_hover_env of PyFlyt on real hardware using CrazyFlyt.

I was looking at this file: https://github.com/jjshoots/CrazyFlyt/blob/master/examples/sim_n_fly_single.py

It seems like we cannot use PyFlyt's action space (body rates x, y, z and thrust) here. Am I missing something?

Thanks

jjshoots commented 8 months ago

Yep, PyFlyt's action space doesn't translate one-to-one to CrazyFlyt, mainly because the Crazyflie library uses a different control scheme: bodyrate control isn't implemented natively in crazyflie-lib-python. Instead, you will have to train the reinforcement learning algorithm in an environment that uses a higher-level control mode, either mode 7, or mode 6 with the last two values swapped. To change the flight mode within the environment, the relevant line of code is here. I've been meaning to make this a configurable parameter for a while now, but haven't gotten around to it yet - PRs are very welcome.
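As a hedged illustration of the "mode 6 with the last two values swapped" remapping mentioned above, here is a hypothetical helper (not part of either library; verify the exact element ordering against PyFlyt's flight mode documentation before use):

```python
import numpy as np

def remap_mode6_action(action):
    """Swap the last two elements of a mode-6 action vector.

    Hypothetical helper sketching the remapping described above;
    check the actual element ordering in PyFlyt's flight mode docs.
    """
    action = np.asarray(action, dtype=float).copy()
    # fancy indexing evaluates the right-hand side as a copy,
    # so this swaps the final two entries in place
    action[[-2, -1]] = action[[-1, -2]]
    return action
```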

On CrazyFlyt's end, the control commands are documented here. CrazyFlyt is one of those libraries that I would love to improve further, but unfortunately I have pretty much lost access to Crazyflie drones, so I am very open to someone else taking over here should they be interested.

RibhavOjha commented 8 months ago

Oh great! Thanks. I am planning to use an OptiTrack motion capture system, because we need to feed observations to the RL. Any guidance would be appreciated!

Thanks

jjshoots commented 8 months ago

If you are comfortable with cloning and installing from source, then I would do that and modify the hover env to use mode 6/7, train an agent in there, then adapt the agent to CrazyFlyt using a simple command remapping. Otherwise, I can update PyFlyt to allow custom flight modes in the quadx envs.

RibhavOjha commented 8 months ago

I was planning to change the env mode to train the agent using mode 6/7, save the model, and deploy that on the Crazyflie. What I am not sure about is how to use that saved model (model.zip, for example) in a real hardware flight?

jjshoots commented 8 months ago

What RL framework are you planning to train the agent with?

jjshoots commented 8 months ago

CrazyFlyt doesn't run code natively on the Crazyflie itself - the drone doesn't have enough compute capacity for many interesting applications. Instead, it streams the UAV state from the drone and then streams setpoints back to it via the CrazyRadio PA. Therefore, anything that you can run directly within simulation in CrazyFlyt should be transferable to the actual UAV.
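To make that concrete, here is a rough, hypothetical sketch of a deployment loop. `model`, `UAV`, and `get_obs` are all assumed names (a Stable-Baselines3-style agent, a CrazyFlyt drone controller, and a caller-supplied observation source), not a confirmed CrazyFlyt API:

```python
import numpy as np

def fly_policy(model, UAV, get_obs, steps=500):
    """Stream a trained policy's actions to the drone as setpoints.

    model   -- trained agent exposing .predict(obs, deterministic=...)
    UAV     -- drone controller exposing .set_setpoint(np.ndarray)
    get_obs -- callable returning the current observation vector
    """
    for _ in range(steps):
        obs = get_obs()
        action, _states = model.predict(obs, deterministic=True)
        # forward the policy's action as a setpoint over the radio link
        UAV.set_setpoint(np.asarray(action, dtype=float))
```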

RibhavOjha commented 8 months ago

So this is what fly_single.py looks like:

    UAV.start()
    # send the drone to hover at 0.5 meters
    UAV.set_setpoint(np.array([0.0, 0.0, 0.5, 0.0]))
    UAV.sleep(5)

And this is how we do it in RL:

    while not (dones or truncation):
        action, _states = model.predict(obs, deterministic=True)
        obs, rewards, dones, truncation, info = test_env.step(action)

My question is, how do we get this "obs" for the RL to work? In fly_single.py, we just command the drone to go to a certain position; I cannot see where and how we are getting the drone state information. Just for information, I am using the Flow deck for localisation.

jjshoots commented 8 months ago

You will be able to get the current UAV state estimate through drone.position_estimate: https://github.com/jjshoots/CrazyFlyt/blob/master/CrazyFlyt/drone_controller.py#L135
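For instance, a minimal observation getter might look like the sketch below; it assumes position_estimate holds [x, y, z, yaw], which should be confirmed against the drone_controller.py source linked above:

```python
import numpy as np

def get_obs(drone):
    """Build a [x, y, z, yaw] observation from a CrazyFlyt drone.

    Assumes drone.position_estimate holds [x, y, z, yaw]; verify the
    ordering against CrazyFlyt/drone_controller.py before flying.
    """
    return np.asarray(drone.position_estimate, dtype=np.float32)
```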

jjshoots commented 8 months ago

Note that the PyFlyt env contains much more detailed and noise-free information than what CrazyFlyt provides. If you want to use more precise information via OptiTrack, it'll likely have to be through the backend that the system provides.

RibhavOjha commented 8 months ago

Thanks a lot! I will try to use drone.position_estimate with the Flow deck and see if it works. Since the Flow deck gives local coordinates, do I need to do anything else? (I know for a fact that if I am using OptiTrack, I don't need to do anything more and can just feed the observation to the RL.)

jjshoots commented 8 months ago

That should theoretically be all you need, outside of maybe some level of filtering for the state. Let me know how it goes!

RibhavOjha commented 8 months ago

Just another quick question: in CrazyFlyt, the observation space is [x, y, z, yaw]. In the given PyFlyt env, the obs space has a size of 17 (angular velocities, angular positions, linear velocity, linear position, previous action).

To deploy my RL, do I have to change the obs to [x, y, z, yaw] in PyFlyt as well? Or is there another way?

Thanks for your help!

jjshoots commented 8 months ago

RL generally expects the observation space not to change. You'll probably need to change the obs to [x, y, z, yaw], unless you plan on modifying things on CrazyFlyt's end to provide a more comprehensive observation space.

jjshoots commented 8 months ago

FYI, I've just updated PyFlyt to v0.17.0, which now has a flight_mode argument for all the environments. If you'd like, you can use that so you won't have to modify the source, and you can use PyFlyt natively with the transform observation wrapper from Gymnasium.
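For example, a reduction from the full observation down to [x, y, z, yaw] might look like the sketch below. The index positions are hypothetical and must be checked against the actual quadx observation layout, and the TransformObservation wrapper signature varies between Gymnasium versions:

```python
import numpy as np

# hypothetical indices -- verify against PyFlyt's quadx observation docs
LIN_POS = slice(9, 12)  # assumed [x, y, z] linear position
YAW_IDX = 8             # assumed yaw angle

def reduce_obs(obs):
    """Collapse a full PyFlyt observation to [x, y, z, yaw]."""
    obs = np.asarray(obs, dtype=np.float32)
    return np.concatenate([obs[LIN_POS], obs[YAW_IDX:YAW_IDX + 1]])

# usage (wrapper signature depends on your Gymnasium version):
# env = gymnasium.wrappers.TransformObservation(env, reduce_obs)
```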