utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/
MIT License

High-frequency oscillations in RPMs when including the action buffer in the observation space can cause problems on real hardware #212

Open piratax007 opened 4 months ago

piratax007 commented 4 months ago

Hello @JacopoPan,

First of all, congratulations on this wonderful repo.

Now, I'm training a policy to control a real Crazyflie. I'm using RPM as the action space instead of ONE_DIM_RPM, and the policy succeeds in simulation, except for what the RPM plot shows.

As you can see here:

[Figure: rpms]

the RPMs show a high-frequency oscillation that makes the policy unusable on a real drone.

I understand what you said in #180: "The main thing to note is that the observation contains the actions of the last .5 seconds, so increasing the ctrl freq will increase the obs space." and "The idea of the action buffer is that the policy might be better guided by knowing what the controller had done just before, the proportionality to the control frequency makes it dependent on the wall-clock only, and not the type of controller (but it might be appropriate to change that, depending on application).". Nevertheless, adding the action buffer to the observation space produces the high-frequency oscillation shown above. If I remove the buffer and use only the states (12 inputs) as the observation space, the drone reaches the target position and orientation (I'm also controlling yaw), and the RPMs no longer show the high-frequency oscillation:
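A minimal sketch of the mechanism described in the #180 quotes, assuming a buffer that spans the last 0.5 s of actions and four RPM inputs per action. `ActionBufferObservation` and `control_timing` are hypothetical names used only for illustration, not gym-pybullet-drones' actual classes; they show why the observation grows with the control frequency and how the physics rate relates to the control rate.

```python
from collections import deque


def control_timing(pyb_freq_hz: int, ctrl_freq_hz: int) -> dict:
    """Hypothetical helper: relate the physics (PyBullet) rate to the control rate.

    Physics is stepped pyb_freq_hz times per second; the policy acts
    ctrl_freq_hz times per second, so each action is held for several physics steps.
    """
    if pyb_freq_hz % ctrl_freq_hz != 0:
        raise ValueError("physics frequency should be a multiple of the control frequency")
    return {
        "pyb_steps_per_ctrl": pyb_freq_hz // ctrl_freq_hz,
        "pyb_timestep_s": 1.0 / pyb_freq_hz,
        "ctrl_timestep_s": 1.0 / ctrl_freq_hz,
    }


class ActionBufferObservation:
    """Hypothetical helper: state vector + the actions of the last `horizon_s` seconds."""

    def __init__(self, ctrl_freq_hz: int, action_dim: int = 4, horizon_s: float = 0.5):
        # The buffer covers a fixed wall-clock window, so its length scales
        # with the control frequency (e.g. 30 Hz * 0.5 s = 15 past actions).
        self.buffer_size = int(ctrl_freq_hz * horizon_s)
        self.action_dim = action_dim
        # Start filled with zero actions; deque(maxlen=...) drops the oldest entry on append.
        self.buffer = deque(
            ([0.0] * action_dim for _ in range(self.buffer_size)),
            maxlen=self.buffer_size,
        )

    def push(self, action) -> None:
        if len(action) != self.action_dim:
            raise ValueError("unexpected action size")
        self.buffer.append(list(action))

    def observation(self, state) -> list:
        # Concatenate the state with the buffered actions, oldest first.
        obs = list(state)
        for past_action in self.buffer:
            obs.extend(past_action)
        return obs
```

With a 12-dimensional state and 4 RPM actions, a 30 Hz control rate gives 12 + 15 × 4 = 72 inputs, while 240 Hz gives 12 + 120 × 4 = 492. Raising the control frequency therefore enlarges the observation space. Likewise, `control_timing(240, 30)` means each action is applied for 8 physics steps (240 and 30 are only example values here).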

[Figures: xyz, rpy, rpms]

Questions:

  1. When transferring the trained policy to a real drone, what is the difference between having the action buffer in the observation space and not having it?
  2. I'm trying to add a low-pass filter to reduce the high-frequency content in the RPMs. Can you help me determine the best cut-off and sampling frequencies for the filter?
  3. I cannot find anything in the SB3 documentation you refer to about using an action buffer in the observation space, and I have some questions about it, e.g., how to determine the buffer size. As you said, the buffer size is related to CTRL_FREQUENCY, but why? What does CTRL_FREQUENCY mean, and how does it relate to PYB_FREQUENCY and the time step? (I know that in BaseAviary.py, line 481, you define the time step using PYB_FREQUENCY.)
  4. At what frequency does the policy interact with the drone (sending actions and receiving observations and rewards)?
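Regarding question 2, one common choice is a first-order IIR low-pass filter. It runs at the rate at which actions are produced, i.e. the control frequency, because that is the sampling frequency of the RPM signal. The cut-off must then lie below half of that rate (Nyquist). The sketch below is a generic filter and is not part of gym-pybullet-drones. The 30 Hz / 5 Hz values in the usage note are placeholders, not tuned values. Also note that if the filter is only added on hardware, the closed-loop dynamics differ from what the policy saw in training, so applying the same filter inside the training environment may be worth considering.

```python
import math


class FirstOrderLowPass:
    """Generic first-order (RC-style) low-pass filter applied element-wise (e.g. per motor).

    y[k] = y[k-1] + alpha * (x[k] - y[k-1]),  alpha = dt / (RC + dt),  RC = 1 / (2*pi*fc)
    """

    def __init__(self, cutoff_hz: float, sample_hz: float):
        if not 0.0 < cutoff_hz < sample_hz / 2.0:
            raise ValueError("cutoff must be positive and below the Nyquist frequency")
        dt = 1.0 / sample_hz
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)
        self.alpha = dt / (rc + dt)
        self.state = None

    def update(self, x):
        # Initialize with the first sample to avoid a start-up transient from zero.
        if self.state is None:
            self.state = list(x)
        else:
            self.state = [y + self.alpha * (xi - y) for xi, y in zip(x, self.state)]
        return list(self.state)
```

For example, `lp = FirstOrderLowPass(cutoff_hz=5.0, sample_hz=30.0)` followed by one `lp.update(rpms)` call per control step with the four RPM commands. A lower cut-off removes more chatter but adds more phase lag, which a fast attitude loop may not tolerate.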

Thanks for your time.

JacopoPan commented 4 months ago

Hi @piratax007, apologies for the late answer.

piratax007 commented 4 months ago

Thanks for your time and answer.

I removed the action buffer from the observation space, and with curriculum learning, training takes a couple of hours for a complicated task.

Now I'm trying sim2real with a Crazyflie 2.x and a Vicon system. I'm having issues because the Vicon observations are noisy, while the policy was trained in a perfect (noise-free) environment. Any suggestions for the sim2real transfer?
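One common approach is to inject observation noise during training, a simple form of domain randomization, with a magnitude estimated from real Vicon data. This is offered only as a suggestion, not as the maintainers' recommendation. The sketch assumes you have logged Vicon observations while the drone sits still. `estimate_noise_std` and `NoisyObservation` are hypothetical helpers, not part of gym-pybullet-drones or Gymnasium.

```python
import random
import statistics


def estimate_noise_std(samples):
    """Per-dimension standard deviation of observations logged while the drone is at rest."""
    dims = list(zip(*samples))
    return [statistics.pstdev(d) for d in dims]


class NoisyObservation:
    """Adds zero-mean Gaussian noise to each observation dimension (hypothetical helper)."""

    def __init__(self, std_per_dim, seed=None):
        self.std = list(std_per_dim)
        self.rng = random.Random(seed)

    def __call__(self, obs):
        if len(obs) != len(self.std):
            raise ValueError("observation size does not match noise spec")
        # Independent Gaussian noise per dimension, with the estimated std.
        return [x + self.rng.gauss(0.0, s) for x, s in zip(obs, self.std)]
```

Applying `NoisyObservation(estimate_noise_std(rest_log))` to every observation returned by the training environment (for instance, from inside a Gymnasium `ObservationWrapper`) exposes the policy to noise of a similar scale to what Vicon delivers. Here `rest_log` stands for your own recorded at-rest samples.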

Best regards.