High frequency in RPMs when including the action buffer in the observation space can cause problems on real hardware

piratax007 commented 4 months ago

Hello @JacopoPan,

First of all, congratulations on this wonderful repo.

I'm training a policy to control a real Crazyflie, using RPM as the action space instead of ONE_DIM_RPM. It works in simulation, except for what the RPM plot shows.

As you can see here:

[Image: rpms]

the RPM commands oscillate at a high frequency, which makes the policy unusable on a real drone.

I understand what you said in #180: "The main thing to note is that the observation contains the actions of the last .5 seconds, so increasing the ctrl freq will increase the obs space." and "The idea of the action buffer is that the policy might be better guided by knowing what the controller had done just before, the proportionality to the control frequency makes it dependent on the wall-clock only, and not the type of controller (but it might be appropriate to change that, depending on application).". Nevertheless, including the action buffer in the observation space results in the high-frequency RPM commands shown above. If I remove the buffer and use only the states (12 inputs) as the observation space, the drone reaches the target position and orientation (because I'm controlling yaw) and the RPMs no longer show this high-frequency behavior:

[Image: xyz rpy rpms]

Questions:

1. What is the difference between transferring the trained policy to a real drone with the action buffer in the observation space and without it?
2. I'm trying to add a low-pass filter to reduce the high frequency in the RPMs. Can you help me work out the best cut-off and sample frequencies for the filter?
3. In the SB3 documentation that you refer to, I cannot find anything about using this action buffer in the observation space, and I have some questions about it, such as how to determine the size of the buffer. As you said, the buffer's size is related to CTRL_FREQUENCY, but why? What does CTRL_FREQUENCY mean, and how is it related to PYB_FREQUENCY and the time step? (I know that in BaseAviary.py line 481 you define the time step using PYB_FREQUENCY.)
4. At what frequency does the policy interact with the drone (send actions and receive observations and rewards)?
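For the second question, the relation between cut-off frequency, sample frequency, and smoothing can be illustrated with a first-order low-pass filter. This is only a minimal sketch, not a recommendation of a particular cut-off: the class name `FirstOrderLowPass`, the 30 Hz sample rate, and the 5 Hz cut-off are hypothetical values chosen for illustration. The sample frequency is assumed to be the rate at which the policy emits actions, and the cut-off has to stay below half of it (the Nyquist frequency).

```python
import math


class FirstOrderLowPass:
    """Discrete first-order (RC-style) low-pass filter applied per motor."""

    def __init__(self, cutoff_hz: float, sample_hz: float):
        if not 0.0 < cutoff_hz < sample_hz / 2.0:
            raise ValueError("cut-off must lie between 0 and the Nyquist frequency")
        dt = 1.0 / sample_hz                      # time between two actions
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)    # equivalent RC time constant
        self.alpha = dt / (rc + dt)               # smoothing factor in (0, 1)
        self.state = None                         # last filtered RPM vector

    def update(self, rpms):
        """Filter one vector of motor RPMs and return the smoothed vector."""
        if self.state is None:
            self.state = list(rpms)               # start from the first sample
        else:
            self.state = [s + self.alpha * (x - s)
                          for s, x in zip(self.state, rpms)]
        return self.state


# Illustrative values only (assumptions, not taken from the repository).
lpf = FirstOrderLowPass(cutoff_hz=5.0, sample_hz=30.0)

# A command that flips sign at every step, i.e. at the Nyquist frequency.
filtered = [lpf.update([sign])[0] for sign in [1.0, -1.0] * 20]
print(round(lpf.alpha, 3), max(abs(v) for v in filtered[10:]))
```

A lower cut-off gives a smaller `alpha` and a smoother output, at the cost of more lag between the policy's command and what the motors receive.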

Thanks for your time.

JacopoPan commented 4 months ago

Hi @piratax007, apologies for the late answer.

1. The buffer is something I introduced only later, to make the training examples faster (and because it's a feature you see in other similar drone RL works); it is not intended to facilitate sim2real transfer.
2. I would attempt sim2real with the smoother policy first; I'm a bit skeptical about deploying a very noisy controller plus a filter on real hardware.
3. What I refer to as the action buffer has nothing to do with SB3 per se: it is simply the actions commanded to the environment in the last N seconds, concatenated to the observation the environment returns. Its size depends on CTRL_FREQUENCY because the higher the CTRL_FREQUENCY, the more actions will have been sent in the last N seconds.
4. CTRL_FREQUENCY is the frequency at which the policy interacts with the drone; PYB_FREQUENCY is the frequency at which Bullet is called to update the state of the simulation (it must be a multiple of CTRL_FREQUENCY).
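The arithmetic behind these answers can be sketched as follows. This is a rough illustration under stated assumptions, not the repository's code: the helper names are hypothetical, the 30 Hz and 240 Hz values are example frequencies, the 0.5-second window is the one quoted from #180, and 4 actions per step assumes one RPM command for each of the four motors.

```python
def action_buffer_size(ctrl_freq_hz: int, window_s: float = 0.5) -> int:
    """Number of past actions stored when the buffer spans `window_s` seconds."""
    return int(ctrl_freq_hz * window_s)


def pyb_steps_per_ctrl(pyb_freq_hz: int, ctrl_freq_hz: int) -> int:
    """Physics updates performed between two consecutive policy actions."""
    if pyb_freq_hz % ctrl_freq_hz != 0:
        raise ValueError("PYB_FREQUENCY must be a multiple of CTRL_FREQUENCY")
    return pyb_freq_hz // ctrl_freq_hz


def observation_size(state_dim: int, action_dim: int,
                     ctrl_freq_hz: int, window_s: float = 0.5) -> int:
    """State inputs plus every buffered action, flattened into one vector."""
    return state_dim + action_dim * action_buffer_size(ctrl_freq_hz, window_s)


# Example values only (assumptions for illustration).
CTRL_FREQUENCY = 30    # Hz: policy sends an action and receives an observation
PYB_FREQUENCY = 240    # Hz: physics engine update rate

print(action_buffer_size(CTRL_FREQUENCY))                  # 15 buffered actions
print(pyb_steps_per_ctrl(PYB_FREQUENCY, CTRL_FREQUENCY))   # 8 physics steps per action
print(observation_size(12, 4, CTRL_FREQUENCY))             # 12 + 4 * 15 = 72 inputs
print(observation_size(12, 4, CTRL_FREQUENCY, window_s=0)) # 12 inputs without buffer
```

Doubling the control frequency doubles the number of buffered actions, which is why the observation space grows with CTRL_FREQUENCY while the covered time window stays the same.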

piratax007 commented 4 months ago

Thanks for your time and answer.

I removed the action buffer from the observation space and trained using curriculum learning; the training process takes a couple of hours for a complicated task.

Now I'm trying sim2real with a Crazyflie 2.x and a Vicon system. I'm having issues because the observations from Vicon are noisy, while the policy was trained in a perfect (noise-free) environment. Any suggestions for the sim2real transfer part?

Best regards.

utiasDSL / gym-pybullet-drones

Issue #212