microsoft / AirSim

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
https://microsoft.github.io/AirSim/

Serious problems in RL dqn drone example #4166

Open imnotkind opened 2 years ago

imnotkind commented 2 years ago

Bug report

What's the issue you encountered?

There are several problems with the RL DQN drone tutorial in AirSim.

  1. IT DOESN'T WORK. The tutorial in https://github.com/microsoft/AirSim/blob/master/PythonClient/reinforcement_learning/airgym/envs/drone_env.py#L86-L92 tries to make the drone pass through some checkpoints as a path. In reality, it barely passes the first checkpoint and never reaches the second one (you can print the position with self.state["position"]). It's not just because I was impatient; I trained it for about 5 days. The reward setting seems to be the problem. If the reward only takes the minimum of the cross-product distance over the checkpoint pairs, then flying in a straight line past the first checkpoint already gives optimal reward, and there are multiple optimal policies besides. For example, in the picture below the drone could fly along the outer green line and still collect optimal reward, since np.cross(quad_pt - pts[0], quad_pt - pts[1]) would stay constantly 0 (see the sketch after this list). [image]

  2. Drone occasionally gets stuck. When it gets stuck in the ground, the collision flag is not raised, so the reward = -100 in the code never triggers and the episode does not terminate. It can also get stuck in a tree. We worked around it by checking whether the position has changed (see the stall check in the sketch after this list), but seriously, have you guys ever run this thing?

  3. Statistics are wrong. The ep_len_mean and ep_rew_mean that print every 4 episodes are faulty. I never checked TensorBoard, but I'm quite sure it will be faulty too if this is. I think the problem is in airgym or stable-baselines, I'm not sure which. I checked that the transitions are correctly getting into the replay buffer, but the episode info in the ep_info_buffer that stable_baselines uses for evaluation looks off: it keeps saving the same previous reward information over and over. [image]
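
As an illustration of points 1 and 2, here is a minimal sketch (not code from the repo: the checkpoint coordinates, the `is_stalled` helper, and its `window`/`tol` parameters are hypothetical; `quad_pt` and the cross-product distance follow the linked drone_env.py). The distance term measures the distance to the infinite line through two checkpoints, so it stays zero arbitrarily far past the second checkpoint, and a simple position-stall check can terminate stuck episodes that never raise the collision flag:

```python
import numpy as np

# 1. Why the cross-product reward can be gamed: this is the distance to the
#    INFINITE line through two checkpoints, so it is zero anywhere on that
#    line, including far beyond the second checkpoint.
def line_distance(quad_pt, p0, p1):
    return np.linalg.norm(np.cross(quad_pt - p0, quad_pt - p1)) / np.linalg.norm(p1 - p0)

p0 = np.array([0.0, 0.0, -10.0])    # hypothetical first checkpoint
p1 = np.array([100.0, 0.0, -10.0])  # hypothetical second checkpoint

print(line_distance(np.array([50.0, 0.0, -10.0]), p0, p1))   # 0.0 (on the segment)
print(line_distance(np.array([500.0, 0.0, -10.0]), p0, p1))  # 0.0 (far past the end)

# 2. Hypothetical stall check: end the episode when the drone has barely moved
#    over the last few steps, since the collision flag may never be raised
#    when it is wedged into the ground or a tree.
def is_stalled(position_history, window=10, tol=1e-3):
    if len(position_history) < window:
        return False
    recent = np.asarray(position_history[-window:])
    return np.linalg.norm(recent.max(axis=0) - recent.min(axis=0)) < tol
```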

Settings

Default settings in AirSim 1.6.0, with the AirSimNH.zip environment downloaded from the GitHub releases.

How can the issue be reproduced?

  1. pip install stable-baselines==1.3.0 and install everything else needed (msgpack-rpc, the airsim lib, ...)
  2. From the repo root, run python PythonClient/reinforcement_learning/dqn_drone.py

Include full error message in text form

What's better than filing an issue? Filing a pull request :). We are willing to, but we are surprised that this faulty RL drone example has existed for 11 months. Can anyone give us a WORKING DRONE RL EXAMPLE WITH AIRSIM? Has airgym just been thrown in the trash?? We're not sure we can safely use airgym for our drone RL research.

jonyMarino commented 2 years ago

Hi and welcome @imnotkind! Can you check out #4032 and see if it solves your issue?

imnotkind commented 2 years ago

Hi @jonyMarino. I'm afraid it doesn't solve my issue. As far as I know, the approach in #4032 can only speed up training, but the problems in this example seem far more fundamental than just "slow training." Besides, I don't see how #4032 is in any way better than the original example. Does it really speed up training?

imnotkind commented 2 years ago

@jonyMarino It has been 10 days. Honestly, is this project abandoned or something? The slow response to a basic "RL AirSim tutorial problem" suggests to me that no one has cared about it at all, for nearly a year!

Is it safe to use airgym? Is it safe to do RL in AirSim using the APIs in the given tutorial? Really, has no one even dared to check whether the drone follows the given path?

jonyMarino commented 2 years ago

@imnotkind Sorry for the late response. The project is not at all abandoned. The DQN example has undergone some modifications over time; the last change was to use airgym. There have also been changes in the environments, which may have introduced issues in the examples. The point of these scripts is to give users a starting point that shows how to write reinforcement learning scripts, how to specify rewards and tasks, etc. It is open to user contributions to improve it; for example, you could do some trial and error with different reward functions, task setups, etc. Of course, we would like to have the examples working well, but we cannot work on that soon. Regardless of the examples, airgym should work, and please keep us posted if you find any issues with it.

denmonz commented 2 years ago

@imnotkind Would your issue with the reward be resolved by adding more points to the pts array, such as midpoints (and midpoints of those midpoints), and shortening thresh_dist so it further constrains the bounds of the agent? I'm also looking into adapting this algorithm to determine which line segment the agent is closest to and evaluate the distance from the agent to the nearest point on that segment, in addition to the distance to the actual checkpoints themselves.
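
For reference, a minimal sketch of that idea (the helper names `point_to_segment_distance` and `path_distance` are hypothetical; `pts` and `quad_pt` follow the tutorial's naming, and the clamped projection is the standard point-to-segment distance, not code from the repo):

```python
import numpy as np

def point_to_segment_distance(quad_pt, a, b):
    # Project quad_pt onto the segment a-b and clamp the projection to the
    # segment, so flying past an endpoint is no longer "free".
    ab = b - a
    t = np.clip(np.dot(quad_pt - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(quad_pt - (a + t * ab))

def path_distance(quad_pt, pts):
    # Distance to the nearest point on the whole checkpoint path.
    return min(point_to_segment_distance(quad_pt, pts[i], pts[i + 1])
               for i in range(len(pts) - 1))
```

Used in place of the cross-product term, this would penalize the drone as soon as it drifts past the end of a segment, though progress along the path would probably still need its own reward term.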