uzh-rpg / agile_autonomy

Repository containing the code associated with the paper "Learning High-Speed Flight in the Wild"
GNU General Public License v3.0

Results don't match the paper #79

Closed · random-user-in-space closed this issue 8 months ago

random-user-in-space commented 1 year ago

I tried replicating the project from this repo and wanted to validate the results reported in the original paper, "Learning High-Speed Flight in the Wild".

A. I followed all the steps and used the test script with default parameters, varying only the test_time_velocity and maneuver_velocity parameters, as instructed in the repo (see the sketch below).
B. I used the default weights provided in the repo, i.e. this one: https://github.com/uzh-rpg/agile_autonomy/blob/main/planner_learning/models/ckpt-50.index
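
For reference, a minimal sketch of the kind of override described in A (the config path planner_learning/config/test_settings.yaml and the exact key handling are assumptions here, not verified against the repo):

```python
# Minimal sketch: override only the two velocity parameters and keep every other
# setting at its default. Config path and handling are assumptions, not verified.
import yaml

CONFIG_PATH = "planner_learning/config/test_settings.yaml"  # assumed location
TARGET_SPEED = 7.0  # m/s; the README states the provided ckpt covers [1, 10] m/s

with open(CONFIG_PATH) as f:
    settings = yaml.safe_load(f)

settings["test_time_velocity"] = TARGET_SPEED
settings["maneuver_velocity"] = TARGET_SPEED

with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(settings, f)
```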

The results I am getting deviate from the claims of the paper to a very large extent. Could you please suggest whether I am doing something wrong, or what could be changed to achieve the claimed results?

Machine info: NVIDIA GeForce RTX 3070, 8 GB

Graph attached (experimental result vs. paper claim). @antonilo @kelia @thehighestmath @den250400

antonilo commented 1 year ago

Dear @random-user-in-space,

Thanks for using our repo. If you have a closer look at the README, you will see that we state the following: "Edit the test_time_velocity and maneuver_velocity to the required speed. Note that the ckpt we provide will work for all speeds in the range [1,10] m/s. However, to reach the best performance at a specific speed, please consider finetuning the ckpt at the desired speed (see code below)." This means that if you want to get the same performance, you need to finetune a policy to one speed using the instructions in the paper.

However, your numbers appear to be strange to me. If you visualize the policy, does it crash that often, or does it just report a crash without actually flying into an obstacle? There might be some communication problems between Unity and the flight stack causing this. A good unit test is to overfit to the environment and train on a single speed. You should quickly get a 100% success rate there. Could you try to run this test and let me know if it works?
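
The overfit unit test suggested above could be set up with a helper along these lines; this is a hedged sketch, and the config file names and keys (dagger_settings.yaml, env_seed) are hypothetical and would need to be matched to the actual files in planner_learning/config:

```python
# Hypothetical helper for the "overfit to one environment at a single speed" unit test.
# File paths and key names are illustrative only, not taken from the repo.
import yaml

def pin_speed_and_environment(config_path, speed_mps, env_seed):
    """Pin a config to a single speed and a single (seeded) environment."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    cfg["test_time_velocity"] = speed_mps
    cfg["maneuver_velocity"] = speed_mps
    cfg["env_seed"] = env_seed  # hypothetical key: always regenerate the same environment
    with open(config_path, "w") as f:
        yaml.safe_dump(cfg, f)

# Apply the same speed and environment to both the data-collection and the test configs,
# then collect data, train, and evaluate; success should quickly approach 100%.
for path in ("planner_learning/config/dagger_settings.yaml",
             "planner_learning/config/test_settings.yaml"):
    pin_speed_and_environment(path, speed_mps=7.0, env_seed=0)
```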

random-user-in-space commented 1 year ago

@antonilo Thank you for your response. These are my concerns/questions; I'd be thankful if you could address them.

A. "This means that if you want to get the same performance, you need to finetune a policy to one speed using the instructions in the paper." - While I understand that finetuning is required, there is already one checkpoint provided in the codebase. What configuration does it work with? Can you provide details of the environment and configuration for which the given checkpoint reproduces the results shown in the paper? It would be great if I could replicate the performance you achieved with that checkpoint as a starting point.

B. I presume that you must also have trained for specific environments, as per the results claimed in the paper. Could you share the checkpoints from those other training runs as well? Concretely, this would help me verify the claims of the paper.

C. "However, your numbers appear to be strange to me. If you visualize the policy, does it crash so often or does it just report a crash without actually going into an obstacle? There might be some communication problems between unity and the flight stack causing this." - I verified it for all the 50 test runs, the drone actually crashes into the tree when the logger also agrees.

D. "A good unit test is to overfit to the environment and train on a single speed. You should be quickly get 100% success rate there. Could you try to run this test and let me know if that works?" - I am working on this, I will repost the results and claims once this is done.

Thank you for your time and consideration.

antonilo commented 1 year ago

Dear @random-user-in-space,

A. That checkpoint was trained at all speeds, with a bias towards data at 7 m/s. As a result, it should achieve reasonable performance at more or less all speeds. With the checkpoint provided, you should get a 60-70% success rate at 7 m/s (the success rate will vary between runs and seeds). Note that this is in line with what other users have found. Also, other users reported an improvement of approximately 10% when finetuning the provided checkpoint, bringing performance up to 70-80%.

B. To get a 90% success rate at 7 m/s, we trained on a huge set of environments specifically at that speed. This includes both the forest and the randomly generated environments (which are described in the paper). Getting that additional improvement required some small changes to the codebase, together with quite a large dataset. I would be happy to share that code; feel free to send me an email. Overall, note that this repository was optimized for usability, to make it easy for people to build on top of our research. We are very happy that people are using it in this fashion: several great contributions have already come out of it, even showing some successes and failures we were not aware of at publishing time (for example, https://arxiv.org/pdf/2301.07430.pdf found that our approach is quite good at finding optimal paths, even though it was not trained for that; however, it performs quite badly when the obstacle density increases). In a similar spirit, I hope that this repository will help your research.

antonilo commented 8 months ago

Sorry for the very late answer to this, but I've been very busy over the last year with other things and could not get back to this code for a while. I used this spring break to try to reproduce the results and got the following. For simplicity, all of the following experiments are at 7 m/s.

I took the pretraining data in https://zenodo.org/records/5517791 and trained a policy with it. The only change I made was not passing a reference trajectory (i.e., setting goal_dir=0 at training and testing time). The rationale is that when the goal is only to fly straight, there is no need to pass a reference direction. Removing the reference speeds up pretraining and finetuning, which made the experiment a bit faster to run :) This policy's success rate was 7/10.
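
For clarity, a minimal sketch of what "goal_dir=0" means in this variant; this is illustrative only, and the actual input-construction code in planner_learning may differ:

```python
import numpy as np

def make_goal_dir(goal_position, drone_position, use_reference=True):
    """Illustrative sketch of building the goal-direction input fed to the policy.

    With use_reference=False the network always receives a zero vector
    (goal_dir = 0 at both training and testing time), so it only learns to
    fly straight and avoid obstacles, with no reference direction to follow.
    """
    if not use_reference:
        return np.zeros(3, dtype=np.float32)
    direction = np.asarray(goal_position, dtype=np.float32) - np.asarray(drone_position, dtype=np.float32)
    return direction / np.linalg.norm(direction)
```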

Then, I finetuned this policy at 7 m/s on 1K rollouts from the forest with spacing 5. The best policy achieved a success rate of 9/10, and the second best 8/10. You can find the logs of the evaluations attached to this comment.

Overall, it took me a while to get the code running. Unfortunately, it was written more than four years ago, so most of the dependencies are deprecated and I had to struggle a lot to get it working. I unfortunately won't have the time to update all the dependencies, I'm sorry! It is a bit easier to work with this repo (https://github.com/uzh-rpg/agile_flight) if you're interested in the general structure of the problem and less in the specifics of this work.

Attachments: Finetune First, Finetune Second, Pretrained (evaluation logs).