nicklashansen / tdmpc2

Code for "TD-MPC2: Scalable, Robust World Models for Continuous Control"
https://www.tdmpc2.com
MIT License

Inquiry: Suitability of tdmpc2 for Autonomous Drone Racing #25

Open ErinTUDelft opened 3 months ago

ErinTUDelft commented 3 months ago

Dear Nicklas,

Thank you for this great library! I am currently working on my thesis on reinforcement learning for autonomous drone racing and was originally considering using DreamerV3, but I now think that TD-MPC2 is more suitable.

The observations are the position, velocity, and orientation of the quadcopter, the action space is the RPMs of the four rotors, and the goal is to fly through various gates as quickly as possible. Later on I will also incorporate visual input, but for now the ground-truth state of the drone will be fed to the algorithm. In gym terms, the interface looks roughly like the sketch below.
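The dimensions in the sketch are my own assumptions (3D position + 3D velocity + unit-quaternion orientation = 10-dim observation, and 4 normalized rotor commands later scaled to the RPM range); it is only meant to pin down the interface, not any particular simulator.

```python
# Rough sketch of the interface described above, using gymnasium spaces.
# Dimensions are assumptions: 3D position + 3D velocity + unit-quaternion orientation
# = 10-dim observation; 4 normalized rotor commands later scaled to the RPM range.
import numpy as np
from gymnasium import spaces

observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(10,), dtype=np.float32)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

print(observation_space)  # Box(-inf, inf, (10,), float32)
print(action_space)       # Box(-1.0, 1.0, (4,), float32)
```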

I thus wanted to ask whether you think this would be a suitable use case for TD-MPC2? Lastly, I was wondering whether any work has been done on integrating this with the Nvidia Isaac Sim/Gym simulation environment?

Kind regards, Erin

nicklashansen commented 3 months ago

Hi @ErinTUDelft, thanks for reaching out! This sounds like a great use case for TD-MPC2.

We don't have direct support for Nvidia Isaac Sim/Gym yet, but there are a few ongoing efforts to vectorize the algorithm. Branch episodic-rl adds support for episodic RL (episodes with variable length), branch vectorized_env adds support for vectorized environments in MuJoCo (CPU) with fixed episode length, and this commit experimentally adds support for vectorized environments in dflex with variable episode length.

I'd be interested in eventually adding support for Isaac as well but we don't have that at the moment. Using the dflex implementation as a starting point might be the easiest way forward for you. I hope this helps!

As an aside: if you are planning to deploy this on a real drone eventually, inference speed may or may not be a concern. I'd advise you to keep planning enabled during deployment if compute/latency permits. Otherwise, keep in mind that disabling planning and simply using the model-free policy learned with TD-MPC2 will give you inference speed comparable to model-free algorithms like SAC/PPO, potentially at the cost of some performance (which might be tolerable for your problem).
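To make that trade-off concrete, here is a minimal sketch of deployment-time action selection. The world model, policy prior, and the small CEM-style planning loop below are toy stand-ins rather than the actual TD-MPC2 classes; only the control flow is the point: plan over the learned model when latency permits, or fall back to a single forward pass through the policy.

```python
import time
import torch
import torch.nn as nn

LATENT, ACT = 64, 4                       # toy latent size; 4 = one command per rotor
HORIZON, SAMPLES, ITERS, ELITES = 3, 256, 4, 32

class ToyWorldModel(nn.Module):
    """Stand-ins for the learned latent dynamics, reward, and policy prior."""
    def __init__(self):
        super().__init__()
        self.dynamics = nn.Sequential(nn.Linear(LATENT + ACT, LATENT), nn.ELU())
        self.reward = nn.Linear(LATENT + ACT, 1)
        self.pi = nn.Sequential(nn.Linear(LATENT, ACT), nn.Tanh())

@torch.no_grad()
def act(model, z, use_planning=True):
    if not use_planning:
        return model.pi(z)                # single forward pass: SAC/PPO-like latency
    # Tiny CEM-style loop over the model: sample action sequences, roll them out,
    # refit the sampling distribution to the elites, execute the first action.
    mean, std = torch.zeros(HORIZON, ACT), torch.ones(HORIZON, ACT)
    for _ in range(ITERS):
        actions = (mean + std * torch.randn(SAMPLES, HORIZON, ACT)).clamp(-1, 1)
        zs, ret = z.expand(SAMPLES, LATENT), torch.zeros(SAMPLES)
        for t in range(HORIZON):
            za = torch.cat([zs, actions[:, t]], dim=-1)
            ret += model.reward(za).squeeze(-1)
            zs = model.dynamics(za)
        elites = actions[ret.topk(ELITES).indices]
        mean, std = elites.mean(0), elites.std(0) + 1e-3
    return mean[:1]

model, z = ToyWorldModel(), torch.randn(1, LATENT)
for flag in (True, False):
    t0 = time.perf_counter()
    a = act(model, z, use_planning=flag)
    print(f"planning={flag}: action shape {tuple(a.shape)}, {1e3 * (time.perf_counter() - t0):.2f} ms")
```

On a Jetson-class device, the gap between those two timings is essentially what decides whether planning fits inside the control loop.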

ErinTUDelft commented 2 months ago

Hey Nicklas!

TD-MPC actually already seems to have been implemented for Isaac Sim within the OmniDrones library (https://github.com/btx0424/OmniDrones/blob/76731f94e96e494fb8e609919ba6e5419335d311/omni_drones/learning/tdmpc.py#L127), which saves a whole lot of vectorization trouble :)

I created a fork to upgrade the current implementation in OmniDrones to TD-MPC2 (for now without the multi-task support, as that makes it more complex). I don't know yet exactly how long it will take, but I hope to finish it soon! (https://github.com/ErinTUDelft/tdmpc2-OmniDrones)

As for inference, this is indeed something I was curious about: I saw that in your original paper inference took 20 ms on an Nvidia 3090, and 12 ms when reducing the planning horizon to 1. Have these durations remained approximately similar for TD-MPC2? For robotics the Jetson series is most common; I currently have access to a TX2 for prototyping, but for the race we will likely purchase a few Jetson Orins if necessary. Do you happen to know whether any tests have been performed on the memory usage and inference performance of these systems?

nicklashansen commented 2 months ago

@ErinTUDelft Oh that's great, thanks for sharing! We have not run inference on any of the systems that you mention, but inference speed should be comparable to TD-MPC1 :-) For reference, I believe we have been able to run inference at up to ~50 Hz on a workstation with a 4090 GPU.
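If you want numbers for the TX2/Orin, a rough timing harness like the sketch below should get you most of the way. The MLP is just a placeholder for whatever policy/planner call ends up in your control loop; the warm-up and `torch.cuda.synchronize()` calls are the parts that matter for honest GPU timings.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# Placeholder network: ~10-dim state in, 4 rotor commands out. Swap in the real
# policy/planner call you care about; the timing scaffolding stays the same.
net = nn.Sequential(nn.Linear(10, 512), nn.ELU(),
                    nn.Linear(512, 512), nn.ELU(),
                    nn.Linear(512, 4)).to(device).eval()
obs = torch.randn(1, 10, device=device)

with torch.no_grad():
    for _ in range(50):                 # warm-up: CUDA context, allocator, autotuning
        net(obs)
    if device == "cuda":
        torch.cuda.synchronize()        # make sure warm-up kernels have finished
    n, t0 = 1000, time.perf_counter()
    for _ in range(n):
        net(obs)
    if device == "cuda":
        torch.cuda.synchronize()        # wait for queued kernels before stopping the clock
    dt = (time.perf_counter() - t0) / n

print(f"{1e3 * dt:.3f} ms per action (~{1 / dt:.0f} Hz)")
```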

ErinTUDelft commented 2 months ago

Hey @nicklashansen, interesting that you haven't yet run inference on the Jetson series, which is the predominant GPU platform for small robots; I don't see many drones carrying a 4090 anytime soon ;) For drones, fast inner-loop control is especially important (~300 Hz), but even if TD-MPC is only usable one control layer further out, it could still be very worthwhile.

The problem the author of OmniDrones found was that planning makes the model train much slower than other algorithms such as PPO, which we discussed here (https://github.com/btx0424/OmniDrones/issues/67). I verified this and ran into the same problem, but thought it could also just be due to an improper implementation. That is why I'm very much looking forward to seeing your Isaac Sim TD-MPC version; do you have an indication of when that might be finished?
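As a back-of-the-envelope illustration of where the extra cost comes from: a CEM/MPPI-style planner evaluates roughly iterations × num_samples × horizon model calls per action (and per environment step during training), versus a single policy call without planning. The parameter names and defaults below follow what I recall from the repo's config.yaml, so treat them as assumptions and check them against your checkout.

```python
# Rough model-call count per action for a CEM/MPPI-style planner.
# Defaults are what I believe config.yaml uses (iterations=6, num_samples=512,
# horizon=3); verify against your checkout before drawing conclusions.
def planner_calls(iterations=6, num_samples=512, horizon=3):
    return iterations * num_samples * horizon

default = planner_calls()
reduced = planner_calls(iterations=4, num_samples=256, horizon=2)
print(f"default: ~{default} model calls/action")            # ~9216
print(f"reduced: ~{reduced} model calls/action "
      f"(x{default / reduced:.1f} cheaper, vs. 1 call for the policy alone)")
```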

nicklashansen commented 2 months ago

No ETA, but I think it will be a while still. That said, I don't think you can expect off-policy algorithms in general to vectorize as well as PPO, regardless of whether you use planning. SAC vectorization is usually on the order of ~10 environments vs. ~16k for PPO.