Some attempts to apply CrazySim in the direction of MARL

XiAoSSuper commented 5 months ago

Hello,

My name is Steven, and I wanted to extend my gratitude for making the CrazySim project open-source. I have successfully utilized it on Ubuntu 22.04, and I am impressed with the results.

I am considering further developing an environment interface tailored for Multi-Agent Reinforcement Learning (MARL) algorithms, leveraging your work as a foundation. To this end, I have a few inquiries that I believe would benefit from your insights:

Simulation Stepping Efficiency: I have observed that CrazySim achieves optimal performance when the simulation clock closely matches the real-time clock at a 1:1 ratio. Could you share your thoughts on methods to potentially accelerate the simulation process while maintaining accuracy?

Parallel Simulation Execution: In line with the first question, we aim to enhance training efficiency by executing multiple simulation environments concurrently or by running several scenarios synchronously within the same simulation environment. This approach appears to be quite demanding on computational resources. I have noticed that the Ruby interpreter consumes more than 10 CPU cores, which is nearly the total count of my CPU's cores (12 in theory). Is there a possibility to overcome this limitation?

Your guidance on these matters would be greatly appreciated. I would like to reiterate my admiration for your work and the impact it has had on my project.

Thank you for considering my questions.

Best regards, Steven

llanesc commented 5 months ago

Hi @XiAoSSuper! Thanks for the comments, and I'm glad that you have had success with CrazySim. CrazySim works best when simulation time tracks real time 1:1 because the firmware has its own clock on FreeRTOS that runs independently of Gazebo. Therefore, IMU data coming in from Gazebo should arrive at the expected frequency of 1000 Hz, the same as on the hardware. There are a few other timing-dependent modules in the firmware, such as the estimator and the stabilizer. This is the main reason we want the real-time factor in Gazebo to be as close to 100% as possible. Other software-in-the-loop (SITL) implementations face a similar issue.

To run the physics faster or slower than real time, we need to update the clock in the firmware's real-time operating system (RTOS) or its scheduler. I've been trying to figure out how to implement this by looking into other SITL implementations that have already solved it, such as PX4 lockstepping. The functionality is mostly aimed at letting slower computers run CrazySim slower than real time, or at automated faster-than-real-time simulation of new firmware code on the GitHub backend before merging to main.
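To make the timing argument concrete, here is a minimal Python sketch. It is not CrazySim, firmware, or PX4 code, and `imu_rate_seen_by_firmware`, `LockstepClock`, and `run_lockstep` are hypothetical names used only for illustration. The first function shows why a free-running firmware clock sees the wrong IMU rate when the real-time factor is not 1. The class shows the lockstep idea, in which the firmware clock advances only when a simulated sample arrives.

```python
from dataclasses import dataclass

IMU_RATE_HZ = 1000  # IMU rate the firmware expects, as stated in the comment above


def imu_rate_seen_by_firmware(real_time_factor: float) -> float:
    """IMU rate as measured against a free-running (wall-clock) firmware timer."""
    return IMU_RATE_HZ * real_time_factor


@dataclass
class LockstepClock:
    """Hypothetical firmware clock that only advances when a sample arrives."""

    ticks: int = 0  # one tick corresponds to 1 ms of simulated time

    def on_imu_sample(self) -> None:
        # The simulator, not the wall clock, drives firmware time forward.
        self.ticks += 1

    @property
    def sim_time_s(self) -> float:
        return self.ticks / IMU_RATE_HZ


def run_lockstep(sim_seconds: float) -> LockstepClock:
    """Feed one IMU sample per simulated millisecond, as fast as the host allows."""
    clock = LockstepClock()
    for _ in range(round(sim_seconds * IMU_RATE_HZ)):
        clock.on_imu_sample()
    return clock
```

With a free-running clock, `imu_rate_seen_by_firmware(0.5)` gives 500 Hz instead of the expected 1000 Hz. With the lockstep clock, the firmware always observes exactly 1000 samples per simulated second, however long the loop takes in wall-clock time.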

I never intended to create a gym for CrazySim because there's a lot of overhead from the firmware and its communication with Gazebo, which would slow down training tremendously. In my experience with RL on Crazyflies, I typically train in a different, parallelizable gym environment (basic quadrotor dynamics or gym-pybullet-drones) so that training is fast. I then transfer my model to CFLib/Crazyswarm2 control code and test it on CrazySim to see how the actual firmware handles my hardware test code before deploying to real Crazyflies. I hope this answers your question. Let me know if you have any more.
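As a rough illustration of what "basic quadrotor dynamics" in a parallelizable environment can look like, here is a minimal sketch of a batched altitude-only point-mass model in plain Python. It is an assumption-laden stand-in, not the gym-pybullet-drones API and not anything shipped with CrazySim: `BatchedHoverEnv`, the reward, the time step, and the approximate mass value are all hypothetical choices.

```python
from dataclasses import dataclass, field

GRAVITY = 9.81  # m/s^2
MASS = 0.027    # kg, approximate Crazyflie mass (assumption)
DT = 0.01       # s, integration step (assumption)


@dataclass
class BatchedHoverEnv:
    """Hypothetical batch of independent 1D altitude environments."""

    num_envs: int
    target_z: float = 1.0
    z: list = field(default_factory=list)   # altitude of each environment, m
    vz: list = field(default_factory=list)  # vertical velocity of each environment, m/s

    def reset(self):
        self.z = [0.0] * self.num_envs
        self.vz = [0.0] * self.num_envs
        return list(zip(self.z, self.vz))

    def step(self, thrusts):
        """Advance every environment by DT given one thrust (in newtons) each."""
        rewards = []
        for i, thrust in enumerate(thrusts):
            accel = thrust / MASS - GRAVITY
            # Semi-implicit Euler: update velocity first, then position.
            self.vz[i] += accel * DT
            self.z[i] = max(0.0, self.z[i] + self.vz[i] * DT)
            if self.z[i] == 0.0 and self.vz[i] < 0.0:
                self.vz[i] = 0.0  # resting on the ground
            rewards.append(-abs(self.target_z - self.z[i]))
        return list(zip(self.z, self.vz)), rewards
```

Because there is no firmware or Gazebo in the loop, a call such as `env.step([2 * MASS * GRAVITY] * env.num_envs)` advances every environment as fast as the host can compute. That is the property that makes this style of environment suitable for training before validating on CrazySim.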

XiAoSSuper commented 5 months ago

I greatly appreciate your prompt response. It looks like CrazySim could be utilized for simulation verification prior to physical validation. Before that, the algorithm can be trained within an environment such as gym-pybullet-drones that provides a gym interface. Thank you once again for your valuable suggestion. I will certainly proceed to explore this approach.

gtfactslab / CrazySim

Some attempts to apply Crazysim in the direction of MARL #13