utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/
MIT License

test_multiagent.py error for more than 2 drones #83

Closed atharrashno closed 2 years ago

atharrashno commented 2 years ago

Hi, Jacopo! Please don't kill me for a naive question :| I am very new to RL and quadcopter control, but I want to understand it by comparing these 3 multi-agent environments:

leaderfollower, flock, meetup

I found out that these differ in how they compute the reward, but I cannot see test_multiagent.py results for NUM_DRONES >= 3 because of:

action = {0: temp[0][0], 1: temp[1][0]}

Why do you limit the action dict to 2 drones (the 0th and 1st)?
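(For context, the obvious generalization of that line would be a dict comprehension over all drones. A sketch, assuming `temp` holds one entry per drone and `NUM_DRONES` is in scope; as the answer below explains, the training scheme itself is still limited to 2 agents.)

```python
# Hypothetical generalization (not in the repo): build the action dict
# for every drone instead of hardcoding agents 0 and 1.
action = {i: temp[i][0] for i in range(NUM_DRONES)}
```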


JacopoPan commented 2 years ago

Hi @atharrashno,

If you look at multiagent.py (where the models/policies used in test_multiagent.py are trained) you'll see that the training scheme (central critic and multiple policies) is hardcoded to the 2-agent scenario:

https://github.com/utiasDSL/gym-pybullet-drones/blob/7f8d7167bf0046a94259aaaf4fba7163b3ae5563/experiments/learning/multiagent.py#L124

https://github.com/utiasDSL/gym-pybullet-drones/blob/7f8d7167bf0046a94259aaaf4fba7163b3ae5563/experiments/learning/multiagent.py#L272

Note that this is based on the centralized_critic_2.py example in RLlib.
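To make the 2-agent hardcoding concrete, here is a hedged sketch of what a two-policy RLlib multi-agent config of that era looks like; the policy names, `obs_space`, and `act_space` are illustrative, not copied from multiagent.py:

```python
# Sketch of an RLlib (ray 1.x era) multi-agent config fixed to 2 agents.
config = {
    "multiagent": {
        "policies": {
            # (policy_class, obs_space, act_space, config); None = default class
            "pol0": (None, obs_space, act_space, {"agent_id": 0}),
            "pol1": (None, obs_space, act_space, {"agent_id": 1}),
        },
        # Agent i is served by policy "pol<i>"; only agents 0 and 1 are covered.
        "policy_mapping_fn": lambda agent_id: "pol0" if agent_id == 0 else "pol1",
    },
}
```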

atharrashno commented 2 years ago


Thanks a lot, Jacopo!

atharrashno commented 2 years ago

Hi Jacopo! Thank you for responding. However, because I've come to this field from image processing, I'd like to ask you to provide a roadmap to help me understand the things you mention in your answer.

JacopoPan commented 2 years ago

Hello @atharrashno,

If you want to have a high-level overview of the idea of multi-agent RL with separate policies and a centralized critic, I'd suggest Lowe's MADDPG paper.

The training scheme in multiagent.py follows a similar structure but you'll note that the number of policies and the number of observations fused into the central observation of the critics is hard-coded to 2.

That is just an example that you can draw inspiration from but shouldn't be constrained by: the main purpose of this repo is to provide environments on which to test any RL training scheme you can think of.
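In MADDPG-style training, each agent keeps its own policy over its own observation, but the critic sees the joint observations (and actions) of all agents; going from 2 to N agents mostly means widening that critic input. A minimal sketch of the idea (assumed shapes, not repo code):

```python
import numpy as np

def central_critic_input(observations, actions):
    """MADDPG-style centralized critic input: concatenate every agent's
    observation and action into one flat vector. With 2 agents this is
    the fixed-size input multiagent.py hardcodes; for N agents the
    critic network's input layer must be resized to match."""
    flat = [np.asarray(o).ravel() for o in observations]
    flat += [np.asarray(a).ravel() for a in actions]
    return np.concatenate(flat)
```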

atharrashno commented 2 years ago

Hello Jacopo! After reading the OpenAI paper, I altered the hardcoded value from two to three drones, but the result was really poor. Now I'd like to implement the concept of piloting four drones in two separate groups, since the performance with two drones is fairly good. Do you believe this concept is reasonable? Where do you believe this implementation should start?

JacopoPan commented 2 years ago

I'd start by thinking about what the reward function is (and the state-value/q-value functions it would lead to).
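For instance, a purely hypothetical reward for four drones flying as two pairs (not from the repo) might combine a within-group cohesion term with a between-group separation term; whether the value functions such a reward induces are actually learnable is exactly the question to think through:

```python
import numpy as np

def two_group_reward(positions, groups=((0, 1), (2, 3))):
    """Hypothetical reward sketch for 4 drones in 2 pairs:
    reward tight spacing within a pair, discourage the pairs
    from collapsing onto each other."""
    reward = 0.0
    for a, b in groups:
        # within-group term: keep each pair close together
        reward -= np.linalg.norm(positions[a] - positions[b])
    centroids = [np.mean([positions[i] for i in g], axis=0) for g in groups]
    # between-group term: capped bonus for keeping pair centroids apart
    reward += min(np.linalg.norm(centroids[0] - centroids[1]), 2.0)
    return reward
```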

atharrashno commented 2 years ago

I am familiar with the reward function (in the 3 maneuvers: leaderfollower, meetup, flock) and the state-value function (the Box of 20 values returned by _getDroneStateVector), but I am unable to locate the q-value function among the gym-pybullet-drones functions. Do you mean the action function?
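(For reference, this is my reading of the 20-value vector from BaseAviary's _getDroneStateVector in the version linked above; verify the layout against your own checkout.)

```python
state = env._getDroneStateVector(0)   # state of the 0th drone
pos   = state[0:3]    # XYZ position
quat  = state[3:7]    # orientation quaternion
rpy   = state[7:10]   # roll, pitch, yaw
vel   = state[10:13]  # linear velocity
ang_v = state[13:16]  # angular velocity
rpm   = state[16:20]  # last clipped action (4 motor commands)
```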

JacopoPan commented 2 years ago

Reward, actions, and states exist on the side of the "environment"/MDP (i.e., within gym-pybullet-drones); policies (or actors) and value functions (or critics) are models that you want to learn on the agent's side (see the RL textbook).
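A hedged sketch of that split, using the classic gym API of this era: the environment produces states and rewards, while the policy (actor) and value function (critic) are what the agent learns from the resulting transitions.

```python
obs = env.reset()
done = False
while not done:
    action = policy(obs)                          # agent side: learned actor
    obs, reward, done, info = env.step(action)    # env side: dynamics + reward
    # a critic V(obs) or Q(obs, action) is fitted from these transitions
```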

atharrashno commented 2 years ago

Thank you, I understand the ideas. But to put them into practice, I have to find the code for each model in gym-pybullet-drones and make the changes that are needed.

JacopoPan commented 2 years ago

gym-pybullet-drones classes are environments; policies and value functions reside on the agent side. If you look at the scripts in experiments/learning, there are agents based on stable-baselines3 and ray[rllib] (but you could implement your own), and you can access/modify the networks used to approximate each function there.
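For example, with the stable-baselines3 agents, the actor and critic network architectures can be changed through policy_kwargs. A sketch; the import path and environment class are assumed from the repo's single-agent examples of that era and may differ in other versions:

```python
from stable_baselines3 import PPO
# import path as in the repo's single-agent examples at the time;
# it may differ in other versions of gym-pybullet-drones
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary

env = HoverAviary()
# pi = actor (policy) layers, vf = critic (value function) layers
model = PPO("MlpPolicy", env,
            policy_kwargs=dict(net_arch=[dict(pi=[64, 64], vf=[64, 64])]))
model.learn(total_timesteps=10_000)
```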

BCWang93 commented 1 year ago

@atharrashno, how did you change the code from 2 drones to 3 drones? Thanks