Hi @nbenave,
when you run
python gym-pybullet-drones/examples/learn.py
what you see at the end is a trained model applied to the quadrotor, i.e. line 88:
https://github.com/utiasDSL/gym-pybullet-drones/blob/c62e67ab2dca8580e907ec45f95b1e24eba0bd0e/examples/learn.py#L88
the resulting performance is not great because learn.py
is an example script that learns over "only" 10000 steps
https://github.com/utiasDSL/gym-pybullet-drones/blob/c62e67ab2dca8580e907ec45f95b1e24eba0bd0e/examples/learn.py#L56
if you want to watch those 10000 steps, you only need to change this line
https://github.com/utiasDSL/gym-pybullet-drones/blob/c62e67ab2dca8580e907ec45f95b1e24eba0bd0e/examples/learn.py#L42
to
env = gym.make("takeoff-aviary-v0", gui=True)
however, I think you'll realize that adding the frontend and rendering can make the learning prohibitively time-consuming
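For concreteness, here is a minimal sketch of training with the GUI enabled so every attempt is rendered; the algorithm choice (A2C) and the explicit gym_pybullet_drones import for env registration are assumptions, so check learn.py for the exact setup.

```python
# Minimal sketch, not learn.py itself: train while rendering every env.step()
# in the PyBullet GUI. The A2C choice and the registration import are assumptions.
import gym
import gym_pybullet_drones  # assumed to register "takeoff-aviary-v0"
from stable_baselines3 import A2C

env = gym.make("takeoff-aviary-v0", gui=True)  # GUI on: every attempt is rendered
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)             # the same "only" 10000 steps, but much slower
env.close()
```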
in singleagent.py I used stable-baselines3's EvalCallback to save a model every time it improves performance
https://github.com/utiasDSL/gym-pybullet-drones/blob/c62e67ab2dca8580e907ec45f95b1e24eba0bd0e/experiments/learning/singleagent.py#L235
you might want to do something similar to visualize how the agent changes during learning "offline"
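A minimal sketch of that pattern, assuming the takeoff-aviary-v0 environment and PPO (singleagent.py may use different algorithms, paths, and frequencies): EvalCallback periodically evaluates the current policy and saves the best model so you can replay intermediate checkpoints later.

```python
# Sketch only: save the best-so-far model during training with SB3's EvalCallback.
# Paths, eval_freq, and the PPO choice are assumptions, not singleagent.py's exact values.
import gym
import gym_pybullet_drones  # assumed to register "takeoff-aviary-v0"
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

train_env = gym.make("takeoff-aviary-v0")
eval_env = gym.make("takeoff-aviary-v0")

eval_callback = EvalCallback(eval_env,
                             best_model_save_path="./results/best_model/",
                             log_path="./results/eval_logs/",
                             eval_freq=1000,        # evaluate every 1000 training steps
                             deterministic=True)

model = PPO("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=10000, callback=eval_callback)
# Later, replay the saved checkpoint with PPO.load("./results/best_model/best_model.zip")
```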
Thank you for your quick and detailed answer!
In the "Show performance" code section (lines 72-101), will the environment display the model's top performance?
I have a few short questions, if you can clarify a few things for me.
Thanks again.
Briefly, that section simply runs the trained model for 10'000 env.step()'s, i.e. 10'000/(env.SIM_FREQ/env.AGGR_PHY_STEPS) seconds of simulation (different environments have different episode lengths).
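To make the conversion concrete, here is the arithmetic with purely illustrative numbers (read SIM_FREQ and AGGR_PHY_STEPS from your own env instance in practice):

```python
# Illustrative numbers only, not the values learn.py necessarily uses.
SIM_FREQ = 240        # physics steps per simulated second (assumed)
AGGR_PHY_STEPS = 5    # physics steps aggregated into one env.step() (assumed)
NUM_ENV_STEPS = 10_000

env_steps_per_second = SIM_FREQ / AGGR_PHY_STEPS          # 48 env.step()'s per simulated second
simulated_seconds = NUM_ENV_STEPS / env_steps_per_second  # ~208 seconds of simulation
print(simulated_seconds)
```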
Thank you again mate, now it's clearer for me :)
Another question about the multi-agent learning: is the training for both of the quadcopters? Is each quadcopter trained separately? Does each of them observe simultaneously, or is there a joint observation?
Is the reward at each step related to the follower, the leader, or both of them?
Thanks!
The MARL example in multiagent.py is based on RLlib's centralized critic examples, so yes, both agents learn; there is some postprocessing that goes into creating the observations of each agent, and each agent has its own reward signal.
The multi-agent script was intended as a demonstration of how a multi-agent environment can be used.
The best way to do MARL is still a bit up for debate, imho.
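As a toy illustration (not the repo's actual code) of the postprocessing mentioned above: with a centralized critic, each agent's value function typically sees its own observation plus the other agent's observation and action, along these lines.

```python
# Toy example of centralized-critic observation postprocessing; the shapes are
# hypothetical and do not match the aviaries' real observation/action spaces.
import numpy as np

def central_critic_input(own_obs, other_obs, other_action):
    """Critic input for one agent: its own obs plus the other agent's obs and action."""
    return np.concatenate([own_obs, other_obs, other_action])

own_obs = np.zeros(12)      # this drone's state (hypothetical size)
other_obs = np.zeros(12)    # the other drone's state
other_action = np.zeros(4)  # the other drone's last action (e.g. 4 motor commands)
print(central_critic_input(own_obs, other_obs, other_action).shape)  # (28,)
```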
Thanks again. How is the reward calculated in multi-agent learning?
There's a reward for drone 0 and a reward for drone 1, but I don't understand how you calculate the overall reward.
Is there an equation for combining these two rewards into one overall reward?
I'm using TensorBoard and the mean-reward graph displays only one value, not one per drone.
The multi-agent aviary returns a dictionary of rewards because each agent can receive its own signal. How to use these to learn multiple critics/value functions depends on the MARL approach you are implementing (see parameter sharing vs. fully independent learning vs. centralized critic, etc.). Off the top of my head, I don't remember what value you'd see on TB.
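For intuition, here is a rough sketch of what the dict-of-rewards interface looks like when you step a multi-agent aviary by hand; LeaderFollowerAviary is from this repo, but treat the constructor defaults and space access as assumptions and check the source.

```python
# Rough sketch: an RLlib-style multi-agent aviary returns one reward per agent.
# Constructor defaults and space access are assumptions; verify against the repo.
from gym_pybullet_drones.envs.multi_agent_rl.LeaderFollowerAviary import LeaderFollowerAviary

env = LeaderFollowerAviary(num_drones=2)
obs = env.reset()                                                  # e.g. {0: obs_0, 1: obs_1}
actions = {i: env.action_space.spaces[i].sample() for i in obs}    # one action per drone
obs, rewards, dones, infos = env.step(actions)
print(rewards)                                                     # e.g. {0: ..., 1: ...}
print(sum(rewards.values()) / len(rewards))                        # one possible single "overall" number
env.close()
```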
Hi there, very impressive work!
When I run learn.py I can see the quadcopter's attempts to fly during learning; however, not all attempts are shown, only a few. Is there any way to see all attempts? I wish to preview the learning process visually.
Also, is this available in singleagent.py as well?
Thanks,