This UAV Logistics Environment has a continuous observation space and a discrete action space, with physics-based UAVs and parcels powered by the Unity Engine. It was used in the papers "Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control" (MDPI Energies 2022, 15(19), 7426 (SCIE), 2022.10.10.) and "Multi-agent Reinforcement Learning-Based UAS Control for Logistics Environments" (Springer LNEE, volume 913 (SCOPUS), 2022.09.30.).
The Unity ML-Agents, PyTorch, and CUDA versions in this LogisticsEnv are very old and incompatible with modern GPUs and operating systems, so I am in the process of upgrading the dependencies and this environment.
I am also in the process of transitioning to the Isaac Sim environment.
(2024. 3. 11.)
```python
model_path = '~~/<trained_model_name>.pt'        # write model path
model = AttentionSAC.init_from_save(model_path)  # load model data from saved file
```
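For context, here is a minimal replay sketch built around the two lines above. It assumes the MAAC code layout (`AttentionSAC` in `algorithms/attention_sac.py`), the `prep_rollouts` / `step(obs, explore=False)` rollout interface of the original MAAC implementation, and that the wrapper expects one discrete action index per UAV; `replay.py` in this repository is the authoritative version.

```python
import torch
from algorithms.attention_sac import AttentionSAC   # assumed MAAC code layout
from UnityGymWrapper5 import GymEnv

model_path = '~~/<trained_model_name>.pt'            # write model path
model = AttentionSAC.init_from_save(model_path)      # load model data from saved file
model.prep_rollouts(device='cpu')                    # assumed MAAC rollout helper

env = GymEnv(name="../Build_Linux/Logistics")
done, obs = False, env.reset()
while not done:
    # convert per-agent observations to torch tensors and query the policies
    torch_obs = [torch.tensor(o, dtype=torch.float32).unsqueeze(0) for o in obs]
    one_hot_actions = model.step(torch_obs, explore=False)  # one-hot action per agent
    actions = [int(a.argmax()) for a in one_hot_actions]    # assumed wrapper action format
    obs, reward, done, info = env.step(actions)
```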
Requirements
- baselines, version 0.1.6
- pytorch, >= 1.6.0, < 1.9.0 (compatible with your CUDA version)
- tensorboard (compatible with your PyTorch version)
- gym, version 0.15.7
- mlagents, version 0.27.0 (Release 18)
- torch 1.8.2+cu111 / torchaudio 0.8.2 / torchvision 0.9.2+cu111
- Unity Editor
How to run
- Clone the repository: `git clone https://github.com/dmslab-konkuk/LogisticsEnv.git`
- `cd MAAC` or `cd MADDPG`
- Use the `Build_Windows` or `Build_Linux` build (give the right path). On Linux, make the build executable: `sudo chmod a+xwr /Build_Linux/Logistics.x86_64`
- Run `python main.py` to run training.
- To replay, set the trained `model.pt` path in `replay.py` and run `python replay.py`.

Tensorboard
- Logs are saved in `MAAC/models/Logistics/MAAC/` or `MADDPG/models/Logistics/MADDPG`.
- Run `tensorboard --logdir=runX` and open `localhost:6006` in a browser.
- Parcel Counter (`MAAC/CSV/countXXXX.csv`): the number of successfully shipped parcels is written in this CSV file. (XXXX is the yyyyMMddHHmmss of the training start time.)
- Timer (`MAAC/CSV/timerXXXX.csv`): the time spent to finish shipping the given boxes. (The finishing condition follows the `max_smallbox` and `max_bigbox` parameters.)
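If you want to inspect these logs programmatically, a minimal sketch using the standard library is below; the timestamp in the file name is a made-up example, and the column layout of the CSV files is not documented here, so rows are printed as-is.

```python
import csv

# hypothetical file name: replace the timestamp with your training start time
with open('MAAC/CSV/count20210101120000.csv', newline='') as f:
    for row in csv.reader(f):
        print(row)  # rows record successfully shipped parcels over training
```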
Gym Functions
This Logistics Environment follows the OpenAI Gym API design:
- `from UnityGymWrapper5 import GymEnv`: import the wrapper class (the newest version is Wrapper5).
- `env = GymEnv(name="path to Unity Environment", ...)`: returns the wrapped environment object.
- `obs = env.reset()`: resets the environment to the initial state and returns the initial observation.
- `obs, reward, done, info = env.step(actions)`: a single step; requires actions and returns the observation, reward, done flag, and information list.

Example:
```python
from UnityGymWrapper5 import GymEnv  # Unity Gym style wrapper

env = GymEnv(name="../Build_Linux/Logistics")  # call Logistics environment
done, obs = False, env.reset()                 # reset environment

while not done:
    actions = get_actions(obs)                        # get actions
    next_obs, reward, done, info = env.step(actions)  # next step
    obs = next_obs
```
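The example above leaves `get_actions` undefined because it stands for whatever policy you use. A minimal placeholder, assuming the wrapper returns one observation per UAV and expects one discrete action index per UAV (the 7 actions come from the action description below), might look like this:

```python
import numpy as np

NUM_ACTIONS = 7  # no-move plus 6 movement directions (see the action description below)

def get_actions(obs):
    """Placeholder policy: choose a random discrete action for every UAV.

    Assumes obs is a list with one observation vector per UAV; replace this
    with the trained policy (MAAC, MADDPG, ...) for real experiments.
    """
    return [int(np.random.randint(NUM_ACTIONS)) for _ in obs]
```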
Unity Gym Wrapper
This wrapper can wrap a Unity ML-Agents environment (API version 2.1.0-exp.1, mlagents version 0.27.0) that has multiple discrete-action agents.
The GymWrapper provided by Unity supports only single-agent environments.
`UnityGymWrapper5.py` is in the GitHub repository.
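The snippet below is not the actual `UnityGymWrapper5.py`; it is only a rough sketch of the idea, written against the low-level `mlagents_envs` 0.27.0 API, assuming a single behavior with one discrete action branch per agent and glossing over details such as agent ordering and episode handling.

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

class MultiAgentGymLikeEnv:
    """Gym-style reset/step around a multi-agent ML-Agents build (sketch only)."""

    def __init__(self, name):
        self.env = UnityEnvironment(file_name=name)
        self.env.reset()
        self.behavior = list(self.env.behavior_specs)[0]  # single behavior assumed

    def _collect(self, steps):
        # flatten each agent's sensor outputs into one observation vector
        return [np.concatenate([o.flatten() for o in steps[agent_id].obs])
                for agent_id in steps.agent_id]

    def reset(self):
        self.env.reset()
        decision_steps, _ = self.env.get_steps(self.behavior)
        return self._collect(decision_steps)

    def step(self, actions):
        # one discrete branch per agent; actions is a list of ints
        action = ActionTuple(discrete=np.array(actions, dtype=np.int32).reshape(-1, 1))
        self.env.set_actions(self.behavior, action)
        self.env.step()
        decision_steps, terminal_steps = self.env.get_steps(self.behavior)
        done = len(terminal_steps) > 0
        steps = terminal_steps if done else decision_steps
        rewards = [steps[agent_id].reward for agent_id in steps.agent_id]
        return self._collect(steps), rewards, done, {}
```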
Parameter Configurations
`env = GymEnv(name='', width=0, height=0, ...)` (an example call follows the list below)
- `width`: defines the width of the display. (Must be set alongside `height`.)
- `height`: defines the height of the display. (Must be set alongside `width`.)
- `timescale`: defines the multiplier for delta time in the simulation. If set to a higher value, time passes faster in the simulation, but the physics may behave unpredictably.
- `quality_level`: defines the quality level of the simulation.
- `target_frame_rate`: instructs the simulation to try to render at the specified frame rate.
- `capture_frame_rate`: instructs the simulation to consider the time between updates to always be constant, regardless of the actual frame rate.
- `name`: path to the built Unity environment (e.g. `../Build_Linux/Logistics`).
- `mapsize`: size of the map in the virtual environment (x by x).
- `numbuilding`: number of buildings (obstacles).
- `max_smallbox`: maximum number of small boxes that will be generated.
- `max_bigbox`: maximum number of big boxes that will be generated.
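For example, a call that sets several of these parameters might look like the following (the values are illustrative, not recommended defaults):

```python
from UnityGymWrapper5 import GymEnv

env = GymEnv(
    name="../Build_Linux/Logistics",  # path to the built environment
    mapsize=20,                       # illustrative: 20 x 20 map
    numbuilding=5,                    # illustrative number of building obstacles
    max_smallbox=10,                  # illustrative parcel counts
    max_bigbox=5,
    timescale=20,                     # run the simulation faster than real time
)
```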
Observation size for each agent
29 + 7 x (nagent - 1) + 27 (ray-cast observations)
- This UAV Information
- Raycast Observation (from Unity ML-Agents)
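In code form, the observation size above can be computed as follows (`n_agents` is the total number of UAVs):

```python
def obs_size(n_agents: int) -> int:
    # 29 values for this UAV, 7 per other UAV, plus 27 ray-cast observations
    return 29 + 7 * (n_agents - 1) + 27
```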
Action
The UAV can move in 6 directions (up, down, forward, backward, left, right) or not move. The action is discrete, and the size of the action set is 7.
Driving Reward
(previous distance - current distance) * 0.5
To make the UAV learn to drive toward its destination, a distance-shaping reward is given at every step. If the UAV holds a parcel, the distance is calculated to the destination where that parcel has to be shipped. If the UAV still has to pick up a parcel, the distance is calculated between the UAV and whichever of the big box or the small box is closer to the UAV. (A sketch of this rule follows.)
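As a sketch of this rule (the function and variable names are illustrative, not the environment's internal code):

```python
import math

def driving_reward(prev_distance: float, current_distance: float) -> float:
    # positive when the UAV moved closer to its current target, negative otherwise
    return (prev_distance - current_distance) * 0.5

def target_distance(uav_pos, holding_parcel, destination, nearest_big_box, nearest_small_box):
    """Distance used by the reward: the parcel's destination while carrying one,
    otherwise the closer of the nearest big box and nearest small box."""
    if holding_parcel:
        return math.dist(uav_pos, destination)
    return min(math.dist(uav_pos, nearest_big_box), math.dist(uav_pos, nearest_small_box))
```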
Shipping Reward
These values are designed to make the UAV work efficiently.
Collision Penalty
The UAV has to avoid buildings and other UAVs using the raycast observation.
We trained a random-decision model, reinforcement learning models (SAC, DQN, MADDPG), and the MAAC (Multi-Actor-Attention-Critic) model. Each model was trained for 30k episodes.
Developed by Hoeun Lee (DMS Lab, Dept. of Computer Science and Engineering, Konkuk University, Seoul, Korea).
Copyright Hoeun Lee, 2021, All Rights Reserved.