leehe228 / LogisticsEnv

UAV Logistics Environment for Multi-Agent Reinforcement Learning / Unity ML-Agents / Unity 3D
MIT License

UAV Logistics Environment for MARL

This UAV Logistics Environment provides a continuous observation space and a discrete action space, with physics-based UAVs and parcels powered by the Unity Engine. It was used in the papers "Multiagent Reinforcement Learning Based on Fusion-Multiactor-Attention-Critic for Multiple-Unmanned-Aerial-Vehicle Navigation Control" (MDPI Energies 2022, 15(19), 7426 (SCIE), 2022.10.10.) and "Multi-agent Reinforcement Learning-Based UAS Control for Logistics Environments" (Springer LNEE, volume 913 (SCOPUS), 2022.09.30.).

📢 Upgrading Environment and Transitioning to Isaac Sim

The Unity ML-Agents, PyTorch, and CUDA versions used in this LogisticsEnv are very old and incompatible with modern GPUs and operating systems, so I am in the process of upgrading the dependencies and this environment. I am also in the process of transitioning to the Isaac Sim environment.

📌 LogisticsEnv Builds Release (1.0.0)

(2024. 3. 11.)

📌 Trained Model


Requirements

My Environments


Unity Editor


Getting Started

Tensorboard

Parcel Counter

Timer


Scenario


Used Algorithm


Python API

Gym Functions

This Logistics Environment follows the OpenAI Gym API design:

example

from UnityGymWrapper5 import GymEnv # Unity Gym Style Wrapper
env = GymEnv(name="../Build_Linux/Logistics") # Call Logistics Environment
done, obs = False, env.reset() # reset Environment

while not done:
    actions = get_actions(obs) # get actions
    next_obs, reward, done, info = env.step(actions) # next step
    obs = next_obs
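
In the example above, get_actions is only a placeholder. For a quick smoke test it can be replaced with random sampling; the sketch below assumes the wrapper expects a list with one action index per agent (an assumption) and uses the per-agent discrete action size of 7 described in the Actions section.

import random

def get_actions(obs, n_agents=3):  # n_agents=3 is an assumed example value
    # pick a random action index in [0, 6] for each UAV (7 discrete actions)
    return [random.randint(0, 6) for _ in range(n_agents)]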


Unity Gym Wrapper

This wrapper can wrap a Unity ML-Agents environment (API version 2.1.0 exp1, mlagents version 0.27.0) that has multiple discrete-action agents.

The GymWrapper provided by Unity supports only single-agent environments, so UnityGymWrapper5.py is provided in this GitHub repository.

Parameter Configurations

env = GymEnv(name='', width=0, height=0, ...)
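
For example, a configuration might look like the sketch below; name, width, and height come from the signature above, while the concrete values (and the build path) are only illustrative assumptions.

from UnityGymWrapper5 import GymEnv

env = GymEnv(
    name="../Build_Linux/Logistics",  # path to the environment build (assumed example path)
    width=720,                        # render window width in pixels (assumed value)
    height=480,                       # render window height in pixels (assumed value)
)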


Observation

Observation size for each agent (a small size-calculation sketch follows below):

29 + 7 × (n_agents − 1) + 27 (ray-cast observations)

This UAV Information

Raycast Observation (from Unity ML-Agents)
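
Following the formula above, the per-agent observation size can be computed as in this sketch (the function name is just for illustration):

def obs_size_per_agent(n_agents):
    # 29 values for this UAV, 7 for every other UAV, and 27 ray-cast values
    return 29 + 7 * (n_agents - 1) + 27

print(obs_size_per_agent(3))  # 29 + 14 + 27 = 70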


Actions

A UAV can move in 6 directions (up, down, forward, backward, left, right) or stay in place.

The action space is discrete, and the size of the action set is 7.
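
The exact mapping from action index to movement is not documented here, so the enumeration below is only an assumed example of one way to lay out the 7 discrete actions:

# assumed example mapping of the 7 discrete actions to movements
ACTIONS = {
    0: "stay",
    1: "up",
    2: "down",
    3: "forward",
    4: "backward",
    5: "left",
    6: "right",
}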


Reward

Driving Reward

(previous distance - current distance) * 0.5

To make the UAV learn to move toward its destination, this distance-based reward is given at every step. If the UAV is holding a parcel, the distance is calculated to the destination where the parcel has to be shipped. If the UAV still has to pick up a parcel, the distance is calculated between the UAV and the nearest box (big or small, whichever is closer).
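
A minimal sketch of this shaping term, assuming the two distances are already computed at each step:

def driving_reward(prev_distance, curr_distance):
    # positive when the UAV moved closer to its current target, negative otherwise
    return (prev_distance - curr_distance) * 0.5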

Shipping Reward

These reward values are designed to make the UAVs work efficiently.

Collision Penalty

Each UAV has to avoid buildings and other UAVs by using its raycast observations.


Training Result

We trained a random-decision baseline, reinforcement learning models (SAC, DQN, MADDPG), and the MAAC (Multi-Actor-Attention-Critic) model. Each model was trained for 30k episodes.


Credit

Developed by Hoeun Lee (DMS Lab, Dept. of Computer Science and Engineering, Konkuk University, Seoul, Korea).

Copyright Hoeun Lee, 2021, All Rights Reserved.