theilem / uavSim


Old Version

The previous version of this repository, which can be used to recreate the ICAR results, is found in the icar branch.

Introduction

This repository contains the implementation of the power-constrained coverage path planning (CPP) with recharge problem and the proposed deep reinforcement learning (DRL) solution based on proximal policy optimization (PPO). The DRL approach uses map-based observations, preprocessed into global and local maps; action masking to ensure safety; discount factor scheduling to optimize the long-horizon problem; and position history observations to avoid state loops.
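
As an illustration of the action-masking idea, the following is a generic sketch, not the repository's implementation: invalid actions are given probability zero by setting their logits to negative infinity before the softmax.

import numpy as np

def masked_action_sample(logits, mask, rng=None):
    # Invalid actions (mask == False) get logit -inf, hence probability 0.
    if rng is None:
        rng = np.random.default_rng()
    masked = np.where(mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Example: action 2 (e.g., a move that would deplete the battery) is masked.
logits = np.array([0.5, 1.0, 2.0, -0.3])
mask = np.array([True, True, False, True])
print(masked_action_sample(logits, mask))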

[Screenshot of the evaluation]

The agents are stored in a submodule and can be pulled by

git submodule init
git submodule update
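
Alternatively, when cloning fresh, the submodule can be fetched in one step (the URL below is inferred from the project name):

git clone --recurse-submodules https://github.com/theilem/uavSim.git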

For questions, please contact Mirco Theile via email at mirco.theile@tum.de.

Requirements

tensorflow~=2.11.0
opencv-python==4.7.0.68
scikit-image==0.21.0
gymnasium==0.27.0
pygame==2.5.1
tqdm~=4.64.1
seaborn==0.12.2
dataclasses-json==0.5.7
einops~=0.6.1

Developed and tested only on Linux and macOS.
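
A typical setup might look as follows, assuming the pinned packages above are collected into a requirements.txt (create one from the list if the repository does not ship it):

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt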

How to use

With this repository, PPO agents can be trained to solve the power-constrained CPP problem with recharge. Additionally, newly trained agents and the provided example agents can be evaluated with a visualization.

Training

General usage:

python train.py [-h] [--gpu] [--gpu_id GPU_ID] [--generate] [--verbose] [--params [PARAMS ...]] config

positional arguments:
  config                Path to config file

options:
  -h, --help            show this help message and exit
  --gpu                 Activates usage of GPU
  --gpu_id GPU_ID       Activates usage of GPU on specific GPU id
  --generate            Generate config file for parameter class
  --verbose             Prints the network summary at the start
  --params [PARAMS ...]
                        Override parameters as: path/to/param1 value1 path/to/param2 value2 ...
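
For example, a training run could be started as follows; the config path and the parameter path are hypothetical placeholders, the actual config files ship with the repository:

# Illustrative invocation; replace config/example.json and
# trainer/batch_size with real files and parameter paths from the repo.
python train.py --gpu --verbose --params trainer/batch_size 128 config/example.json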

How to recreate all the agents used in the paper:

Normal Agents:

Mask Ablation:

Discount Scheduling Ablation:

Position History Ablation:

Evaluating

General usage:

python evaluate.py [-h] [-a [A ...]] [-t [T ...]] [-d] [-r [R ...]] [--scenario SCENARIO] [--all_maps] [--heuristic] [--maps_only] [--gpu] [--gpu_id GPU_ID] [--generate] [--verbose] [--params [PARAMS ...]] config

positional arguments:
  config                Path to config file

options:
  -h, --help            show this help message and exit
  -a [A ...]            Add maps
  -t [T ...]            Add timeouts for maps, 1000 otherwise
  -d                    remove all other maps
  -r [R ...]            Record episode only, potentially override render params
  --scenario SCENARIO   Load specific scenario
  --all_maps            Load all maps
  --heuristic           Use Heuristic Only
  --maps_only           Draws maps only
  --gpu                 Activates usage of GPU
  --gpu_id GPU_ID       Activates usage of GPU on specific GPU id
  --generate            Generate config file for parameter class
  --verbose             Prints the network summary at the start
  --params [PARAMS ...]
                        Override parameters as: path/to/param1 value1 path/to/param2 value2 ...
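
For example, an evaluation with an additional map and a custom timeout might look as follows; the config path and map name are hypothetical placeholders:

# Illustrative invocation; replace config/example_eval.json and
# example_map with real files shipped in the repository.
python evaluate.py --gpu -a example_map -t 1500 config/example_eval.json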

For instructions in the interactive evaluation environment, press the h key.

Recreate scenarios in the paper:

To record the videos and log the final trajectory and statistics, add -r. The evaluation will then run in the background.
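
For example (the scenario and config paths below are hypothetical placeholders):

# Illustrative invocation; replace both paths with real files from the repo.
python evaluate.py -r --scenario scenarios/example.json config/example_eval.json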

Figure 2:

Figure 7:

Figure 8:

Figure 9:

Figure 10:

Figure 11:

Figure 12:

Resources

The maps from the paper are included in the 'res' directory. Maps are stored as PNG files, with one pixel representing one grid-world cell; the pixel color determines the type of cell according to the map color code.

If you would like to create a new map, you can use any tool to draw a PNG with the same pixel dimensions as the desired map and the corresponding color codes.
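
A minimal sketch of drawing a map programmatically with NumPy and Pillow follows; the size and RGB values are placeholders and must be replaced with the actual codes from the map color code:

import numpy as np
from PIL import Image

# All values here are placeholders: the actual RGB codes for each cell
# type must be taken from the repository's map color code.
SIZE = 32               # desired map is SIZE x SIZE cells
FREE = (255, 255, 255)  # hypothetical "free cell" color
OTHER = (0, 0, 255)     # hypothetical second cell type

canvas = np.full((SIZE, SIZE, 3), FREE, dtype=np.uint8)
canvas[0, 0] = OTHER    # paint individual cells as needed

Image.fromarray(canvas).save("res/my_map.png")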

When maps are loaded for the first time, a model is computed that is later used by the FoV calculation, action mask, and heuristic. The model is saved as 'res/[map_name]_model.pickle'. For large maps, this process may take a few minutes.

Reference

If using this code for research purposes, please cite:

@misc{theile2023learning,
      title={Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning}, 
      author={Mirco Theile and Harald Bayerlein and Marco Caccamo and Alberto L. Sangiovanni-Vincentelli},
      year={2023},
      eprint={2309.03157},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

License

This code is under the BSD 3-Clause "New" or "Revised" License.