mrwangyou / DBRL

A Gym Dogfighting Simulation Benchmark for Reinforcement Learning Research
80 stars 13 forks source link
air-combat reinforcement-learning

DBRL: A Gym Dogfighting Simulation Benchmark for Reinforcement Learning Research

[Reference Manual](doc/QuickStart_en.md) | [简体中文使用手册](doc/QuickStart_zh.md)

Introduction

DBRL is an air combat simulation benchmark based on JSBSim and Dogfight 2.

We adopt JSBSim as flight dynamic model to build our dogfight environment, and Dogfight 2 to simulate the situation when the plane is confronted with a missile.

If you are new to air combat simulation or reinforcement learning algorithm, we advise you to learn DBRL tutorial first.

Gym Parameters

# JSBSim
observation_space = Box(
    low=np.array([
        -360,  # Latitude
        -360,  # Longitude
        0,     # Height above sea level
        -360,  # Yaw
        -360,  # Pitch
        -360,  # Roll
    ] * 2),
    high=np.array([
        360,
        360,
        60000,
        360,
        360,
        360,
    ] * 2)
)

action_space = Box(
    low=np.array([
        -1,  # Aileron
        -1,  # Elevator
        -1,  # Rudder
        0,   # Throttle
        0,   # Flap
        0,   # Speed brake
        0,   # Spoiler
    ]),
    high=np.array([
        1,
        1,
        1,
        1,
        1,
        1,
        1,
    ]),
)

# Dogfight 2
if ego_plane_position:
    observation_infimum = np.append(observation_infimum, [-300, -300, -1])
    observation_supermum = np.append(observation_supermum, [300, 300, 200])
if ego_plane_attitude:
    observation_infimum = np.append(observation_infimum, [0, -360, -360])
    observation_supermum = np.append(observation_supermum, [360, 360, 360])
if oppo_plane_position:
    observation_infimum = np.append(observation_infimum, [-300, -300, -1])
    observation_supermum = np.append(observation_supermum, [300, 300, 200])
if oppo_plane_attitude:
    observation_infimum = np.append(observation_infimum, [0, -360, -360])
    observation_supermum = np.append(observation_supermum, [360, 360, 360])
if missile_position:
    observation_infimum = np.append(observation_infimum, [-300, -300, -1])
    observation_supermum = np.append(observation_supermum, [300, 300, 200])
if missile_attitude:
    observation_infimum = np.append(observation_infimum, [-315, -315, -315])
    observation_supermum = np.append(observation_supermum, [315, 315, 315])
if missile_relative_azimuth:
    observation_infimum = np.append(observation_infimum, [-1, -1, -1])
    observation_supermum = np.append(observation_supermum, [1, 1, 1])
observation_space = Box(
    low=observation_infimum,
    high=observation_supermum,
)

if pitch_enable:
    action_infimum = np.append(action_infimum, -1)
    action_supermum = np.append(action_supermum, 1)
if roll_enable:
    action_infimum = np.append(action_infimum, -1)
    action_supermum = np.append(action_supermum, 1)
if yaw_enable:
    action_infimum = np.append(action_infimum, -1)
    action_supermum = np.append(action_supermum, 1)
if flaps_enable:
    action_infimum = np.append(action_infimum, 0)
    action_supermum = np.append(action_supermum, 1)
if throttle_enable:
    action_infimum = np.append(action_infimum, 0)
    action_supermum = np.append(action_supermum, 1)
if flare_enable:
    action_infimum = np.append(action_infimum, 0)
    action_supermum = np.append(action_supermum, 1)
action_space = Box(
    low=action_infimum,
    high=action_supermum,
)

Welcome PR

If you find any mistakes while using, or have any suggestions and advices, please point it out in Github Issues. We're also looking forward to your contributions, such as your air combat reinforcement learning models, to this dogfight benchmark.