An easy-to-use library for developing deep reinforcement learning experiments in the Isaac Sim simulator.
The library provides a unified, ready-to-use framework for running experiments and DRL training with mobile robots in realistic environments, building on standard tools such as OpenAI Gym, PyTorch, Stable-Baselines3 (SB3), and the Isaac Sim extensions. It is easy to use, configure, and customize: robots, sensors, environments, and methods can all be swapped, which facilitates research on AI-based mobile robots in Isaac Sim and speeds up the time-consuming steps of training expert agents.
You must have a computer compatible with Isaac Sim 2021.2.1; please check the official documentation for the system requirements.
Open a terminal and run the following commands:
Three example files for training a DQN agent are included to illustrate the usage of the library. These are:
The differential-drive robot used is a Jetbot. The scene is a random obstacle generator, and the goal is to reach a target position while avoiding the obstacles using a set of discrete actions (an illustrative mapping of such actions is sketched after the code snippet below).
import gym

class Isaac_envs(gym.Env):
    metadata = {"render.modes": ["human"]}
    ## ...
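For reference, here is a hedged sketch of how a discrete action set for a differential-drive robot could be mapped to base velocity commands. The action indices and velocity values below are illustrative assumptions, not the library's actual mapping.

```python
# Hypothetical sketch: discrete action index -> (linear, angular) velocity command.
# The real action definitions live inside the library's environment classes.
DISCRETE_ACTIONS = {
    0: (0.3, 0.0),   # move forward: (linear velocity [m/s], angular velocity [rad/s])
    1: (0.1, 0.8),   # turn left
    2: (0.1, -0.8),  # turn right
    3: (0.0, 0.0),   # stop
}

def action_to_velocity_command(action_index):
    """Translate a discrete action index into a (v, w) command for the robot base."""
    return DISCRETE_ACTIONS[action_index]
```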
from isaac_robots import isaac_robot
from isaac_envs import isaac_envs
from omni.isaac.core.objects import VisualCuboid

## Experiment configuration
env_name = "random_walk"    ## Scene: random obstacle generator
robot_name = "jetbot"       ## Differential-drive robot
action_type = "discrete"    ## Discrete set of actions
## ...
If you want to change the robot, scene, or action type, modify the `robot_name`, `env_name`, and `action_type` parameters shown above.
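Continuing from the snippet above, training then follows the standard SB3 workflow. The environment constructor call below is an assumption made for illustration, as are the hyperparameters; check the provided example scripts for the exact instantiation.

```python
from stable_baselines3 import DQN

# NOTE: assumed constructor signature, for illustration only; see the example
# scripts for the exact way the environment is instantiated.
env = isaac_envs(env_name=env_name, robot_name=robot_name, action_type=action_type)

# "MultiInputPolicy" handles dictionary observations like the one returned by
# get_observations() further below; the hyperparameters here are illustrative.
model = DQN("MultiInputPolicy", env, verbose=1, buffer_size=100_000)
model.learn(total_timesteps=3_000_000)
model.save("dqn_jetbot_random_walk")
```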
To quickly obtain the observations of the environment, the custom methods created in this work are used:
def get_observations(self):
    ## Camera data
    # rgb_data = self.isaac_environments._get_cam_data(type="rgb") ## Custom method
    depth_data = self.isaac_environments._get_cam_data(type="depth") ## Custom method
    ## Lidar data
    lidar_data = self.isaac_environments._get_lidar_data() ## Custom method
    # For the transporter robot, uncomment the next line
    # lidar_data2 = self.isaac_environments._get_lidar_data(lidar_selector=2) ## Custom method
    ## Distance and angular difference to the goal
    goal_world_position, _ = self.goal.get_world_pose()
    d = self.robot.distance_to(goal_world_position) ## Custom method
    angle = self.robot.angular_difference_to(goal_world_position) ## Custom method
    target_relative_to_robot_data = np.array([d, angle])
    ## Robot base velocities
    real_V = self.robot.get_lineal_vel_base() ## Custom method
    real_W = self.robot.get_angular_vel_base() ## Custom method
    base_vel_data = np.array([real_V, real_W])
    obs = {"IR_raleted": lidar_data, "pos_raleted": target_relative_to_robot_data, "vel_raleted": base_vel_data}
    return obs
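The dictionary returned above implies a `gym.spaces.Dict` observation space. Here is a hedged sketch of how such a space could be declared; the lidar vector length and the bounds below are illustrative assumptions that depend on the actual sensor configuration and robot limits.

```python
import numpy as np
import gym

# Illustrative shape only: the lidar vector length depends on the sensor configuration.
NUM_LIDAR_RAYS = 360

observation_space = gym.spaces.Dict({
    "IR_raleted": gym.spaces.Box(low=0.0, high=np.inf, shape=(NUM_LIDAR_RAYS,), dtype=np.float32),
    "pos_raleted": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32),
    "vel_raleted": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32),
})
```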
The reward function used is:
$$
r = \left\{
\begin{array}{ll}
-d_{t} \left( 1 - \frac{step_{i}}{step_{max}} \right) & \text{if the goal is not achieved} \\
-p & \text{if the robot collides with an obstacle} \\
r_{l} \left( 1 - \frac{step_{i}}{step_{max}} \right) & \text{if the robot achieves the goal}
\end{array}
\right.
$$
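A minimal Python sketch of this piecewise reward, assuming the symbols from the equation above: `d_t` is the current distance to the goal, `p` the collision penalty, and `r_l` the terminal reward. The default magnitudes are illustrative, not the values used in training.

```python
def compute_reward(d_t, step_i, step_max, collided, goal_reached, p=10.0, r_l=10.0):
    """Piecewise reward following the definition above.

    The decay factor (1 - step_i / step_max) rewards reaching the goal in
    fewer steps; p and r_l are illustrative magnitudes.
    """
    decay = 1.0 - step_i / step_max
    if collided:
        return -p
    if goal_reached:
        return r_l * decay
    return -d_t * decay
```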
After 3,000,000 training steps, the results are:
An evaluation over 30 episodes was carried out to assess the quality of the learned policy $\pi$; the results are presented in the following table.
| Parameter | Value |
|---|---|
| Success rate | 86.7 % |
| Episode time | 27.6 s |
| Episode steps | 1619 |
| Robot trajectory length | 504.1 cm |
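For reference, a hedged sketch of how a 30-episode evaluation like this could be run to collect success rate and episode length. The `info` key used to detect success is an assumption, not the library's confirmed API, and collecting episode time or trajectory length would require additional bookkeeping.

```python
import numpy as np

def evaluate(model, env, n_episodes=30):
    """Roll out the learned policy and collect simple evaluation statistics."""
    successes, episode_steps = [], []
    for _ in range(n_episodes):
        obs, done, steps = env.reset(), False, 0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            steps += 1
        successes.append(bool(info.get("goal_reached", False)))  # assumed info key
        episode_steps.append(steps)
    return {
        "success_rate": float(np.mean(successes)),
        "mean_episode_steps": float(np.mean(episode_steps)),
    }
```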