kurtzace / diary-2024


AWS DeepRacer #14

Open kurtzace opened 5 days ago

kurtzace commented 5 days ago


Car provided image

3d racing simulator

DeepRacer uses reinforcement learning (image)

Agent - car

action taken by agent - rewarded with a positive, zero, or negative reward

episode - from the start until the car finishes or drives off the track

rewards image

exploration (may go off track)

exploitation (safer track boundary adherence)

speed, steering angle - parameters

console has 15 to 20 tracks

reward functions


input params


heading (angle from x axis)

all wheels on track - true (could be start reward)

distance from center (0 to 1)

default params - image

vehicle performs action - move from a to b - state is updated.


action space image

discrete action space - tabular (a fixed list of actions) - no fine-grained control, but training converges faster

continuous action space - gives more freedom, but training time is higher
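A discrete action space can be pictured as a small table of steering and speed combinations. The sketch below is only an illustration: `build_action_space` is a hypothetical helper, and the five angles and two speeds are assumed values picked from the ranges mentioned in these notes, not a recommended setup.

```python
from itertools import product


def build_action_space(steering_angles, speeds):
    """Return one action per (steering angle, speed) combination."""
    return [
        {"steering_angle": angle, "speed": speed}
        for angle, speed in product(steering_angles, speeds)
    ]


# Hypothetical choices: 5 steering angles (degrees) and 2 speeds
actions = build_action_space([-30, -15, 0, 15, 30], [1.5, 3.0])
print(len(actions))  # 10
```

More angles or speeds give the agent finer control but also more actions to learn about, which matches the trade-off noted above.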

setup racer profile image

example track: A to Z Speedway

clock wise is track direction

PPO - algorithm (uses 2 neural networks)

Other algo is SAC

1 to 2 hours - model convergence

lap time should be minimal with car not leaving track

15 training hours per team

clone good models

at least 1 tyre should be on the track

kurtzace commented 3 days ago

Reinforcement learning algorithms are trained by repeated optimization of cumulative rewards. The model will learn which action (and then subsequent actions) will result in the highest cumulative reward on the way to the goal. Learning doesn’t just happen on the first go; it takes some iteration. First, the agent needs to explore and see where it can get the highest rewards, before it can exploit that knowledge.
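To make "cumulative reward" concrete, here is a minimal sketch of a discounted return. The discount factor `gamma` and the sample per-step rewards are assumptions chosen for illustration; they are not values taken from the DeepRacer console.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps ahead is scaled by gamma**k."""
    total = 0.0
    # Walk backwards so each step folds in the discounted future return
    for r in reversed(rewards):
        total = r + gamma * total
    return total


episode_rewards = [1.0, 1.0, 0.5, 1e-3]  # hypothetical per-step rewards
print(discounted_return(episode_rewards))
```

With `gamma` below 1, rewards that arrive sooner count for more than rewards that arrive later, which is why an early mistake can lower the value of a whole episode.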

Exploitation and Convergence: With more experience, the agent gets better and eventually is able to reach the destination reliably. Depending on the exploration-exploitation strategy, the vehicle may still have a small probability of taking random actions to explore the environment.
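One simple way to picture "a small probability of taking random actions" is an epsilon-greedy rule. This is only an illustrative sketch: `choose_action` and the value estimates are hypothetical, and DeepRacer's own algorithms handle exploration internally, so this is not what the service runs.

```python
import random


def choose_action(action_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    # Exploit: index of the highest estimated value
    return max(range(len(action_values)), key=lambda i: action_values[i])


values = [0.2, 0.9, 0.4]  # hypothetical value estimates for three actions
print(choose_action(values, epsilon=0.0))  # epsilon of 0 always exploits
```

Lowering `epsilon` over time moves the agent from exploration toward exploitation.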


wiki: The parameters passed to the reward function describe various aspects of the state of the vehicle, such as its position and orientation on the track, its observed speed, steering angle and more. We will explore some of these parameters and how they describe the vehicle as it drives around the track:

x and y - The position of the vehicle on the track
heading - Orientation of the vehicle on the track
waypoints - List of waypoint coordinates
closest_waypoints - Index of the two closest waypoints to the vehicle
progress - Percentage of track completed
steps - Number of steps completed
track_width - Width of the track
distance_from_center - Distance from track center line
is_left_of_center - Whether the vehicle is to the left of the center line
all_wheels_on_track - Is the vehicle completely within the track boundary?
speed - Observed speed of the vehicle
steering_angle - Steering angle of the front wheels. Range: -30:30. The negative sign (-) means steering to the right and the positive sign (+) means steering to the left.

more parameters

is_offtrack

Type: Boolean

Range: (True:False)

A Boolean flag to indicate whether the agent is off track (True) or not (False) as a termination status.

is_reversed

Type: Boolean

Range: [True:False]

A Boolean flag to indicate if the agent is driving clockwise (True) or counterclockwise (False).

It's used when you enable direction change for each episode.


heading

Type: float

Range: -180:+180

Heading direction, in degrees, of the agent with respect to the x-axis of the coordinate system.
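For trying a reward function offline, it can help to have a hand-made `params` dictionary. The sketch below is an assumption-heavy example: every value is made up, `check_params` is a hypothetical helper, and `is_offtrack` and `is_reversed` are the keys the AWS DeepRacer documentation uses for the two Boolean flags described above.

```python
# Hypothetical sample input; values are invented for local testing only
sample_params = {
    "x": 2.5,
    "y": 0.7,
    "heading": -77.0,
    "waypoints": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],
    "closest_waypoints": [1, 2],
    "progress": 12.5,
    "steps": 40,
    "track_width": 1.07,
    "distance_from_center": 0.2,
    "is_left_of_center": False,
    "all_wheels_on_track": True,
    "speed": 1.5,
    "steering_angle": -15.0,
    "is_offtrack": False,
    "is_reversed": False,
}


def check_params(params):
    """Return a list of problems found against the ranges noted above."""
    problems = []
    if not -180 <= params["heading"] <= 180:
        problems.append("heading out of range")
    if not -30 <= params["steering_angle"] <= 30:
        problems.append("steering_angle out of range")
    if not 0 <= params["progress"] <= 100:
        problems.append("progress out of range")
    if params["distance_from_center"] < 0:
        problems.append("distance_from_center is negative")
    return problems


print(check_params(sample_params))  # []
```

Passing `sample_params` to any `reward_function(params)` lets you see the returned reward without starting a training job.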


In this example, we give a high reward for when the car stays on the track, and penalize if the car deviates from the track boundaries. This example uses the all_wheels_on_track, distance_from_center and track_width parameters to determine whether the car is on the track, and give a high reward if so. Since this function doesn't reward any specific kind of behavior besides staying on the track, an agent trained with this function may take a longer time to converge to any particular behavior.

def reward_function(params):
    '''Example of rewarding the agent to stay inside the two borders of the track'''

    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the agent is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)

Follow Center Line: In this example we measure how far away the car is from the center of the track, and give a higher reward if the car is close to the center line. This example uses the track_width and distance_from_center parameters, and returns a decreasing reward the further the car is from the center of the track. This example is more specific about what kind of driving behavior to reward, so an agent trained with this function is likely to learn to follow the track very well. However, it is unlikely to learn any other behavior such as accelerating or braking for corners.

def reward_function(params):
    '''Example of rewarding the agent to follow center line'''

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    return float(reward)
Prevent zig-zag: This example incentivizes the agent to follow the center line but penalizes with lower reward if it steers too much, which will help prevent zig-zag behavior. The agent will learn to drive smoothly in the simulator and likely display the same behavior when deployed in the physical vehicle.

def reward_function(params):
    '''Example of penalizing steering, which helps mitigate zig-zag behaviors'''

    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle'])  # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)

how to be fast tip


More ref


kurtzace commented 3 days ago

A to Z Speedway: It’s easier for an agent to navigate this extra wide version of re:Invent 2018. Use it to get started with object avoidance and head-to-head race training.

Length: 16.64 m (54.59') Width: 107 cm (42")

Direction: Clockwise, Counterclockwise
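The track length gives a rough bound on lap time. The sketch below assumes the car holds a constant speed along the full 16.64 m and that the 1.5 and 3 speed limits used elsewhere in these notes are in m/s; a real lap will differ because the car does not drive a perfect line at a constant speed.

```python
TRACK_LENGTH_M = 16.64  # A to Z Speedway length from the track description


def lap_time_seconds(speed_m_per_s, length_m=TRACK_LENGTH_M):
    """Time to cover the track length at a constant speed."""
    return length_m / speed_m_per_s


for speed in (1.5, 3.0):
    print(f"{speed} m/s -> {lap_time_seconds(speed):.2f} s")
```

Under these assumptions a lap falls somewhere between about 5.5 s (all at 3 m/s) and about 11.1 s (all at 1.5 m/s).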

kurtzace commented 3 days ago

when driving anticlockwise

heading - 125 image

heading 178 image

-77 on way down
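Heading readings like these can be compared with the direction of the track itself. The sketch below follows the common pattern of taking the segment between the two closest waypoints; `direction_diff` is a hypothetical helper name and the two waypoints are made up for illustration.

```python
import math


def direction_diff(waypoints, closest_waypoints, heading):
    """Absolute angle (degrees) between the car's heading and the track direction."""
    prev_point = waypoints[closest_waypoints[0]]
    next_point = waypoints[closest_waypoints[1]]
    # Direction of the segment between the two closest waypoints, in degrees
    track_direction = math.degrees(
        math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    )
    diff = abs(track_direction - heading)
    # Wrap so that e.g. 180 and -178 count as 2 degrees apart, not 358
    if diff > 180:
        diff = 360 - diff
    return diff


# Hypothetical waypoints: a segment pointing along the negative x axis (180 degrees)
wps = [(1.0, 0.0), (0.0, 0.0)]
print(direction_diff(wps, [0, 1], 178.0))
```

A reward function could scale the reward down when this difference exceeds some threshold, so the car is nudged to point along the track.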

kurtzace commented 2 days ago

Random thoughts on what an ideal reward function could be

kurtzace commented 2 days ago

Think in terms of percentages image

kurtzace commented 2 days ago

clockwise waypoints


import os

import matplotlib.pyplot as plt
import numpy as np

# np.load does not expand "~", so resolve the home directory explicitly
tracksPath = os.path.expanduser('~/Downloads/reInvent2019_wide_cw.npy')
# Track name
track_name = "A to Z Speedway"

# Get waypoints from numpy file
waypoints = np.load(tracksPath)

# Get number of waypoints
print("Number of waypoints = " + str(waypoints.shape[0]))

# Plot waypoints (columns 2 and 3 of each row are used as x and y)
for i, point in enumerate(waypoints):
    waypoint = (point[2], point[3])
    plt.scatter(waypoint[0], waypoint[1])
    plt.text(waypoint[0], waypoint[1], str(i), fontsize=9, ha='right')
    print("Waypoint " + str(i) + ": " + str(waypoint))

# Display the plot
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.title(f'Waypoints for {track_name}')
plt.show()

kurtzace commented 2 days ago

Simple reward


image image

kurtzace commented 2 days ago

clockwise waypoints

better waypoints for clockwise

Evaluation with speed limits of 1.5 to 3 m/s

image image

kurtzace commented 2 days ago

percentage reward function

60% to 74% - speed of 1.5

40% to 59% - speed of 3

25% to 39% - speed of 1.5

10% to 24% - speed of 3

0% to 9% - speed of 1.5

60% to 74% - turn right

40% to 59% - follow center line (reward function above)

25% to 39% - turn right

10% to 24% - speed of 3, follow center line (reward function above), but also mild turning to left

0% to 9% - speed of 1.5 - turn right

60% to 74% - speed of 1.5 - could be right of center line by 50%

35% to 40% - speed of 3 - could be right of center line by 50%

40% to 60% - speed of 3 - could be left of center line by 50%

25% to 39% - speed of 1.5 - could be right of center line by 50%

10% to 24% - speed of 3 - should be exactly on center line

40% to 59% - heading could be -55 degree

10% to 24% - heading could be 103 degree

25% to 39% - heading could be 0 degree
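The speed part of this plan could be sketched as a lookup table. This is only one possible reading of the notes: it assumes the percentages refer to the `progress` parameter, uses the speed bands from the first list above, and invents both the linear falloff and the neutral reward for progress values the notes do not cover.

```python
# (progress_low, progress_high, target_speed) from the first list in this comment
SPEED_BANDS = [
    (0, 9, 1.5),
    (10, 24, 3.0),
    (25, 39, 1.5),
    (40, 59, 3.0),
    (60, 74, 1.5),
]


def target_speed(progress):
    """Return the planned speed for a progress percentage, or None if unplanned."""
    for low, high, speed in SPEED_BANDS:
        # high + 1 closes the gap between bands, e.g. 24.5 still counts as 10-24
        if low <= progress < high + 1:
            return speed
    return None


def reward_function(params):
    """Reward driving near the planned speed for the current progress band."""
    if not params["all_wheels_on_track"]:
        return 1e-3
    planned = target_speed(params["progress"])
    if planned is None:
        return 0.5  # assumption: neutral reward where the notes give no plan
    # Assumption: linear falloff, reaching the floor at 1.5 away from the plan
    error = abs(params["speed"] - planned)
    return float(max(1e-3, 1.0 - error / 1.5))
```

The heading and center-line targets from the other lists could be added as further tables in the same style and multiplied into the reward.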


image image

kurtzace commented 2 days ago


image image