
AWS deep racer #14

Open kurtzace opened 1 month ago

kurtzace commented 1 month ago

video on the basics

Car provided and its features image OCR of image: 1:18 scale 4WD car, Intel Atom processor, Intel distribution of OpenVINO toolkit, front-facing camera (4 megapixels), system memory: 4 GB RAM, 802.11ac Wi-Fi, Ubuntu 20.04 Focal Fossa, ROS 2 Foxy Fitzroy. AWS DeepRacer Evo Expansion Pack: second front-facing camera (stereo cameras), 260-degree, 12-meter scanning radius lidar sensor.

3d racing simulator

DeepRacer uses reinforcement learning image (image shows the distinction between supervised/unsupervised/RL learning)

Agent - car

action taken by agent - rewarded with a positive, zero, or negative reward

episode - start to end - or until the car drives off the track

rewards image Image shows how to incentivize center-line driving

exploration (may go off track)

exploitation (safer track boundary adherence)

speed, steering angle - parameters

console has 15 to 20 tracks

reward functions

image Image: How to edit lambda function in AWS console - similar to AWS lambda

image below explains input params image

heading (angle from x axis)

all wheels on track - true (could be a starting reward condition)

distance from center (0 to ~track_width/2)

default params - image

vehicle performs action - move from a to b - state is updated.

image below shows how AWS leverages a CNN to give us the input parameters image

action space image Image above: discrete action space - tabular - no fine-grained control, but training will converge faster

continuous action space - gives more freedom - but training time is higher
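
For intuition, a discrete action space is essentially a small table of (steering_angle, speed) pairs the agent chooses from, while a continuous space only specifies ranges. A minimal illustrative sketch in Python (the values and structure here are assumptions, not the exact console format):

# Hypothetical discrete action space: each entry pairs a steering angle (degrees)
# with a speed (m/s); the agent picks one index per step.
discrete_action_space = [
    {"steering_angle": -30, "speed": 1.5},
    {"steering_angle": -15, "speed": 2.5},
    {"steering_angle": 0,   "speed": 3.0},
    {"steering_angle": 15,  "speed": 2.5},
    {"steering_angle": 30,  "speed": 1.5},
]

# A continuous action space would instead declare only the ranges, e.g.
# steering_angle in [-30, 30] and speed in [0.5, 3.0], and the agent outputs
# any value within them - more freedom, slower convergence.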

image: setup racer profile image

example track: A to Z Speedway

clockwise is the track direction

PPO - algorithm (uses 2 neural networks)

Other algo is SAC

1 to 2 hours - model convergence

lap time should be minimal with car not leaving track

15 training hours per team

clone good models

at least 1 wheel (tyre) should stay on the track

kurtzace commented 1 month ago

Reinforcement learning algorithms are trained by repeated optimization of cumulative rewards. The model will learn which action (and then subsequent actions) will result in the highest cumulative reward on the way to the goal. Learning doesn’t just happen on the first go; it takes some iteration. First, the agent needs to explore and see where it can get the highest rewards, before it can exploit that knowledge.

Exploitation and Convergence: With more experience, the agent gets better and eventually is able to reach the destination reliably. Depending on the exploration-exploitation strategy, the vehicle may still have a small probability of taking random actions to explore the environment.

parameters

wiki The parameters passed to the reward function describe various aspects of the state of the vehicle, such as its position and orientation on the track, its observed speed, steering angle and more. We will explore some of these parameters and how they describe the vehicle as it drives around the track:

Important parameters

| Parameter | Description |
| --- | --- |
| x and y | The position of the vehicle on the track |
| heading | Orientation of the vehicle on the track |
| waypoints | List of waypoint coordinates |
| closest_waypoints | Index of the two closest waypoints to the vehicle |
| progress | Percentage of track completed |
| steps | Number of steps completed |
| track_width | Width of the track |
| distance_from_center | Distance from the track center line |
| is_left_of_center | Whether the vehicle is to the left of the center line |
| all_wheels_on_track | Is the vehicle completely within the track boundary? |
| speed | Observed speed of the vehicle |
| steering_angle | Steering angle of the front wheels. Range: -30:30. The negative sign (-) means steering to the right and the positive sign (+) means steering to the left. |

more parameters

is_offtrack

Type: Boolean

Range: [True:False]

A Boolean flag to indicate whether the agent has gone off track (True) or not (False) as a termination status.

is_reversed

Type: Boolean

Range: [True:False]

A Boolean flag to indicate whether the agent is driving clockwise (True) or counter-clockwise (False).

It's used when you enable direction change for each episode.

Heading

Type: float

Range: -180:+180

Heading direction, in degrees, of the agent with respect to the x-axis of the coordinate system.

Example

In this example, we give a high reward for when the car stays on the track, and penalize if the car deviates from the track boundaries. This example uses the all_wheels_on_track, distance_from_center and track_width parameters to determine whether the car is on the track, and give a high reward if so. Since this function doesn't reward any specific kind of behavior besides staying on the track, an agent trained with this function may take a longer time to converge to any particular behavior.

def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''

    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the agent is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)

Follow Center Line: In this example we measure how far away the car is from the center of the track, and give a higher reward if the car is close to the center line. This example uses the track_width and distance_from_center parameters, and returns a decreasing reward the further the car is from the center of the track. This example is more specific about what kind of driving behavior to reward, so an agent trained with this function is likely to learn to follow the track very well. However, it is unlikely to learn any other behavior such as accelerating or braking for corners.

def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''

    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
       reward = 1e-3  # likely crashed/ close to off track

    return float(reward)

Prevent zig-zag: This example incentivizes the agent to follow the center line but penalizes it with a lower reward if it steers too much, which helps prevent zig-zag behavior. The agent will learn to drive smoothly in the simulator and will likely display the same behavior when deployed in the physical vehicle.

def reward_function(params):
    '''
    Example of penalize steering, which helps mitigate zig-zag behaviors
    '''
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle
    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width
    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track
    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15 
    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8
    return float(reward)

tip on how to be fast

image

https://youtu.be/wqf-dJyU_WA?si=B2DM-7RXUoc6FDNI

https://youtu.be/KBXMan0Dafw?si=YGjixuJoc7HwibZV


More ref

image from above: waypoints counterclockwise

kurtzace commented 1 month ago

A to Z Speedway: It's easier for an agent to navigate this extra-wide version of re:Invent 2018. Use it to get started with object avoidance and head-to-head race training.

Length: 16.64 m (54.59') Width: 107 cm (42")

Direction: Clockwise, Counterclockwise

kurtzace commented 1 month ago

Image-wise representation of the parameter heading

when driving anti-clockwise

heading ≈ 125 image

heading ≈ 178 image

heading ≈ -77 on the way down
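
As a concrete use of the heading parameter, here is a small reward sketch (following the standard AWS example pattern) that estimates the track direction from the two closest waypoints and penalizes the car for pointing away from it; the 10-degree threshold is an assumption:

import math

def reward_function(params):
    '''
    Sketch: penalize the car when its heading deviates from the track direction.
    '''
    waypoints = params['waypoints']
    closest_waypoints = params['closest_waypoints']
    heading = params['heading']

    # Direction of the track segment between the two closest waypoints
    prev_point = waypoints[closest_waypoints[0]]
    next_point = waypoints[closest_waypoints[1]]
    track_direction = math.degrees(math.atan2(next_point[1] - prev_point[1],
                                              next_point[0] - prev_point[0]))

    # Difference between track direction and car heading, wrapped to [0, 180]
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Penalize the reward if the misalignment is too large
    reward = 1.0
    if direction_diff > 10.0:
        reward *= 0.5

    return float(reward)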

kurtzace commented 1 month ago

Random thoughts on what an ideal reward function could be

kurtzace commented 1 month ago

Image below: Think in terms of percentages, superimposed compass of percentage on track - clockwise image

kurtzace commented 1 month ago

clockwise waypoints: downloaded the track numpy file from this site

and plotted the waypoints of the clockwise track, as I could not find one online.

image

import matplotlib.pyplot as plt
import numpy as np
import os

# Path to the downloaded track waypoints (expand ~ so np.load can open the file)
tracksPath = os.path.expanduser('~/Downloads/reInvent2019_wide_cw.npy')

# Track name
track_name = "A to Z Speedway"

# Get waypoints from numpy file
waypoints = np.load(tracksPath)

# Get number of waypoints
print("Number of waypoints = " + str(waypoints.shape[0]))

# Plot waypoints
for i, point in enumerate(waypoints):
    waypoint = (point[2], point[3])
    plt.scatter(waypoint[0], waypoint[1])
    plt.text(waypoint[0], waypoint[1], str(i), fontsize=9, ha='right')
    print("Waypoint " + str(i) + ": " + str(waypoint))

# Display the plot
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.title(f'Waypoints for {track_name}')
plt.show()
kurtzace commented 1 month ago

Simple reward

Eval

image image

kurtzace commented 1 month ago

clockwise waypoints

better waypoints for clockwise

Evaluation with speed limits of 1.5 to 3

image image

kurtzace commented 1 month ago

percentage reward function (a code sketch follows after this list)

60% to 74% - speed of 1.5

40% to 59% - speed of 3

25% to 39% - speed of 1.5

10% to 24% - speed of 3

0% to 9% - speed of 1.5

60% to 74% - turn right

40% to 59% - follow center line (reward function above)

25% to 39% - turn right

10% to 24% - speed of 3, follow center line (reward function above), but also mild turning to the left

0% to 9% - speed of 1.5 - turn right

60% to 74% - speed of 1.5 - could be right from center line by 50%

35% to 40% - speed of 3 - could be right from center line by 50%

40% to 60% - speed of 3 - could be left from center line by 50%

25% to 39% - speed of 1.5 - could be right from center line by 50%

10% to 24% - speed of 3 - should be exactly on the center line

40% to 59% - heading could be -55 degree

10% to 24% - heading could be 103 degree

25% to 39% - heading could be 0 degree
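
A minimal sketch of this percentage-band idea, assuming the speed bands above (1.5 on the tighter sections, 3.0 elsewhere) plus a small center-line bonus; the speed tolerance and bonus weights are assumptions, and the turn/heading terms from the notes are left out for brevity:

def reward_function(params):
    '''
    Sketch: reward a target speed per progress band, plus a center-line bonus.
    Bands and target speeds follow the notes above; tolerances are assumptions.
    '''
    progress = params['progress']                  # 0-100, % of track completed
    speed = params['speed']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']

    reward = 1e-3

    # Slow bands (0-9%, 25-39%, 60-74%); fast bands elsewhere
    if progress < 10 or 25 <= progress < 40 or 60 <= progress < 75:
        target_speed = 1.5
    else:
        target_speed = 3.0

    # Reward being close to the band's target speed
    if abs(speed - target_speed) < 0.5:
        reward += 1.0

    # Small bonus for staying reasonably close to the center line
    if distance_from_center <= 0.25 * track_width:
        reward += 0.5

    return float(reward)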

Eval

image image

kurtzace commented 1 month ago

CombinedWaypointsClockwiseAndSimple

image image

Actual race day organised by my company and AWS

Below is a recording of some other team's run: https://github.com/user-attachments/assets/dc700014-aa9f-4bb1-8c80-4c478a261f60

kurtzace commented 3 weeks ago

Reinforcement learning Basics from Udemy

Build Artificial Intelligence (AI) agents using Deep Reinforcement Learning and PyTorch

State: the current situation of the environment as observed by the agent

Action: what the agent does in a given state

Reward: the scalar feedback signal received after taking an action

Agent: the learner / decision maker

Env: the world the agent interacts with

Markov Decision Process (a discrete-time, finite, stochastic control process - the future is only partially under the decision maker's control)

image

action modifies state, receives reward.

SARP (state space, actions, rewards from performing actions, probabilities of passing from state to state)

The next state visited depends only on the current state - the process has no memory

Markov decision process: many Markov chains

Finite (like Pac-Man) or infinite (car) decision process

Episodic (terminates) or continuing

Trajectory: the elements generated when the agent moves from one state to another. τ = S0, A0, R1, S1, A1, ...

Episode: a trajectory up to the final state

Reward (maximize the sum) vs Return (a short-term return may impact the long-term reward)
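
To make the S/A/R/P pieces concrete, here is a toy three-state MDP in Python (all states, actions, probabilities, and rewards are invented for illustration), with a rollout that produces a trajectory like the one above:

import random

# Toy MDP: states {0, 1, 2}, state 2 is terminal; actions {'left', 'right'}.
# P[s][a] = list of (probability, next_state, reward) - the P and R of SARP.
P = {
    0: {'left':  [(1.0, 0, 0.0)],
        'right': [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {'left':  [(1.0, 0, 0.0)],
        'right': [(1.0, 2, 10.0)]},
}

def step(state, action):
    """Sample (next_state, reward) from the transition distribution."""
    r, cumulative = random.random(), 0.0
    for prob, next_state, reward in P[state][action]:
        cumulative += prob
        if r <= cumulative:
            return next_state, reward
    return next_state, reward  # fall back to the last entry

# Roll out one episode with a random policy: trajectory = S0, A0, R1, S1, A1, ...
state, trajectory = 0, [0]
while state != 2:
    action = random.choice(['left', 'right'])
    state, reward = step(state, action)
    trajectory += [action, reward, state]

print(trajectory)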

from - Tensorflow 2.0: Deep Learning and Artificial Intelligence

Unlike curve fitting, RL has a time concept - a feedback loop of images (observations) - with a view of the future, not just a static function image

Imagine if we solved the car race using supervised learning: given an image, can you give it a target?

Only goal, no target

tic tac toe analogy

• The agent interfaces with the game (via the API)

game.start()
while not game.is_over():
    state = game.getstate()
    # do something intelligent
    location = agent.pick_move(state)
    # make the move
    game.move(symbol, location)

Episode == game/round/match

non-episodic: stock trading, online ads - infinite horizons

The agent will try to maximize its reward
• E.g. -100 is better than -1 million
• -100 can still mean you've solved the game

State could be represented by 4 frames instead of 1 frame, since a single frame does not convey movement (a drawback of a CNN on a single image).

Policy param - W (shape is D x |A|)

π(a|s) = softmax(W^T s)
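
A quick numpy sketch of this linear softmax policy (the state dimension, action count, and random values are assumptions, just to show the shapes):

import numpy as np

def softmax(z):
    z = z - np.max(z)                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

D, num_actions = 4, 3                      # assumed state dimension and |A|
W = np.random.randn(D, num_actions)        # policy parameters, shape D x |A|
s = np.random.randn(D)                     # example state vector

probs = softmax(W.T @ s)                   # π(a|s) = softmax(W^T s)
a = np.random.choice(num_actions, p=probs) # sample an action from the policy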

MDP - stepping stone (state transition probability) image

Builds up before Q Learning

Dynamic system relies on opponent too

image

Reward: Maximize the sum of future gains. Not immediate gratification.

image

Discounting is used for infinite horizon

image

a reward right now has a higher preference (is worth more than the same reward later)

expected value: the mean (a distribution is summarized by its mean and standard deviation)

image

Returns are recursive image
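
For reference, the discounted return and its recursive form (standard notation; the image is not reproduced here):

G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... = R_{t+1} + γ G_{t+1}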

Bellman equation image

π is the policy - a probability distribution over actions - and represents the agent/animal

V is the value function for policy π

p is the environment (transition dynamics)
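
Putting those symbols together, the usual Bellman equation for the state-value function under policy π (standard form, presumably what the image above shows):

V_π(s) = Σ_a π(a|s) Σ_{s',r} p(s', r | s, a) [ r + γ V_π(s') ]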

--

learning happens when we find the π* that maximizes V(s) - the optimal policy - the control problem.

Bellman equation for Q - the action-value function - conditioned on the given 'action'

image

V is linear and Q is quadratic
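
For completeness, the corresponding Bellman equation for the action-value function (standard form):

Q_π(s, a) = Σ_{s',r} p(s', r | s, a) [ r + γ Σ_{a'} π(a'|s') Q_π(s', a') ]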

best policy math function image

at times enumerating all policies is not possible

|A| ^ |S|
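
For example, even a small grid world with 16 states and 4 actions already has 4^16 ≈ 4.3 billion deterministic policies, so exhaustive enumeration quickly becomes infeasible.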

use sample mean

states, rewards = play_episode_using_policy(policy)  # roll out one episode (pseudocode)

returns = []
g = 0
returns.append(g)
for r in reversed(rewards):
    g = r + gamma * g
    returns.append(g)

# returns are in reverse order, reverse them back
returns = list(reversed(returns))

Note: len(states) = len(rewards) + 1 since initial state has no reward

Thus: len(states) = len(returns)

Pseudocode

Q = random, policy = random
for i in range(num_episodes):
    # replace policy evaluation with one episode only
    states, actions, rewards = play_one_episode(policy)
    returns = ...  # calculate as previously discussed

    for s, a, g in zip(states, actions, returns):
        Q[s, a] = Q[s, a] + learning_rate * (g - Q[s, a])  # Monte Carlo trick

    for s in Q.states():  # policy improvement step
        policy[s] = argmax(Q[s, :])

Balance Explore-Exploit Dilemma

Monte carlo: problem is we need to wait for terminal state