uzh-rpg / agile_flight

Developing and Comparing Vision-based Algorithms for Vision-based Agile Flight

rewards ? #24

Closed ngthanhtin closed 2 years ago

ngthanhtin commented 2 years ago

Hi, when I print the info returned when an episode is done, I see the following problem: (screenshot: Screenshot from 2022-04-04 22-50-24). As you can see, the total reward 'r' is not equal to the sum of the four reward components. Why does this happen? Can you explain more, please?

yun-long commented 2 years ago

Hi,

the reward is computed here:

bool VisionEnv::computeReward(Ref<Vector<>> reward) {
  // ---------------------- reward function design
  // - compute collision penalty
  Scalar collision_penalty = 0.0;
  size_t idx = 0;
  for (size_t sort_idx : sort_indexes(relative_pos_norm_)) {
    if (idx >= visionenv::kNObstacles) break;

    Scalar relative_dist =
      (relative_pos_norm_[sort_idx] > 0) &&
          (relative_pos_norm_[sort_idx] < max_detection_range_)
        ? relative_pos_norm_[sort_idx]
        : max_detection_range_;

    const Scalar dist_margin = 0.5;
    if (relative_pos_norm_[sort_idx] <=
        obstacle_radius_[sort_idx] + dist_margin) {
      // compute distance penalty
      collision_penalty += collision_coeff_ * std::exp(-1.0 * relative_dist);
    }

    idx += 1;
  }

  // - tracking a constant linear velocity
  Scalar lin_vel_reward =
    vel_coeff_ * (quad_state_.v - goal_linear_vel_).norm();

  // - angular velocity penalty, to avoid oscillations
  const Scalar ang_vel_penalty = angular_vel_coeff_ * quad_state_.w.norm();

  // - survival reward (used in place of a progress reward)
  const Scalar total_reward =
    lin_vel_reward + collision_penalty + ang_vel_penalty + survive_rew_;

  // return all reward components for debug purposes
  // only the total reward is used by the RL algorithm
  reward << lin_vel_reward, collision_penalty, ang_vel_penalty, survive_rew_,
    total_reward;
  return true;
}

This is the step (per-stage) reward.
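Within a single step, the four components do add up to the total by construction. A minimal standalone check of this invariant (a sketch using Eigen directly; the component values here are made up, not the env's numbers):

#include <Eigen/Dense>
#include <cassert>
#include <cmath>

using Scalar = double;

int main() {
  // Mimic the layout written by computeReward():
  // [lin_vel_reward, collision_penalty, ang_vel_penalty, survive_rew, total]
  Eigen::Matrix<Scalar, 5, 1> reward;
  reward << -0.12, -0.03, -0.01, 0.1, 0.0;
  reward(4) = reward.head<4>().sum();

  // Per step, the total equals the sum of its four components.
  assert(std::abs(reward.head<4>().sum() - reward(4)) < 1e-9);
  return 0;
}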

The terminal reward is computed separately here, depending on the termination condition:

bool VisionEnv::isTerminalState(Scalar &reward) {
  // simulation time out
  if (cmd_.t >= max_t_ - sim_dt_) {
    reward = 0.0;
    return true;
  }

  // world bounding box check
  // - x, y, and z
  const Scalar safety_threshold = 0.1;
  bool x_valid = quad_state_.p(QS::POSX) >= world_box_[0] + safety_threshold &&
                 quad_state_.p(QS::POSX) <= world_box_[1] - safety_threshold;
  bool y_valid = quad_state_.p(QS::POSY) >= world_box_[2] + safety_threshold &&
                 quad_state_.p(QS::POSY) <= world_box_[3] - safety_threshold;
  bool z_valid = quad_state_.p(QS::POSZ) >= world_box_[4] + safety_threshold &&
                 quad_state_.p(QS::POSZ) <= world_box_[5] - safety_threshold;
  if (!x_valid || !y_valid || !z_valid) {
    reward = -1.0;
    return true;
  }
  return false;
}
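This explains the mismatch: the scalar reward the RL side receives each step is the total from computeReward() plus, on the final step, the terminal reward from isTerminalState(), but only the four step components are logged. A hedged sketch of how the two are typically combined (simplified, not the repo's exact step() code; the function name and signature are assumptions):

// Sketch only: how the per-step total and the terminal reward combine
// into the single scalar an RL framework logs as 'r'.
Scalar stepRewardSketch(VisionEnv &env) {
  Vector<5> reward_vec;
  env.computeReward(reward_vec);  // four components + their total
  Scalar reward = reward_vec(4);  // only the total feeds the RL algorithm

  Scalar terminal_reward = 0.0;
  if (env.isTerminalState(terminal_reward)) {
    // 0.0 on timeout, -1.0 when the quadrotor leaves the world box
    reward += terminal_reward;
  }
  return reward;
}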
ngthanhtin commented 2 years ago

Thanks, I got it: the terminal reward has to be added to 'r' as well. But even then, summing over many steps leaves a discrepancy of around 0.01.
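A residual of that size is consistent with floating-point accumulation: the episode return is summed step by step in finite precision, and if any part of the pipeline stores rewards in single precision, the running sum drifts from the exact component sum. A self-contained illustration (the per-step value is made up, not the env's numbers):

#include <cstdio>

int main() {
  // Sum the same per-step reward in single and double precision.
  // The single-precision running sum drifts as rounding errors accumulate.
  float sum_f = 0.0f;
  double sum_d = 0.0;
  const float r = -0.123f;  // hypothetical per-step reward
  for (int t = 0; t < 100000; ++t) {
    sum_f += r;
    sum_d += static_cast<double>(r);
  }
  std::printf("float sum: %.6f, double sum: %.6f, diff: %.6f\n",
              static_cast<double>(sum_f), sum_d,
              static_cast<double>(sum_f) - sum_d);
  return 0;
}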