As you mentioned in the other issue, gt_mode is generated from the VAD repo: you use its gt_ego_fut_cmd as the gt_mode in this codebase.
However, if you check how gt_ego_fut_cmd is generated in VAD, the code looks like this:

    # get ego futute traj (offset format)
    ego_fut_trajs = np.zeros((fut_ts+1, 3))
    ego_fut_masks = np.zeros((fut_ts+1))
    sample_cur = sample
    for i in range(fut_ts+1):
        pose_mat = get_global_sensor_pose(sample_cur, nusc, inverse=False)
        ego_fut_trajs[i] = pose_mat[:3, 3]
        ego_fut_masks[i] = 1
        if sample_cur['next'] == '':
            ego_fut_trajs[i+1:] = ego_fut_trajs[i]
            break
        else:
            sample_cur = nusc.get('sample', sample_cur['next'])
    # global to ego at lcf
    ego_fut_trajs = ego_fut_trajs - np.array(pose_record['translation'])
    rot_mat = Quaternion(pose_record['rotation']).inverse.rotation_matrix
    ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
    # ego to lidar at lcf
    ego_fut_trajs = ego_fut_trajs - np.array(cs_record['translation'])
    rot_mat = Quaternion(cs_record['rotation']).inverse.rotation_matrix
    ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
    # drive command according to final fut step offset from lcf
    if ego_fut_trajs[-1][0] >= 2:
        command = np.array([1, 0, 0])  # Turn Right
    elif ego_fut_trajs[-1][0] <= -2:
        command = np.array([0, 1, 0])  # Turn Left
    else:
        command = np.array([0, 0, 1])  # Go Straight
    # offset from lcf -> per-step offset
    ego_fut_trajs = ego_fut_trajs[1:] - ego_fut_trajs[:-1]

This means the gt_mode for every frame is derived from the ego position 6 frames into the future (fut_ts is 6 in the code). In your case, I noticed the offset is 1 in your code. Does that mean the network already knows future information? From your paper, it seems that the network's performance relies heavily on this design (the trajectory prediction), so could you please give us a reasonable explanation? @wzzheng @gusongen
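
To make the concern concrete, here is a minimal sketch of the command rule in isolation. It is my own simplification, not code from either repo: the function name `command_from_future` and the two example trajectories are hypothetical, and only the lateral (x) offset relative to the current frame is modelled. The 2 m threshold and the one-hot order are taken from the VAD snippet quoted in this issue.

```python
def command_from_future(x_offsets, threshold=2.0):
    """Return a one-hot command [right, left, straight].

    x_offsets: lateral (x) position of the ego vehicle at each step,
    expressed relative to the current frame. Index 0 is the current
    frame, the last index is the final future step (fut_ts).
    Only the LAST entry is used, mirroring the quoted VAD logic.
    """
    final_x = x_offsets[-1]
    if final_x >= threshold:
        return [1, 0, 0]  # Turn Right
    if final_x <= -threshold:
        return [0, 1, 0]  # Turn Left
    return [0, 0, 1]  # Go Straight


# Hypothetical trajectories with fut_ts = 6 (7 entries including the
# current frame). They are identical for the first three steps and
# only diverge later.
drifts_right = [0.0, 0.1, 0.2, 0.5, 1.0, 1.8, 2.5]
stays_centered = [0.0, 0.1, 0.2, 0.3, 0.3, 0.2, 0.1]

print(command_from_future(drifts_right))    # [1, 0, 0]
print(command_from_future(stays_centered))  # [0, 0, 1]
```

The two trajectories look the same over the first steps, yet they get different commands because the label depends only on the final future position. If a label built this way is fed to the network as an input, it may carry information about where the ego vehicle ends up.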