rOtking opened 1 year ago
VAD has three MLP-based decoder heads corresponding to different driving commands to predict planning trajectories. In the training phase, the driving command is used as a mask to train the corresponding head:
https://github.com/hustvl/VAD/blob/1ede800a517653837faf78b5ec1c3b5fc3ec77b5/projects/mmdet3d_plugin/VAD/VAD_head.py#L1139
In the inference phase, VAD uses the command to choose the final planning trajectory among the predictions from the three decoder heads.
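For context, here is a minimal sketch of this command-masked multi-head setup (hypothetical module and names, not the actual `VAD_head.py` code):

```python
import torch
import torch.nn as nn

class CommandConditionedPlanner(nn.Module):
    """Sketch of command-masked multi-head planning (illustration only)."""
    def __init__(self, embed_dim=256, fut_ts=6, num_cmd=3):
        super().__init__()
        self.fut_ts = fut_ts
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                          nn.Linear(embed_dim, fut_ts * 2))
            for _ in range(num_cmd)
        ])

    def forward(self, ego_query, cmd_onehot):
        # ego_query: (B, embed_dim), cmd_onehot: (B, num_cmd)
        trajs = torch.stack([head(ego_query) for head in self.heads], dim=1)  # (B, num_cmd, fut_ts*2)
        trajs = trajs.view(trajs.size(0), len(self.heads), self.fut_ts, 2)
        # Training: the one-hot command masks the loss so only the matching head is supervised.
        # Inference: the same command picks which head's trajectory is returned.
        idx = cmd_onehot.argmax(dim=1)
        return trajs[torch.arange(trajs.size(0)), idx]  # (B, fut_ts, 2)
```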
I notice that what VAD uses for navigation is actually a temporally fine-grained driving direction that tells the model which trajectory to take at every frame, whereas in real-world settings such detailed information might not be available (one way is to obtain it from the driver's maneuvers, but that makes VAD less autonomous). Do you have a plan to address this problem?
In fact, I think the navigation information ('go straight,' 'turn left,' 'turn right') used by VAD (as well as previous methods) is quite simplified. Navigation information at this level should be easily obtainable by the navigation system, without relying on fine-grained map data. In real-world applications, more detailed navigation information is required for accurate planning. However, this is not the current focus of VAD or our work.
> Navigation information at this level should be easily obtainable by the navigation system
VAD is using the accurate ego future pose (fine-grained waypoints, which is why VAD is able to do automatic lane changes) to generate driving commands, and that is NOT easily obtainable by a navigation system without fine-grained map data:
```python
ego_fut_trajs = np.zeros((fut_ts+1, 3))
ego_fut_masks = np.zeros((fut_ts+1))
sample_cur = sample
for i in range(fut_ts+1):
    pose_mat = get_global_sensor_pose(sample_cur, nusc, inverse=False)
    ego_fut_trajs[i] = pose_mat[:3, 3]
    ego_fut_masks[i] = 1
    if sample_cur['next'] == '':
        ego_fut_trajs[i+1:] = ego_fut_trajs[i]
        break
    else:
        sample_cur = nusc.get('sample', sample_cur['next'])
# global to ego at lcf
ego_fut_trajs = ego_fut_trajs - np.array(pose_record['translation'])
rot_mat = Quaternion(pose_record['rotation']).inverse.rotation_matrix
ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
# ego to lidar at lcf
ego_fut_trajs = ego_fut_trajs - np.array(cs_record['translation'])
rot_mat = Quaternion(cs_record['rotation']).inverse.rotation_matrix
ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
# drive command according to final fut step offset from lcf
if ego_fut_trajs[-1][0] >= 2:
    command = np.array([1, 0, 0])  # Turn Right
elif ego_fut_trajs[-1][0] <= -2:
    command = np.array([0, 1, 0])  # Turn Left
else:
    command = np.array([0, 0, 1])  # Go Straight
```
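For illustration, this is the same ±2 m final-step lateral threshold factored into a small helper (a hypothetical function, not in the VAD repo), applied to example offsets:

```python
import numpy as np

def command_from_final_offset(lateral_offset_m, thresh=2.0):
    # Same thresholding as the snippet above, on the lateral offset of the last future step.
    if lateral_offset_m >= thresh:
        return np.array([1, 0, 0])   # Turn Right
    elif lateral_offset_m <= -thresh:
        return np.array([0, 1, 0])   # Turn Left
    return np.array([0, 0, 1])       # Go Straight

print(command_from_final_offset(3.1))   # lane change / turn to the right -> [1 0 0]
print(command_from_final_offset(-0.5))  # small offset -> [0 0 1] (Go Straight)
```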
VAD uses ego poses to generate driving commands in open-loop nuScenes, following previous works, because nuScenes does not provide additional navigation info. In real-world applications, navigation info should be generated from the route rather than from ego future poses (which cannot be obtained anyway).
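As a rough sketch of what I mean, a navigation system could derive the same three-class command from the route polyline instead of from ego future poses (hypothetical code under assumed conventions, not part of VAD):

```python
import numpy as np

def route_based_command(route_xy, lookahead_m=30.0, yaw_thresh_deg=20.0):
    """Derive a coarse command from the planned route polyline.
    route_xy: (N, 2) route waypoints in the ego frame (x forward, y left)."""
    # accumulate arc length along the route and stop at the lookahead distance
    seg = np.linalg.norm(np.diff(route_xy, axis=0), axis=1)
    idx = int(np.searchsorted(np.cumsum(seg), lookahead_m))
    idx = min(idx, len(route_xy) - 1)
    # heading of the route at the lookahead point relative to the ego heading
    dx, dy = route_xy[idx] - route_xy[0]
    yaw = np.degrees(np.arctan2(dy, dx))
    if yaw > yaw_thresh_deg:
        return np.array([0, 1, 0])   # Turn Left
    if yaw < -yaw_thresh_deg:
        return np.array([1, 0, 0])   # Turn Right
    return np.array([0, 0, 1])       # Go Straight
```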
I see that the command is generated from the ego future position, but I can't find where the high-level command is used in the model. In UniAD, the command is encoded as an embedding and concatenated with the ego query, but in VAD the command seems to be used only for visualization and isn't fed into the model.
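For reference, this is a minimal sketch of the UniAD-style conditioning I mean, where the command is an input to the planner rather than only a post-hoc selector (hypothetical module, not the actual UniAD code):

```python
import torch
import torch.nn as nn

class CommandEmbedEgoQuery(nn.Module):
    """Embed the driving command and concatenate it with the ego query."""
    def __init__(self, embed_dim=256, num_cmd=3):
        super().__init__()
        self.cmd_embed = nn.Embedding(num_cmd, embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, ego_query, cmd_idx):
        # ego_query: (B, embed_dim), cmd_idx: (B,) integer command index
        fused = torch.cat([ego_query, self.cmd_embed(cmd_idx)], dim=-1)
        return self.fuse(fused)  # command-conditioned ego query for the planning head
```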