hustvl / VAD

[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
https://arxiv.org/abs/2303.12077
Apache License 2.0
702 stars 73 forks

Where is the high-level command used in the model? #31

Open rOtking opened 1 year ago

rOtking commented 1 year ago

I see that the command is generated from the ego future position, but I can't find where the high-level command is used in the model. In UniAD, the cmd is encoded as an embedding and concatenated with the ego query, but in VAD the command seems to be used only for visualization and is not fed into the model.

rb93dett commented 1 year ago

VAD has three MLP-based decoder heads corresponding to different driving commands to predict planning trajectories. In the training phase, the driving command is used as a mask to train the corresponding head:

https://github.com/hustvl/VAD/blob/1ede800a517653837faf78b5ec1c3b5fc3ec77b5/projects/mmdet3d_plugin/VAD/VAD_head.py#L1139

In the inference phase, VAD uses the command to choose the final planning trajectory among the predictions from the three decoder heads.
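
To make that behavior concrete, below is a minimal sketch (assumed module and tensor names, not the actual VAD implementation) of three command-specific MLP heads, a command-masked training loss, and command-based trajectory selection at inference:

```python
# Minimal sketch (assumed shapes/names, not the actual VAD code):
# three MLP heads, one per driving command; the one-hot command masks the
# planning loss during training and picks the final trajectory at inference.
import torch
import torch.nn as nn

class CommandConditionedPlanner(nn.Module):
    def __init__(self, embed_dim=256, fut_ts=6, num_cmd=3):
        super().__init__()
        # one small MLP decoder head per driving command
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                          nn.Linear(embed_dim, fut_ts * 2))
            for _ in range(num_cmd)
        ])
        self.fut_ts = fut_ts

    def forward(self, ego_query):
        # ego_query: (B, embed_dim) -> pred_trajs: (B, num_cmd, fut_ts, 2)
        trajs = [head(ego_query).view(-1, self.fut_ts, 2) for head in self.heads]
        return torch.stack(trajs, dim=1)

def planning_loss(pred_trajs, gt_traj, cmd_onehot):
    # training: only the head matching the ground-truth command gets a gradient
    B, num_cmd, fut_ts, _ = pred_trajs.shape
    mask = cmd_onehot.float().view(B, num_cmd, 1, 1)
    err = ((pred_trajs - gt_traj.unsqueeze(1)) ** 2) * mask
    return err.sum() / (mask.sum() * fut_ts * 2 + 1e-6)

def select_trajectory(pred_trajs, cmd_onehot):
    # inference: return the trajectory predicted by the commanded head
    cmd_idx = cmd_onehot.argmax(dim=1)                            # (B,)
    return pred_trajs[torch.arange(pred_trajs.size(0)), cmd_idx]  # (B, fut_ts, 2)
```

With num_cmd=3, a one-hot command like [1, 0, 0] / [0, 1, 0] / [0, 0, 1] directly indexes the matching head.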

Wolfybox commented 1 year ago

VAD has three MLP-based decoder heads corresponding to different driving commands to predict planning trajectories. In the training phase, the driving command is used as a mask to train the corresponding head:

https://github.com/hustvl/VAD/blob/1ede800a517653837faf78b5ec1c3b5fc3ec77b5/projects/mmdet3d_plugin/VAD/VAD_head.py#L1139

In the inference phase, VAD uses the command to choose the final planning trajectory among predictions from three decoder heads.

I notice that what VAD uses for navigation is actually a somewhat temporally fine-grained driving direction that tells the model which trajectory to take per frame, whereas in real-world circumstances such detailed information might not be available (one way is to obtain it from the driver's maneuvers, but that makes VAD less autonomous). Do you have a plan to address this problem?

rb93dett commented 1 year ago

In fact, I think the navigation information ('go straight,' 'turn left,' 'turn right') used by VAD (as well as previous methods) is quite simplified. Navigation information at this level should be easily obtainable by the navigation system, without relying on fine-grained map data. In real-world applications, more detailed navigation information is required for accurate planning. However, this is not the current focus of VAD or our work.

Wolfybox commented 1 year ago

Navigation information at this level should be easily obtainable by the navigation system

VAD uses the accurate ego future pose (fine-grained waypoints, which is why VAD is able to do automatic lane changes) to generate driving commands, and that pose is NOT easily obtainable by a navigation system without fine-grained map data.

```python
# Build the ego future trajectory by walking forward through nuScenes samples,
# transform it into the LiDAR frame at the current frame (lcf), and derive the
# driving command from the lateral offset of the final future step.
ego_fut_trajs = np.zeros((fut_ts+1, 3))
ego_fut_masks = np.zeros((fut_ts+1))
sample_cur = sample
for i in range(fut_ts+1):
    pose_mat = get_global_sensor_pose(sample_cur, nusc, inverse=False)
    ego_fut_trajs[i] = pose_mat[:3, 3]
    ego_fut_masks[i] = 1
    if sample_cur['next'] == '':
        ego_fut_trajs[i+1:] = ego_fut_trajs[i]
        break
    else:
        sample_cur = nusc.get('sample', sample_cur['next'])
# global to ego at lcf
ego_fut_trajs = ego_fut_trajs - np.array(pose_record['translation'])
rot_mat = Quaternion(pose_record['rotation']).inverse.rotation_matrix
ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
# ego to lidar at lcf
ego_fut_trajs = ego_fut_trajs - np.array(cs_record['translation'])
rot_mat = Quaternion(cs_record['rotation']).inverse.rotation_matrix
ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
# drive command according to final fut step offset from lcf
if ego_fut_trajs[-1][0] >= 2:
    command = np.array([1, 0, 0])  # Turn Right
elif ego_fut_trajs[-1][0] <= -2:
    command = np.array([0, 1, 0])  # Turn Left
else:
    command = np.array([0, 0, 1])  # Go Straight
```

rb93dett commented 1 year ago

Navigation information at this level should be easily obtainable by the navigation system

VAD uses the accurate ego future pose (fine-grained waypoints, which is why VAD is able to do automatic lane changes) to generate driving commands, and that pose is NOT easily obtainable by a navigation system without fine-grained map data.


VAD uses ego poses to generate driving commands on open-loop nuScenes, following previous works, because nuScenes does not provide additional navigation info. In real-world applications, navigation info should be generated from the route, not from ego future poses (which are also not available at inference time).
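
As a purely hypothetical illustration of that last point (the helper, thresholds, and route format below are assumptions, not part of VAD or nuScenes), a deployed stack could derive the same three-way command from a coarse route polyline supplied by the navigation system, using the heading of the upcoming route segment relative to the ego heading instead of future ego poses:

```python
import numpy as np

def command_from_route(route_xy, ego_xy, ego_yaw,
                       lookahead=20.0, turn_thresh=np.deg2rad(30)):
    """Hypothetical helper (not VAD code): derive a 3-way command
    ([right, left, straight], same one-hot layout as above) from a coarse
    route polyline of global x/y waypoints provided by a navigation system."""
    route_xy = np.asarray(route_xy, dtype=float)
    ego_xy = np.asarray(ego_xy, dtype=float)
    # start from the route point closest to the ego position
    start = int(np.argmin(np.linalg.norm(route_xy - ego_xy, axis=1)))
    # walk roughly `lookahead` meters further along the route
    seg_len = np.linalg.norm(np.diff(route_xy[start:], axis=0), axis=1).cumsum()
    end = min(start + int(np.searchsorted(seg_len, lookahead)) + 1,
              len(route_xy) - 1)
    if end <= start:
        return np.array([0, 0, 1])   # at the end of the route: Go Straight
    # heading of the route segment ahead, relative to the ego heading
    dx, dy = route_xy[end] - route_xy[start]
    rel_yaw = np.arctan2(dy, dx) - ego_yaw
    rel_yaw = np.arctan2(np.sin(rel_yaw), np.cos(rel_yaw))  # wrap to [-pi, pi]
    if rel_yaw > turn_thresh:
        return np.array([0, 1, 0])   # Turn Left
    if rel_yaw < -turn_thresh:
        return np.array([1, 0, 0])   # Turn Right
    return np.array([0, 0, 1])       # Go Straight
```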