wzzheng / OccWorld

[ECCV 2024] 3D World Model for Autonomous Driving
https://wzzheng.net/OccWorld/
Apache License 2.0
380 stars, 25 forks

Does using gt_mode leak future information? #30

Open Orbis36 opened 1 week ago

Orbis36 commented 1 week ago

As you mentioned in another issue, gt_mode is generated with the VAD repo: you use VAD's gt_ego_fut_cmd as the gt_mode in this codebase. However, if you check how gt_ego_fut_cmd is generated in VAD, the code looks like this:

            # get ego futute traj (offset format)
            ego_fut_trajs = np.zeros((fut_ts+1, 3))
            ego_fut_masks = np.zeros((fut_ts+1))
            sample_cur = sample
            for i in range(fut_ts+1):
                pose_mat = get_global_sensor_pose(sample_cur, nusc, inverse=False)
                ego_fut_trajs[i] = pose_mat[:3, 3]
                ego_fut_masks[i] = 1
                if sample_cur['next'] == '':
                    ego_fut_trajs[i+1:] = ego_fut_trajs[i]
                    break
                else:
                    sample_cur = nusc.get('sample', sample_cur['next'])
            # global to ego at lcf
            ego_fut_trajs = ego_fut_trajs - np.array(pose_record['translation'])
            rot_mat = Quaternion(pose_record['rotation']).inverse.rotation_matrix
            ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
            # ego to lidar at lcf
            ego_fut_trajs = ego_fut_trajs - np.array(cs_record['translation'])
            rot_mat = Quaternion(cs_record['rotation']).inverse.rotation_matrix
            ego_fut_trajs = np.dot(rot_mat, ego_fut_trajs.T).T
            # drive command according to final fut step offset from lcf
            if ego_fut_trajs[-1][0] >= 2:
                command = np.array([1, 0, 0])  # Turn Right
            elif ego_fut_trajs[-1][0] <= -2:
                command = np.array([0, 1, 0])  # Turn Left
            else:
                command = np.array([0, 0, 1])  # Go Straight
            # offset from lcf -> per-step offset
            ego_fut_trajs = ego_fut_trajs[1:] - ego_fut_trajs[:-1]

This means the gt_mode for every frame is derived from the ego position at the 6th future frame (fut_ts is 6 in the code), expressed relative to the current frame. In your code I noticed the offset is 1. Does that mean the network already has access to future information? From your paper, the network's performance (trajectory prediction in particular) seems to rely heavily on this design, so could you please explain? @wzzheng @gusongen
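To make the concern concrete, here is a minimal sketch of just the command logic from the snippet above. The helper name `drive_command`, the `lateral_thresh` parameter, and both example trajectories are hypothetical and made up for illustration; they are not taken from VAD or OccWorld. The two trajectories share the same current pose and differ only in the future, yet they get different commands, which is why the command looks like a function of future data.

```python
import numpy as np

def drive_command(fut_xyz_lcf, lateral_thresh=2.0):
    """One-hot [right, left, straight] from the x-offset of the LAST future pose.

    Mirrors the thresholding in the quoted snippet; the name and the
    lateral_thresh parameter are illustrative assumptions.
    """
    x_final = fut_xyz_lcf[-1][0]
    if x_final >= lateral_thresh:
        return np.array([1, 0, 0])  # Turn Right
    elif x_final <= -lateral_thresh:
        return np.array([0, 1, 0])  # Turn Left
    return np.array([0, 0, 1])      # Go Straight

fut_ts = 6

# Hypothetical future A: ego drives forward along y with no lateral motion.
straight = np.zeros((fut_ts + 1, 3))
straight[:, 1] = np.arange(fut_ts + 1) * 4.0

# Hypothetical future B: same start, but drifts 3 m in +x by the last step.
right = straight.copy()
right[:, 0] = np.linspace(0.0, 3.0, fut_ts + 1)

cmd_straight = drive_command(straight)
cmd_right = drive_command(right)

# Row 0 (the current frame) is identical in both cases; only the future differs.
print(np.array_equal(straight[0], right[0]), cmd_straight, cmd_right)
```

If this reading is right, a model that receives the command for frame t as input is implicitly told where the ego vehicle ends up fut_ts frames later.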

realkris commented 1 week ago

Same question here.