Open kylelee82 opened 4 years ago
Can confirm. In each sample, the lidar and camera records refer to different ego_pose records. Those poses differ in rotation by up to about 1 degree and in translation by up to 0.5 m (I think the units are meters), but their timestamps are the same.
import math

from pyquaternion import Quaternion

# Assumes `level5data` and `ex` were loaded earlier and refer to the dataset
# (their setup is not shown in this snippet).
my_scene = level5data.scene[0]
sample_token = my_scene["first_sample_token"]
while sample_token != "":
    sample = ex.get("sample", sample_token)
    sample_token = sample["next"]

    # Ego pose referenced by the front camera's sample_data record.
    cam_sensor_token = sample["data"]["CAM_FRONT"]
    cam_sd_record = ex.get("sample_data", cam_sensor_token)
    cam_ep_record = ex.get("ego_pose", cam_sd_record["ego_pose_token"])

    # Ego pose referenced by the top lidar's sample_data record.
    lid_sensor_token = sample["data"]["LIDAR_TOP"]
    lid_sd_record = ex.get("sample_data", lid_sensor_token)
    lid_ep_record = ex.get("ego_pose", lid_sd_record["ego_pose_token"])

    # Relative rotation between the two poses, in degrees.
    cam_qua = Quaternion(cam_ep_record["rotation"])
    lid_qua = Quaternion(lid_ep_record["rotation"])
    diff_deg = (lid_qua.inverse * cam_qua).degrees

    # Euclidean distance between the two translations, and the timestamp gap.
    diff_m = math.sqrt(sum((a - b) * (a - b) for a, b in zip(cam_ep_record["translation"], lid_ep_record["translation"])))
    diff_ts = abs(cam_ep_record["timestamp"] - lid_ep_record["timestamp"])
    print(round(diff_deg, 3), "deg", round(diff_m, 3), "m", diff_ts)
...
1.077 deg 0.25 m 0.0
1.095 deg 0.225 m 0.0
1.052 deg 0.242 m 0.0
0.953 deg 0.255 m 0.0
0.887 deg 0.256 m 0.0
0.869 deg 0.245 m 0.0
...
@megaserg In the example above, do you happen to know if the ego_pose_tokens are different? I would assume so because of the ::get("ego_pose", ...) call, but I'm just curious because I recently looked for duplicate timestamps in the Lyft dataset and didn't find any. Perhaps either my debugging was wrong, or I have been ignoring the ego_pose timestamp in favor of the sensor timestamp.
If the tokens are distinct and yet have the same timestamp, then it sounds like the Lyft dataset itself might be "broken." (Could be patched through code though).
@pwais yes, the ego_pose_token is different between the cameras and the lidars. The timestamps of the referred poses are the same though.
oof! thanks for confirming @megaserg
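The duplicate-timestamp check discussed above could be sketched roughly as follows. This is only an illustration: `find_shared_timestamps` is a hypothetical helper, not part of the devkit, and it assumes each ego_pose record is a dict with "token" and "timestamp" keys.

```python
from collections import defaultdict


def find_shared_timestamps(ego_poses):
    """Return {timestamp: [tokens]} for timestamps used by more than one distinct token.

    `ego_poses` is assumed to be an iterable of dicts with "token" and
    "timestamp" keys (hypothetical input shape for this sketch).
    """
    tokens_by_ts = defaultdict(set)
    for rec in ego_poses:
        tokens_by_ts[rec["timestamp"]].add(rec["token"])
    # Keep only timestamps that more than one distinct ego_pose token points at.
    return {ts: sorted(toks) for ts, toks in tokens_by_ts.items() if len(toks) > 1}
```

An empty result would mean that no two distinct ego_pose tokens share a timestamp.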
ISSUE
Currently in export_kitti.py the following transformation is incorrect:
Unlike nuScenes (which I didn't check, but believe to be correct), the camera and lidar ego poses for this dataset are not the same. The effect of the code above is that if you use the RGB camera images with labels projected from lidar, the boxes will be randomly off by 10-20 pixels, which is problematic for any sort of 2D learning.
To correct this, two additional transformations are needed to convert to / from world pose for both lidar and camera.
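As a rough sketch of that chain (not the code from export_kitti.py or from the related PR), a point can be carried from the lidar frame to the camera frame through the world frame. The helper names here are hypothetical. The sketch assumes each record is a dict with a "rotation" quaternion in (w, x, y, z) order and a "translation", where calibration records map sensor to ego and ego_pose records map ego to world.

```python
def quat_to_rot(q):
    """3x3 rotation matrix (nested lists) from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def apply(rot, trans, p):
    """Child frame -> parent frame: R @ p + t."""
    return [sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3)]


def apply_inverse(rot, trans, p):
    """Parent frame -> child frame: R^T @ (p - t)."""
    d = [p[i] - trans[i] for i in range(3)]
    return [sum(rot[j][i] * d[j] for j in range(3)) for i in range(3)]


def lidar_to_camera(p_lidar, lidar_calib, lidar_ego_pose, cam_ego_pose, cam_calib):
    """Move a 3D point from the lidar sensor frame to the camera sensor frame.

    All record arguments are assumed to be dicts with "rotation" (w, x, y, z)
    and "translation" keys.
    """
    def rt(rec):
        return quat_to_rot(rec["rotation"]), rec["translation"]

    p = apply(*rt(lidar_calib), p_lidar)      # lidar sensor -> ego at the lidar's pose
    p = apply(*rt(lidar_ego_pose), p)         # ego at the lidar's pose -> world
    p = apply_inverse(*rt(cam_ego_pose), p)   # world -> ego at the camera's pose
    p = apply_inverse(*rt(cam_calib), p)      # ego at the camera's pose -> camera sensor
    return p
```

If the two ego poses were identical, the middle two steps would cancel and the result would match the shorter lidar -> ego -> camera chain. With this dataset's differing poses they do not cancel.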
Additionally, if I recall, the render function does not have the same issue as this KITTI converter.
Related PR: https://github.com/lyft/nuscenes-devkit/pull/75