Hi. I believe the difference is that KITTI stores the annotations/predictions in local coordinates, whereas nuScenes stores them in global coordinates, e.g. [1500.37409205876, 2970.935516773256, 0.8198208637558824], which is 1500m East, 2970m North (?), and 0.8m up, relative to the map coordinate frame.
The function kitti_res_to_nuscenes does not know which map and sample you are operating on, so it cannot map the boxes to global coordinates. You'd probably need to add that conversion.
Thank you for your response.
Hi @holger-nutonomy @valGavin
I think my question falls under this issue. If not, I can repost it as a separate question:
As you can see, it is predicting a total of 4 objects, but they are out of frame.
I also noticed that the predicted bbox values are in some normalized coordinate system, whereas the nuscenes2kitti conversion works in a global coordinate system, as @holger-nutonomy mentioned above. Can you please help me with how to perform this conversion? The predictions from my model (trained on KITTI) are in normalized coordinates. Is there a way to convert them to global coordinates? Any help you can provide is much appreciated. Thank you very much.
Hey @kaushik333. As mentioned by @holger-nutonomy, this is a coordinate issue. I made a modification in the python-sdk/nuscenes/utils/kitti.py script.
In the get_boxes function, the original code lists four steps. I added another step between "4: Transform to nuScenes LIDAR coord system" and "Set score or NaN":
# 5. Translate the box center point to follow the nuScenes global coordinates
if (pose_record is not None) and (cs_record is not None):
    box.rotate(Quaternion(cs_record['rotation']))
    box.translate(np.array(cs_record['translation']))
    box.rotate(Quaternion(pose_record['rotation']))
    box.translate(np.array(pose_record['translation']))
This requires you to pass the cs_record and pose_record values to this function. You can get those values using:
if coord_transform:
    sample = self.nusc.get('sample', sample_token)
    sample_data = self.nusc.get('sample_data', sample['data']['LIDAR_TOP'])
    pose_record = self.nusc.get('ego_pose', sample_data['ego_pose_token'])
    cs_record = self.nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
by providing the sample_token of the corresponding image.
It worked well for me. I hope this helps with your issue.
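For completeness, the two snippets fit together roughly like this. This is only a sketch (the helper name and structure are mine, not the devkit's), and it assumes the boxes you pass in are already in the nuScenes LIDAR frame, i.e. after step 4 of get_boxes:

import numpy as np
from pyquaternion import Quaternion

# Illustrative helper (not part of the devkit): move boxes that are already in
# the nuScenes LIDAR frame into the global frame using the two records above.
def lidar_boxes_to_global(boxes, cs_record, pose_record):
    for box in boxes:
        # LIDAR (sensor) frame -> ego vehicle frame
        box.rotate(Quaternion(cs_record['rotation']))
        box.translate(np.array(cs_record['translation']))
        # ego vehicle frame -> global frame
        box.rotate(Quaternion(pose_record['rotation']))
        box.translate(np.array(pose_record['translation']))
    return boxes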
Thanks for this response. Sorry for the delayed reply. It turns out that I was facing this issue because I was using a model pretrained on KITTI to perform inference on nuScenes. I just retrained the network on nuScenes and now it seems to perform as expected.
I'm extremely surprised by the outcome, though, because I make use of only the lidar sensor values and not the radar, which gives a very sparse point cloud. In spite of this, I achieve reasonable bbox detections (visually), although numbers-wise my performance is crap. Any thoughts on this? @holger-nutonomy @valGavin?
Hi @holger-motional, I encountered the same problem and therefore arrived at this thread.
KITTI stores the annotations/predictions in local coordinates, whereas nuScenes stores them in global coordinates,
Thank you for this insightful reply. I followed up with @valGavin's fix:
In the get_boxes function, the original code lists four steps. I added another step between "4: Transform to nuScenes LIDAR coord system" and "Set score or NaN":
# 5. Translate the box center point to follow the nuScenes global coordinates
if (pose_record is not None) and (cs_record is not None):
    box.rotate(Quaternion(cs_record['rotation']))
    box.translate(np.array(cs_record['translation']))
    box.rotate(Quaternion(pose_record['rotation']))
    box.translate(np.array(pose_record['translation']))
This requires you to pass the cs_record and pose_record values to this function from kitti_res_to_nuscenes() in export_kitti.py, where you can get them using:

if coord_transform:
    sample = self.nusc.get('sample', sample_token)
    sample_data = self.nusc.get('sample_data', sample['data']['LIDAR_TOP'])
    pose_record = self.nusc.get('ego_pose', sample_data['ego_pose_token'])
    cs_record = self.nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
After applying this fix, I expected AP3D to be insanely high (close to 1.00) for all classes, and ATE, ASE and AOE to go down to zero. AVE and AAE at their minimum values are OK, because the KITTI format does not have any attribute or velocity labels, as mentioned here.
However, even after applying this fix, I see the following outputs:
mAP: 0.1610
mATE: 1.0000
mASE: 1.0000
mAOE: 1.0000
mAVE: 1.0000
mAAE: 1.0000
NDS: 0.0805
Eval time: 17.5s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.160 1.000 1.000 1.000 1.000 1.000
truck 0.196 1.000 1.000 1.000 1.000 1.000
bus 0.322 1.000 1.000 1.000 1.000 1.000
trailer 0.209 1.000 1.000 1.000 1.000 1.000
construction 0.115 1.000 1.000 1.000 1.000 1.000
pedestrian 0.102 1.000 1.000 1.000 1.000 1.000
motorcycle 0.214 1.000 1.000 1.000 1.000 1.000
bicycle 0.084 1.000 1.000 1.000 1.000 1.000
traffic_cone 0.073 1.000 1.000 nan nan nan
barrier 0.137 1.000 1.000 1.000 nan nan
Is it normal to get such low AP values for car and such high translational errors and scale errors even after using val ground truth for evaluation? It would be great if you have any insights in this regard.
I am also posting a screenshot of the entire run for your reference.
@abhi1kumar There is definitely a bug in your code. Other submissions have far better results: https://www.nuscenes.org/object-detection. With ground-truth all numbers should be 1.
Thank you @holger-motional for your quick reply. Yes, I completely agree with you. Others have obtained much higher numbers while testing, so using the val ground truth (oracle) should definitely yield much higher numbers.
I had another related question. Does nuScenes consider the outputs of all six cameras for evaluation, or, if data from a single camera is given, does it only evaluate against that camera's data? If the former, I know the reason behind this 16% for cars: I am testing only the front camera, so this 16% AP on cars is because one of the cameras is fully correct while the other five cameras are completely wrong.
I guess that's your problem :-). nuScenes has annotations from 360 degrees and uses all of them for evaluation. If you want to only evaluate on the front camera, you would have to drop the ground-truth boxes that fall into all other cameras.
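For example, a rough sketch of such a filter is below. This is an illustrative helper, not a devkit utility; it leans on get_sample_data's built-in visibility filtering:

from nuscenes.utils.geometry_utils import BoxVisibility

# Rough sketch: collect the annotation tokens of ground-truth boxes that are
# visible in CAM_FRONT for one sample, so that all other boxes can be dropped.
def front_camera_gt_tokens(nusc, sample_token):
    sample = nusc.get('sample', sample_token)
    cam_token = sample['data']['CAM_FRONT']
    # get_sample_data filters boxes by image visibility when a box_vis_level
    # is given, and returns them in the camera frame.
    _, boxes, _ = nusc.get_sample_data(cam_token, box_vis_level=BoxVisibility.ANY)
    return {box.token for box in boxes}

The returned annotation tokens could then be used to keep only the front-camera ground truth before running the evaluation.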
Thank you once again @holger-motional for your quick reply on my issue.
The official evaluation uses all six cameras, and therefore I too have to use all six cameras for proper benchmarking of our method.
@abhi1kumar I guess you can just run it on each camera, combine the boxes, run non-maximum suppression and get the final set of results.
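A bare-bones sketch of that merge step might look like the following. This is my own helper, not a devkit function; it uses a simple greedy NMS on BEV center distance, and the 0.5 m threshold is an arbitrary choice:

import numpy as np

# Sketch: merge per-camera detections (assumed to be Box objects already in
# global coordinates, each carrying a score) and suppress near-duplicates.
def merge_camera_detections(per_camera_boxes, dist_thresh=0.5):
    boxes = [b for cam_boxes in per_camera_boxes for b in cam_boxes]
    boxes.sort(key=lambda b: b.score, reverse=True)  # highest score first
    kept = []
    for box in boxes:
        # Keep a box only if no higher-scoring box of the same class has
        # already been kept within dist_thresh metres in the BEV plane.
        if all(box.name != k.name or
               np.linalg.norm(box.center[:2] - k.center[:2]) > dist_thresh
               for k in kept):
            kept.append(box)
    return kept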
@holger-motional Thank you once again for helping me out. This is what I have been trying to achieve. I am doing monocular 3D object detection, and my KITTI training pipeline is set. Hence, all I wanted was to convert nuScenes images to the KITTI format, train with the nuScenes images with the KITTI pipeline, get the results in the KITTI format, convert to nuScenes format, and finally upload to the nuScenes server.
The CAM_FRONT camera for nuScenes gets converted to KITTI format without errors. However, the other cameras throw the error mentioned here. Going by your answer, the other cameras are ambiguous, and they throw an assertion error (since the rotations are no longer identities).
I did try commenting out the assertion https://github.com/nutonomy/nuscenes-devkit/blob/864d0a207539e5383cd3eb26ebb1d7a44622f09d/python-sdk/nuscenes/scripts/export_kitti.py#L151 to get KITTI-style calib and label files.
However, when I try to convert these label files back to nuScenes format, the following error pops up:
ValueError("Matrix must be orthogonal, i.e. its transpose should be its inverse")
at this line https://github.com/nutonomy/nuscenes-devkit/blob/864d0a207539e5383cd3eb26ebb1d7a44622f09d/python-sdk/nuscenes/utils/kitti.py#L326 because the matrix is not orthogonal.
Therefore, do you know of any public repo which outputs the ground truths in local camera coordinates (for each of the cameras) in nuScenes? I am using neither LiDAR nor radar data, so anything which brings objects into local camera coordinates and converts them back to global coordinates for nuScenes images should be fine.
PS - This code looks to do the same, but I have not tested it out.
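For reference, moving a single ground-truth box from the global frame into a particular camera frame only needs that camera's ego_pose and calibrated_sensor records. An untested sketch is below (the helper name is mine); applying the same transforms in the reverse order takes a box from the camera frame back to the global frame:

import numpy as np
from pyquaternion import Quaternion

# Untested sketch: transform a Box from the global frame into the frame of the
# camera identified by its sample_data token.
def global_box_to_camera(nusc, box, cam_sd_token):
    sd_record = nusc.get('sample_data', cam_sd_token)
    pose_record = nusc.get('ego_pose', sd_record['ego_pose_token'])
    cs_record = nusc.get('calibrated_sensor', sd_record['calibrated_sensor_token'])

    # global frame -> ego vehicle frame (inverse of the ego pose transform)
    box.translate(-np.array(pose_record['translation']))
    box.rotate(Quaternion(pose_record['rotation']).inverse)
    # ego vehicle frame -> camera frame (inverse of the calibration transform)
    box.translate(-np.array(cs_record['translation']))
    box.rotate(Quaternion(cs_record['rotation']).inverse)
    return box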
@abhi1kumar Unfortunately I am not aware of any such code :-(.
Hello, nuScenes contributors. Thank you for providing the nuscenes-devkit; it's been a great help for my research. However, there's something I need to ask about the 3D Detection Evaluation code.
What I've done: ran nuscenes_gt_to_kitti (which is in export_kitti.py) and then kitti_res_to_nuscenes.
The problem:
Note: I ran kitti_res_to_nuscenes on the validation ground truth (nuScenes -> KITTI -> JSON), and re-ran detection/evaluate.py.
I'm sorry for the long question, and I hope you can provide some answer and solution. Thank you in advance.