Matching YoLo detections to Nuscenes ground truth detection boxes

abadithela commented 2 years ago

Hi,

I don't actively work in computer vision, so the following question might be a trivial one. I am trying to apply YoLo for object detection to the nuscenes dataset on the vision data only. YoLo returns pixel coordinates of the objects detected in the image. How do I convert this pixel output into the camera coordinate frame consistent with nuscenes? After that, I need to go from the camera coordinate frame to the global coordinate frame, and I was hoping to modify box_to_sensor to do that. https://github.com/nutonomy/nuscenes-devkit/blob/864d0a207539e5383cd3eb26ebb1d7a44622f09d/python-sdk/nuscenes/eval/common/utils.py#L130

holger-motional commented 2 years ago

Hi. If your goal is to go from Yolo's 2d boxes to nuScenes 3d boxes, that is strictly speaking not possible. For each 2d box there are infinitely many possible 3d boxes. That said, there are all kinds of tricks you could use.

1. You can modify your Yolo to directly output 3d boxes (learned from the structure of the image).
2. You can look for the nearest lidar point and infer the depth from it.

Unfortunately each of them is a small research project on its own and would go beyond what we can cover here.
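
To make the depth ambiguity concrete, here is a minimal sketch (not devkit code) of back-projecting a single pixel into the camera frame with a pinhole model. It assumes a zero-skew 3x3 intrinsic matrix like the `camera_intrinsic` field of a nuScenes `calibrated_sensor` record, and it assumes the depth along the camera z-axis is already known, e.g. from a nearby lidar point. The function name `pixel_to_camera` and the numeric values are illustrative only. Without the depth argument, the same pixel corresponds to a whole ray of 3d points, which is why a 2d box alone cannot determine a 3d box.

```python
def pixel_to_camera(u, v, depth, intrinsic):
    """Back-project pixel (u, v) to a 3d point in the camera frame.

    depth is the distance along the camera z-axis in metres (assumed known).
    intrinsic is a 3x3 nested list [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    fx, cx = intrinsic[0][0], intrinsic[0][2]
    fy, cy = intrinsic[1][1], intrinsic[1][2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics, chosen only to make the arithmetic easy to follow.
K = [[1000.0, 0.0, 800.0],
     [0.0, 1000.0, 450.0],
     [0.0, 0.0, 1.0]]

# The same pixel at two different depths gives two different 3d points.
near = pixel_to_camera(900.0, 450.0, 10.0, K)
far = pixel_to_camera(900.0, 450.0, 20.0, K)
```

Going from the camera frame to the global frame would additionally require the sensor-to-ego and ego-to-global poses stored with each sample, which this sketch does not cover.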

abadithela commented 2 years ago

Hi Holger:

I'd like to use the second option.

Aside from that, I'm using a pre-trained YoLo model (not trained on the nuscenes dataset) to detect cars and pedestrians. YoLo returns 2d bounding boxes in pixels, and I'm trying to match them with the ground-truth 2d bounding boxes from nuscenes. Basically, I'm trying to find the precision and recall (and other classification metrics) of YoLo on the nuscenes dataset. I managed to get both sets of boxes in pixel coordinates on the same image size; however, the ground-truth and predicted boxes do not overlap perfectly. Further, some nuscenes ground-truth boxes seem to miss the object.

Thanks, Apurva

holger-motional commented 2 years ago

I don't think it makes sense for you to lift 2d to 3d. I suggest the following:

1. Run https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/scripts/export_2d_annotations_as_json.py to convert nuScenes ground-truth 3d boxes to 2d.
2. Run Yolo.
3. Compare the two sets of boxes, e.g. using mAP with IoU > 0.5.
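
The comparison step could look roughly like the sketch below. It is not part of the devkit: it assumes both sets of boxes for one image and one class are given as `(x_min, y_min, x_max, y_max)` in pixels, and it uses a greedy one-to-one matching in order of decreasing detection score, which is a common convention rather than anything prescribed in this thread. It yields true positives, false positives and false negatives at a single IoU threshold; a full mAP would additionally sweep the score threshold and average over classes.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, iou_threshold=0.5):
    """Greedily match predictions to ground truth; returns (tp, fp, fn).

    preds is a list of (box, score); gts is a list of boxes.
    Each ground-truth box can be matched by at most one prediction.
    """
    unmatched = set(range(len(gts)))
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = None, iou_threshold
        for idx in unmatched:
            overlap = iou(box, gts[idx])
            if overlap > best_iou:  # strictly above the threshold
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            unmatched.remove(best_idx)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


# Toy data: one good detection, one spurious detection, one missed object.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
tp, fp, fn = match_detections(preds, gts)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
```

Because the predicted and ground-truth boxes never overlap perfectly, the IoU threshold is what decides whether a pair counts as a match.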

abadithela commented 2 years ago

Got it, thanks. I ended up using functions from export_2d_annotations_as_json.py.
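
For anyone following the same route, a rough sketch of preparing the exported ground truth for comparison is shown below. It assumes each exported record is a dict with `filename`, `category_name` and `bbox_corners` (`[x_min, y_min, x_max, y_max]`) keys; check these names against the JSON your devkit version actually writes. The helper name `group_boxes_by_image`, the prefix-to-label mapping and the sample records are illustrative. The mapping is needed because nuScenes category names such as `vehicle.car` and `human.pedestrian.adult` differ from the `car` and `person` labels of a COCO-pretrained detector.

```python
from collections import defaultdict


def group_boxes_by_image(records, prefix_to_label):
    """Group 2d ground-truth boxes by image file, keeping only wanted classes.

    records: list of dicts with 'filename', 'category_name', 'bbox_corners'
             (assumed field names, verify against your exported JSON).
    prefix_to_label: maps a nuScenes category prefix to the detector's label.
    """
    grouped = defaultdict(list)
    for rec in records:
        label = next(
            (name for prefix, name in prefix_to_label.items()
             if rec["category_name"].startswith(prefix)),
            None,
        )
        if label is not None:
            grouped[rec["filename"]].append((tuple(rec["bbox_corners"]), label))
    return dict(grouped)


# Hand-written records standing in for the exported JSON contents.
records = [
    {"filename": "a.jpg", "category_name": "vehicle.car",
     "bbox_corners": [0, 0, 10, 10]},
    {"filename": "a.jpg", "category_name": "human.pedestrian.adult",
     "bbox_corners": [5, 5, 8, 9]},
    {"filename": "a.jpg", "category_name": "movable_object.barrier",
     "bbox_corners": [1, 1, 2, 2]},
]
mapping = {"vehicle.car": "car", "human.pedestrian": "person"}
boxes = group_boxes_by_image(records, mapping)
```

In practice `records` would come from `json.load` on the exported annotation file, and the per-image groups can then be compared class by class against the Yolo output.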

nutonomy/nuscenes-devkit, issue #677