Get access to 2d bounding boxes with respect to camera

mapillary / metropolis_sdk

A collection of code examples to help users get started with the Mapillary Metropolis dataset

Get access to 2d bounding boxes with respect to camera #3

Open gianscarpe opened 2 years ago

gianscarpe commented 2 years ago

Hi, thanks for your amazing work! I have a potentially silly question to ask. I need to access the 2D bounding-boxes for each camera (front, left, right, back) but I could only find the bounding-boxes for the equirectangular image. I access the 2d boxes with:

token = "tr1thGb4-HK8yPOzSZFHQQ-cam-front"
boxes = met.get_sample_data(token, get_all_visible_boxes=False)[2]

The result is a list of EquiBox2d. Do they refer to the projection of the bounding boxes onto a specific camera (e.g., the frontal one)? My second question regards the content of the points attribute. It's a NumPy array of shape (80, 2). Are these all the points of the bounding box (and in that case, are the four corners the actual coordinates of the bounding box)? Thanks in advance! :)

ducksoup commented 2 years ago

Hi @gianscarpe, you are already on the right track!

2D bounding boxes in Metropolis are annotated on the equirectangular images, which means that their edges map to curves, and not to straight lines, when seen from the perspective images. This is represented in the SDK by the EquiBox2d class. EquiBox2d is a discretized representation of one of these "deformed" boxes, where the points attribute contains its boundary expressed as a polygon. If you want to get a standard, axis-aligned box out of this, the easiest way would be to compute the bounding box of the points.

Note: this approach will give you bounding boxes that are not tight around the objects. Unfortunately, there's no way to obtain tight boxes on the perspective images given tight boxes on the equirectangular images without human re-annotation.
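As a minimal sketch of the "bounding box of the points" idea, assuming each row of points is an (x, y) pixel coordinate (the column order is an assumption, not something confirmed in this thread), something like the following could work. The helper name axis_aligned_box is hypothetical and not part of the SDK. The toy polygon stands in for the points attribute of an EquiBox2d.

```python
from typing import Iterable, Sequence, Tuple


def axis_aligned_box(
    points: Iterable[Sequence[float]],
) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing all polygon vertices.

    Hypothetical helper: assumes each element of `points` is an (x, y) pair,
    e.g. one row of an (N, 2) array.
    """
    xs, ys = [], []
    for x, y in points:
        xs.append(float(x))
        ys.append(float(y))
    if not xs:
        raise ValueError("points is empty")
    return min(xs), min(ys), max(xs), max(ys)


# Toy polygon standing in for `box.points`; real boundaries have more vertices.
polygon = [(10.0, 20.0), (15.5, 18.0), (30.0, 22.0), (28.0, 40.0), (12.0, 38.5)]
print(axis_aligned_box(polygon))  # (10.0, 18.0, 30.0, 40.0)
```

Because the helper only iterates over rows, it should accept a NumPy array of shape (N, 2) as well as a plain list of pairs. As noted above, the resulting box encloses the deformed polygon and is therefore generally looser than a tight box around the object.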

gianscarpe commented 2 years ago

Hi @ducksoup, thanks for your answer, it solved my problem! I have a couple more questions. I noticed that the SDK provides 2D bounding boxes only for a handful of classes, while the large majority of the categories are missing (e.g., buildings, vegetation, sidewalks). Is this intended? Another question regards the depth. I noticed that some objects (e.g., cars and pedestrians) are missing from the depth data, I guess because of the SfM estimation. Is that correct? I attach a couple of examples, thank you for your time! :)

ducksoup commented 2 years ago

To answer your questions:

Bounding boxes are provided only for "things", i.e. countable objects. These are the categories that were annotated by human annotators. Everything else ("stuff" classes, panoptic segmentations) is machine generated and does not include bounding boxes.
The depth images are generated using SfM, so they only capture the static, consistently 3D-reconstructable part of the scene. Note that annotated moving objects such as cars and pedestrians are explicitly excluded from the reconstruction process.