Incorrect instance labels in video panoptic segmentation #570

leaf1170124460 commented 1 year ago

Hi @rezama, @google-admin, @alexzzhu, @patzm and @charlesq34. Thanks for your work on the Waymo Open Dataset.

However, I found a problem when generating the VPS dataset. In validation video 188 (segment-8956556778987472864_3404_790_3424_790_with_camera_labels.tfrecord), some stuff classes, such as trees and sky, are labeled as instances, which makes the output panoptic masks incorrect. To illustrate the problem, I visualized the areas where the instance label is > 0 in white.


This is the original image.
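A minimal NumPy sketch of the visualization described above, plus a count of the symptom (stuff pixels carrying a nonzero instance label). It assumes the panoptic label has already been decoded into an integer array and that the panoptic_label_divisor is known. The function names are hypothetical, and the stuff class IDs passed in (24 and 25 below) are example values only.

```python
import numpy as np


def instance_mask_image(panoptic_label: np.ndarray, divisor: int) -> np.ndarray:
    """Return a uint8 image that is 255 (white) wherever instance_label > 0."""
    instance_label = panoptic_label % divisor
    return np.where(instance_label > 0, 255, 0).astype(np.uint8)


def count_stuff_with_instances(panoptic_label: np.ndarray, divisor: int,
                               stuff_ids: set) -> int:
    """Count pixels whose semantic class is a 'stuff' class but whose
    instance label is nonzero (the symptom reported for video 188)."""
    semantic_label = panoptic_label // divisor
    instance_label = panoptic_label % divisor
    suspicious = np.isin(semantic_label, list(stuff_ids)) & (instance_label > 0)
    return int(suspicious.sum())


# Toy 2x3 panoptic map with an assumed divisor of 1000.
pano = np.array([[2001, 24000, 24003],
                 [25000, 25001, 0]])
mask = instance_mask_image(pano, 1000)
n_bad = count_stuff_with_instances(pano, 1000, stuff_ids={24, 25})
```

On a clean frame, n_bad should be 0 for every class you consider stuff; a nonzero count flags frames like the ones in video 188.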


alexzzhu commented 1 year ago

Hi there, thanks for pointing this out! We have noticed some issues with this sequence and have performed corrections on our side, which we expect to release as an update to the dataset later this or next week. To double check, do you have the frame id or timestamp for this particular frame?

leaf1170124460 commented 1 year ago

Thanks for your reply! It seems that the whole video sequence has wrong instance labels. I generated panoptic masks from the following frames; you can check them: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195]

alexzzhu commented 1 year ago

Are you seeing this issue (instances for non-instance classes) for the entire sequence? Are you seeing it in other sequences?

leaf1170124460 commented 1 year ago

Yes, all of the selected frames in this video have this issue. In the VPS validation set, I only see it in video 188. For the training set, I am not sure whether any videos have the same issue.

jayhsu0627 commented 1 year ago

Hi guys, I'd like to continue this discussion, since I found that instance IDs are inconsistent across frames. In the test below, I highlighted the object with instance ID 1 in the Car class. In the first two frames, the mask falls on the vehicle parked vertically on the right side, and then it falls on the vehicle parked in front. I think the ideal instance ID (or tracking ID) should be an independent ID for each object, and no new instance should reuse an old ID. In Experiment 2, I found that Car and other classes (such as Pedestrian) share IDs across frames, e.g. Car:1 and Pedestrian:1.

==Exp 1==
query_id = 1; query_class_1 = 2; query_class_2 = 9
waymo_open_dataset_v_1_4_1 segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord
[images omitted]

==Exp 2==
query_id = 1; query_class_1 = 2; query_class_2 = 9
waymo_open_dataset_v_1_4_1 segment-1231623110026745648_480_000_500_000_with_camera_labels.tfrecord
[images omitted]
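Both experiments boil down to selecting one (class, instance ID) pair in every frame of a segment. A minimal NumPy sketch of that query, assuming per-frame decoded panoptic arrays and a known divisor; query_instance_centroids is a hypothetical helper, not part of the dataset tooling. It reports the mask centroid per frame, so a sudden jump between consecutive frames hints at an ID switch (only a hint: objects also move, and occlusion changes the visible mask).

```python
import numpy as np


def query_instance_centroids(panoptic_frames, divisor, query_class, query_id):
    """For each frame, return the (row, col) centroid of the pixels whose
    panoptic label equals query_class * divisor + query_id, or None."""
    target = query_class * divisor + query_id
    centroids = []
    for frame in panoptic_frames:
        rows, cols = np.nonzero(frame == target)
        if rows.size == 0:
            centroids.append(None)
        else:
            centroids.append((float(rows.mean()), float(cols.mean())))
    return centroids


# Two toy 4x4 frames (divisor assumed to be 1000): the "class 2, ID 1" mask
# moves from the top-left to the bottom-right corner between frames.
f0 = np.zeros((4, 4), dtype=np.int64)
f0[0:2, 0:2] = 2001
f0[3, 3] = 9001  # class 9 with the same instance ID; not matched below
f1 = np.zeros((4, 4), dtype=np.int64)
f1[2:4, 2:4] = 2001
print(query_instance_centroids([f0, f1], 1000, query_class=2, query_id=1))
```

Because the query matches on the full panoptic value, the class 9 pixel with instance ID 1 is never confused with the class 2 object.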

I also found some incorrect labels. For example, in frame 36 of segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord, a large area of the road is not labeled.
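Unlabeled regions like this one can be flagged automatically by measuring how much of each frame has the undefined semantic class. A minimal sketch, assuming semantic class 0 means undefined (panoptic label 0 appears in the label list posted later in this thread) and that the divisor is known; both helpers are hypothetical and the 0.2 threshold is arbitrary.

```python
import numpy as np


def unlabeled_fraction(panoptic_label: np.ndarray, divisor: int,
                       undefined_class: int = 0) -> float:
    """Fraction of pixels whose semantic class is the undefined class."""
    semantic_label = panoptic_label // divisor
    return float(np.mean(semantic_label == undefined_class))


def flag_sparse_frames(frames, divisor, threshold=0.2):
    """Return the indices of frames whose unlabeled fraction exceeds threshold."""
    return [i for i, frame in enumerate(frames)
            if unlabeled_fraction(frame, divisor) > threshold]


full = np.full((2, 2), 20000)              # every pixel labeled (class 20)
holes = np.array([[0, 20000], [20000, 0]])  # half of the pixels undefined
print(flag_sparse_frames([full, holes], 1000))
```

Some unlabeled pixels are expected in every frame, so a threshold is needed to separate routine gaps from large holes like the road area above.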


Lastly, your paper states that the frame indices are {25, 50, 125, 150} for the four sets of five-frame sequences, but according to my test, the frame indices in dataset v1.4.1 are {25, 75, 125, 175}.

I would really appreciate it if you could help me confirm the first point! In addition, I found that instance IDs are relatively stable in highway driving scenes, probably because fewer vehicles enter and leave the field of view.

-- Jay

alexzzhu commented 1 year ago

Hi Jay,

Thanks for the in-depth analysis! The instance tracking is done by matching the ground-truth bounding boxes in the dataset with the instance labels in each frame, so I can't guarantee that there are no ID switches. However, your first example looks rather egregious, and we'll definitely look into it. One question: are the two vehicles of different classes? In that case, you should not associate the two together (see my response to your second example below).

For your second example, are you seeing the vehicle ID map to the pedestrian ID? Since they have different classes, they shouldn't be associated with each other (instance IDs may be the same for objects of different classes; the panoptic label is the main differentiator here).

For your third issue, there may be some unknown regions in non-obvious places. We're aware of this, but we think it should be low impact as that section is essentially unlabeled. Is this causing an issue in your pipeline?

Finally, thanks for pointing out the frame indices issue; we plan to revise the paper soon with a number of updates. We'll also release a list of all context_name,timestamp pairs with ground truth.

jayhsu0627 commented 1 year ago

Hi Alex,

For the first case, I visualized ID == 1 for the Car class only, so I believe two car objects share one ID, rather than one Car and one Truck. Also, since this is the first segment in the training folder, I suspect there are more cases like this, given that your method is an automatic pipeline combining three domains (human, LiDAR, and camera).

For the second case, my point is that vehicle ID 1 is shared with pedestrian ID 1. I don't know Waymo's labeling policy for these panoptic IDs. My project aims to convert Waymo labels into Virtual KITTI style. Ideally, every instance should have a unique ID and its color shouldn't change; otherwise, I can't get the trajectory (x3d, y3d in KITTI style) of an object from its ID. I'm not sure whether my understanding matches the widely accepted instance-segmentation labeling standard (or maybe my code is wrong?).
[images: Virtual KITTI samples, frames 00000 and 00005]
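Once every object carries an ID that is stable over the whole sequence, turning labels into per-object trajectories is a grouping step. A minimal sketch, assuming per-frame semantic arrays and per-frame arrays of sequence-wide IDs are already available; build_tracks is a hypothetical helper. Keying tracks on (class, ID) keeps a Car:1 and a Pedestrian:1 apart. It only tracks 2D mask centroids, not the 3D x3d/y3d positions Virtual KITTI provides, which would additionally need depth or LiDAR.

```python
from collections import defaultdict

import numpy as np


def build_tracks(semantic_frames, track_id_frames):
    """Group pixels into per-object tracks across a sequence.

    Returns {(semantic_class, track_id): [(frame_index, (row, col)), ...]},
    where (row, col) is the 2D centroid of the object's mask in that frame.
    track_id == 0 is treated as "no instance" and skipped.
    """
    tracks = defaultdict(list)
    for t, (sem, ids) in enumerate(zip(semantic_frames, track_id_frames)):
        has_instance = ids > 0
        if not np.any(has_instance):
            continue
        # Unique (class, id) pairs present in this frame.
        pairs = np.unique(
            np.stack([sem[has_instance], ids[has_instance]], axis=1), axis=0)
        for cls, tid in pairs:
            rows, cols = np.nonzero((sem == cls) & (ids == tid))
            tracks[(int(cls), int(tid))].append(
                (t, (float(rows.mean()), float(cols.mean()))))
    return dict(tracks)


sem0 = np.array([[2, 2], [9, 9]])
ids0 = np.array([[1, 1], [1, 1]])  # class 2 and class 9 both use ID 1
ids1 = np.array([[0, 1], [0, 0]])
tracks = build_tracks([sem0, sem0], [ids0, ids1])
```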

The other problems are trivial ones, which I'm sure you will solve.

alexzzhu commented 1 year ago

Hmm I can't seem to reproduce the error you're seeing for the first example, let me take a closer look in a bit and get back to you.

The way our panoptic labels are arranged is that every instance has a unique panoptic label, computed as semantic_label * panoptic_label_divisor + instance_label. Instance labels may be the same for objects of different classes, but the semantic label differentiates them. If you need instance labels that are unique across classes, you can use the panoptic labels directly. I'm not sure if this answers your question.
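The encoding above is plain integer arithmetic, so it can be inverted with floor division and modulo. A minimal sketch; the helper names are hypothetical, the class IDs 2 and 9 follow Jay's queries (Car and Pedestrian), and the divisor of 1000 is only an example consistent with labels such as 2001 posted in this thread. In real code, read panoptic_label_divisor from the label itself.

```python
import numpy as np


def encode_panoptic(semantic_label, instance_label, divisor):
    """panoptic = semantic * divisor + instance (works on ints or arrays)."""
    return np.asarray(semantic_label) * divisor + np.asarray(instance_label)


def decode_panoptic(panoptic_label, divisor):
    """Invert encode_panoptic; assumes 0 <= instance_label < divisor."""
    panoptic_label = np.asarray(panoptic_label)
    return panoptic_label // divisor, panoptic_label % divisor


# Car (2) and Pedestrian (9) both with instance ID 1, divisor assumed 1000.
car = encode_panoptic(2, 1, 1000)  # 2001
ped = encode_panoptic(9, 1, 1000)  # 9001
sem, inst = decode_panoptic([2001, 9001, 24000], 1000)
```

Equal instance labels (1 and 1) still yield distinct panoptic labels (2001 and 9001), which is why the panoptic label is the per-frame key to use across classes.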

jayhsu0627 commented 1 year ago

You made my day! So maybe the unique panoptic label = semantic_label * panoptic_label_divisor + instance_label is equivalent to the tracking ID in Virtual KITTI. The labels for this image are:
semantic_label: [2, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
panoptic_labels: [28000, 24000, 0, 20000, 27000, 23000, 25000, 17000, 21000, 15000, 26000, 2001, 2002, 2003, 2004, 22000, 19000]
instance_label: [1, 2, 3, 4]

I'll post my visualization results later to check whether my post above is wrong.

jayhsu0627 commented 1 year ago

The panoptic_label is unique only within a single frame. I think the panoptic labels I obtained above (such as 2001, 2002, 2003, 2004) are still assigned independently per frame, which means objects with the same ID in different frames may not be the same object (considering the front camera only). I would like an ID that is unique across all frames within the same sequence. Should I use instance_id_to_global_id_mapping to get a truly unique ID? Its documentation says:

A mapping between each panoptic label with an instance_id and a globally unique ID 
across all frames within the same sequence. This can be used to match instances 
across cameras and over time. i.e. instances belonging to the same object will map 
to the same global ID across all frames in the same sequence.

And how would I do that?

alexzzhu commented 1 year ago

Yes, the instance_id_to_global_id_mapping will allow you to map to a globally unique ID. To do this in your own code, you can do something like the following for each mapping entry: panoptic_label[panoptic_label == instance_id_to_global_id_mapping.local_instance_id] = instance_id_to_global_id_mapping.global_instance_id. Note that we also provide an is_tracked bool for each mapping. If it is false, the global ID will not be tracked correctly.

We also provide a util to do this decoding, although in your use case you will need to pass in all of the frames of the sequence at once: https://github.com/waymo-research/waymo-open-dataset/blob/bae19fa0a36664da18b691349955b95b29402713/waymo_open_dataset/utils/camera_segmentation_utils.py#L183 I plan to refactor this to make it simpler in an upcoming release.
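A minimal single-frame sketch of the remapping described above. IdMapping is a hypothetical dataclass mirroring the three fields named in this thread (local_instance_id, global_instance_id, is_tracked); with the real label protos you would iterate over the instance_id_to_global_id_mapping entries instead. It assumes, as the field name local_instance_id suggests, that the key is the instance part of the panoptic label (panoptic_label % panoptic_label_divisor), and it marks untracked mappings with -1 rather than trusting their global IDs.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class IdMapping:
    """Hypothetical stand-in for one instance_id_to_global_id_mapping entry."""
    local_instance_id: int
    global_instance_id: int
    is_tracked: bool


def to_global_ids(panoptic_label, divisor, mappings, untracked_value=-1):
    """Replace per-frame instance IDs with sequence-wide global IDs.

    Pixels not covered by any mapping (including stuff pixels with
    instance 0) are set to 0; untracked mappings get untracked_value.
    """
    local_ids = np.asarray(panoptic_label) % divisor
    global_ids = np.zeros(local_ids.shape, dtype=np.int64)
    for m in mappings:
        # Compare against the original local IDs, never the partially
        # remapped output, so a global ID that equals another local ID
        # cannot be remapped a second time.
        mask = local_ids == m.local_instance_id
        global_ids[mask] = m.global_instance_id if m.is_tracked else untracked_value
    return global_ids


pano = np.array([[2001, 2002], [24000, 2003]])
maps = [IdMapping(1, 2, True), IdMapping(2, 5, True), IdMapping(3, 7, False)]
print(to_global_ids(pano, 1000, maps).tolist())
```

Writing into a separate output array is the main design choice: applying the one-liner in place inside a loop would turn local ID 1 into 2 and then, on the next mapping, into 5.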

jayhsu0627 commented 1 year ago

Hi Alex, I've used the mapped IDs to visualize the objects correctly. I'll also check the is_tracked bool if necessary.


Thank you!

xcyan commented 1 year ago

@jayhsu0627 Hey, I believe Alex has addressed your question. Can we close this issue?

jayhsu0627 commented 1 year ago

@xcyan Sure, thanks!