waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Incorrect instance labels in video panoptic segmentation #570

Closed leaf1170124460 closed 1 year ago

leaf1170124460 commented 1 year ago

Hi @rezama, @google-admin, @alexzzhu, @patzm, and @charlesq34. Thanks for your work on the Waymo Open Dataset.

But I found a problem when generating the VPS dataset. In validation video 188 (segment-8956556778987472864_3404_790_3424_790_with_camera_labels.tfrecord), some stuff classes, such as trees and sky, were labeled as instances, which makes the output panoptic masks incorrect. I visualized the areas where instance labels > 0 as white to illustrate the problem.
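For reference, this is roughly how I rendered that mask (a minimal sketch; `instance_label` is the per-frame array decoded with `camera_segmentation_utils.decode_semantic_and_instance_labels_from_panoptic_label`, as in the script later in this thread):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_instance_mask(instance_label):
    # Paint every pixel carrying a nonzero instance ID white; stuff classes
    # such as trees or sky should never light up here.
    mask = (np.squeeze(instance_label) > 0).astype(np.uint8) * 255
    plt.imshow(mask, cmap='gray')
    plt.grid(False)
    plt.axis('off')
    plt.show()
```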

[image: overlap-b-255]

This is the original image.

[image: 20221116-181651]

alexzzhu commented 1 year ago

Hi there, thanks for pointing this out! We have noticed some issues with this sequence and have made corrections on our side, which we expect to release as an update to the dataset later this week or next. To double-check, do you have the frame ID or timestamp for this particular frame?

leaf1170124460 commented 1 year ago

Thanks for your reply! It seems the whole video sequence has wrong instance labels. I selected these frames to generate pano masks; you can check them: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195]
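This is roughly how I picked them (a minimal sketch, assuming the segment file sits at the placeholder path below; only frames whose first image carries a panoptic label are kept):

```python
import tensorflow.compat.v1 as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

tf.enable_eager_execution()

# Placeholder path to the problematic validation segment.
PATH = 'segment-8956556778987472864_3404_790_3424_790_with_camera_labels.tfrecord'

labeled_frame_indices = []
for frame_idx, data in enumerate(tf.data.TFRecordDataset(PATH, compression_type='')):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    # Only a subset of frames carries camera segmentation labels.
    if frame.images[0].camera_segmentation_label.panoptic_label:
        labeled_frame_indices.append(frame_idx)
print(labeled_frame_indices)
```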

alexzzhu commented 1 year ago

Are you seeing this issue (instances for non-instance classes) for the entire sequence? Are you seeing it in other sequences?

leaf1170124460 commented 1 year ago

Yes, the selected frames in this video all have this issue. In the VPS validation set, I only see this issue in video 188. For the training set, I am not sure whether some videos have the same issue.

jayhsu0627 commented 1 year ago

Hi guys, I want to continue this discussion, since I found inconsistent instance IDs across different frames. In the test below, I highlighted the object with instance ID = 1 under the Car class. In the first two frames, the segmentation falls on the vehicle parked vertically on the right side, and then it falls on the vehicle parked in front. I think the ideal instance ID (or tracking ID) should be an independent ID for each object, and no new instance should take over an old ID number. In experiment 2, I found that Car objects share IDs with objects of other types (such as Pedestrian), e.g. Car:1 and Pedestrian:1, in different frames.

==Exp 1=== query_id = 1; query_class_1 = 2; query_class_2 = 9 waymo_open_dataset_v_1_4_1 segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord

[images: 24 1 28 2 303 323 364]

==Exp 2=== query_id = 1; query_class_1 = 2; query_class_2 = 9 waymo_open_dataset_v_1_4_1 segment-1231623110026745648_480_000_500_000_with_camera_labels.tfrecord

[images: 7411 7822]

import argparse
import math
import os
import sys
from glob import glob
from multiprocessing import Pool
from os.path import join, isdir

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow.compat.v1 as tf
import tqdm

np.set_printoptions(threshold=sys.maxsize)

from matplotlib import patches

from waymo_open_dataset import dataset_pb2 as open_dataset
from waymo_open_dataset import label_pb2
from waymo_open_dataset.camera.ops import py_camera_model_ops
from waymo_open_dataset.metrics.ops import py_metrics_ops
from waymo_open_dataset.metrics.python import config_util_py as config_util
from waymo_open_dataset.protos import breakdown_pb2
from waymo_open_dataset.protos import metrics_pb2
from waymo_open_dataset.protos import submission_pb2
from waymo_open_dataset.utils import box_utils
from waymo_open_dataset.utils import camera_segmentation_utils
from waymo_open_dataset.utils import frame_utils
from waymo_open_dataset.utils import range_image_utils
from waymo_open_dataset.utils import transform_utils
from waymo_open_dataset.utils.frame_utils import parse_range_image_and_camera_projection

import immutabledict
import itertools

if not tf.executing_eagerly():
  tf.enable_eager_execution()

output_path = '/content/'
class_list =['Undefined','Ego_vehicle','Car','Truck','Bus','Other_large_vehicle','Bicycle',
             'Motorcycle','Trailer','Pedestrian','Cyclist','Motorcyclist','Bird','Ground_animal',
             'Construction_cone_pole','Pole','Pedestrian_object','Sign','Traffic_light','Building',
             'Road','Lane_marker','Road_marker','Sidewalk','Vegetation','Sky','Ground','Dynamic','Static']
class_color_list =[[0, 0, 0],[102, 102, 102],[0, 0, 142],[0, 0, 70],[0, 60, 100],[61, 133, 198],[119, 11, 32],
                   [0, 0, 230],[111, 168, 220],[220, 20, 60],[255, 0, 0],[180, 0, 0],[127, 96, 0],[91, 15, 0],
                   [230, 145, 56],[153, 153, 153],[234, 153, 153],[246, 178, 107],[250, 170, 30],[70, 70, 70],
                   [128, 64, 128],[234, 209, 220],[217, 210, 233],[244, 35, 232],[107, 142, 35],[70, 130, 180],[102, 102, 102],[102, 102, 102],[102, 102, 102]]

# Only data collected in specific locations will be converted
# If set None, this filter is disabled (all data will thus be converted)
# Available options: location_sf (main dataset)
selected_waymo_locations = None

def convert_one(pathname):
    """Parses one tfrecord segment and appends its per-frame label colors to a txt file."""
    dataset = tf.data.TFRecordDataset(pathname, compression_type='')

    # file name with extension
    file_name = os.path.basename(pathname)

    # file name without extension
    segment_name = os.path.splitext(file_name)[0]
    print(segment_name)

    # Avoid repeat object id
    frame_obj_id = []
    if output_path is not None:
      cur_det_file = output_path + ('%s.txt' % segment_name)
      if os.path.exists(cur_det_file):
        os.remove(cur_det_file)

    for frame_idx, data in enumerate(dataset):

        frame = open_dataset.Frame()
        frame.ParseFromString(bytearray(data.numpy()))
        if selected_waymo_locations is not None and frame.context.stats.location not in selected_waymo_locations:
            continue

        # (The save_image / save_calib / save_lidar / save_label / save_pose
        # steps of the original converter are omitted here.)

        save_2D_semantic(cur_det_file, frame, file_idx, frame_idx, frame_obj_id)

def _pad_to_common_shape(label):
  # Pad labels to the front cameras' 1280-pixel height so cameras can be concatenated.
  return np.pad(label, [[1280 - label.shape[0], 0], [0, 0], [0, 0]])

def save_2D_semantic(cur_det_file, frame, file_idx, frame_idx, frame_obj_id):

    """ parse the front camera's instance-level segmentation labels and append
        the per-instance colors to cur_det_file
            :param cur_det_file: output txt file for the current segment
            :param frame: open dataset frame proto
            :param file_idx: the current file number
            :param frame_idx: the current frame number
            :param frame_obj_id: instance IDs already written for this segment
            :return:
    """
    frames_with_seg = []
    sequence_id = None

    # Save frames which contain CameraSegmentationLabel messages. We assume that
    # if the first image has segmentation labels, all images in this frame will.
    if frame.images[0].camera_segmentation_label.panoptic_label:
      # print(frame.images[0].camera_segmentation_label)
      frames_with_seg.append(frame)

    camera_front_only = [open_dataset.CameraName.FRONT]

    segmentation_protos_ordered = []
    for frame in frames_with_seg:
      segmentation_proto_dict = {image.name : image.camera_segmentation_label for image in frame.images}
      segmentation_protos_ordered.append([segmentation_proto_dict[name] for name in camera_front_only])

      # The dataset provides tracking for instances between cameras and over time.
      # By setting remap_values=True, this function will remap the instance IDs in
      # each image so that instances for the same object will have the same ID between
      # different cameras and over time.
      segmentation_protos_flat = sum(segmentation_protos_ordered, [])
      panoptic_labels, is_tracked_masks, panoptic_label_divisor = camera_segmentation_utils.decode_multi_frame_panoptic_labels_from_protos(
          segmentation_protos_flat, remap_values=True
      )
      print('panoptic_labels:',len(panoptic_labels),'at frame', frame_idx+1)

      # We can further separate the semantic and instance labels from the panoptic
      # labels.
      NUM_CAMERA_FRAMES = 1
      semantic_labels_multiframe = []
      instance_labels_multiframe = []
      semantic_labels = []
      instance_labels = []

      for i in range(0, len(segmentation_protos_flat), NUM_CAMERA_FRAMES):
        semantic_labels = []
        instance_labels = []
        for j in range(NUM_CAMERA_FRAMES):
          semantic_label, instance_label = camera_segmentation_utils.decode_semantic_and_instance_labels_from_panoptic_label(
            panoptic_labels[i + j], panoptic_label_divisor)
          semantic_labels.append(semantic_label)
          instance_labels.append(instance_label)
        semantic_labels_multiframe.append(semantic_labels)
        instance_labels_multiframe.append(instance_labels)

        # Pad labels to a common size so that they can be concatenated.
        instance_labels = [[_pad_to_common_shape(label) for label in instance_labels] for instance_labels in instance_labels_multiframe]
        semantic_labels = [[_pad_to_common_shape(label) for label in semantic_labels] for semantic_labels in semantic_labels_multiframe]
        instance_labels = [np.concatenate(label, axis=1) for label in instance_labels]
        semantic_labels = [np.concatenate(label, axis=1) for label in semantic_labels]

        instance_label_concat = np.concatenate(instance_labels, axis=0)
        semantic_label_concat = np.concatenate(semantic_labels, axis=0)
        panoptic_label_rgb = camera_segmentation_utils.panoptic_label_to_rgb(
            semantic_label_concat, instance_label_concat)
        semantic_label_rgb = camera_segmentation_utils.semantic_label_to_rgb(
            semantic_label_concat)

      plt.figure(figsize=(16, 15))
      plt.imshow(tf.image.decode_jpeg(frame.images[0].image))

      query_id = 1

      # Car = 2; Pedestrian = 9
      query_class_1 = 2
      query_class_2 = 9

      # Find segmentation for instance ID
      mask_id = instance_label_concat.copy()
      mask_id = mask_id.reshape(mask_id.shape[0],mask_id.shape[1])
      print(mask_id.shape)
      mask_id_3d = np.stack((mask_id,mask_id,mask_id),axis=2) #3 channel mask
      mask_id_3d_mod = np.where(mask_id_3d==query_id, 1, 0)

      # Find class
      mask_class = semantic_label_concat.copy()
      mask_class = mask_class.reshape(mask_class.shape[0],mask_class.shape[1])
      print(mask_class.shape)
      mask_class_3d = np.stack((mask_class,mask_class,mask_class),axis=2) #3 channel mask
      mask_class_3d_mod = np.where((mask_class_3d==query_class_1) | (mask_class_3d==query_class_2), 1, 0)

      # Keep only the pixels matching both the queried instance ID and classes.
      new_panoptic_label_rgb = panoptic_label_rgb * mask_id_3d_mod * mask_class_3d_mod
      plt.imshow(new_panoptic_label_rgb, alpha=0.3)

      plt.grid(False)
      plt.axis('off')
      plt.show()

      print('instance_label: ', sorted(set(instance_label_concat.reshape(-1).tolist()))[1:])
      print('semantic_label: ', sorted(set(semantic_label_concat.reshape(-1).tolist()))[1:])

      # Instance IDs restricted to the queried classes ([1:] drops background 0).
      frame_instance_id = sorted(set((instance_label_concat * mask_class_3d_mod).reshape(-1).tolist()))[1:]
      frame_instance_color = sorted(set(semantic_label_concat.reshape(-1).tolist()))[1:]
      print(frame_obj_id)

      with open(cur_det_file, 'a') as f:
        if os.path.getsize(cur_det_file) == 0:
          print('Category(:id) r g b', file=f)
        for id in frame_instance_id:
          # Locate one pixel of this instance to look up its class and color.
          id_index_0 = np.where(instance_label_concat == id)[0][0]
          id_index_1 = np.where(instance_label_concat == id)[1][0]
          class_name = class_list[semantic_label_concat[id_index_0, id_index_1, 0]]
          r = panoptic_label_rgb[id_index_0, id_index_1][0]
          g = panoptic_label_rgb[id_index_0, id_index_1][1]
          b = panoptic_label_rgb[id_index_0, id_index_1][2]
          print(class_name, ':', id, ' '.join(map(str, panoptic_label_rgb[id_index_0, id_index_1])))
          # Write each instance ID only once per segment.
          if id not in frame_obj_id:
            print('%s:%d %d %d %d' % (class_name, id, r, g, b), file=f)
            frame_obj_id.append(id)
    # print(cur_det_file+" Created Successfully")

FILE_NAME = '/content/data/individual_files/training/segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord'
# FILE_NAME = '/content/data/individual_files/training/segment-1231623110026745648_480_000_500_000_with_camera_labels.tfrecord'
convert_one(FILE_NAME)

I also found some wrong labels, such as frame 36 in segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord; notice that a large area on the road is not labeled.

[image: frame 36 with an unlabeled road area]

Lastly, you claimed in your paper that the frame indices are {25, 50, 125, 150} for the four sets of five-frame sequences, but according to my test, the frame indices in dataset v1.4.1 are {25, 75, 125, 175}.

I would really appreciate it if you could help me confirm the first point! In addition, I found that instance IDs are relatively stable in highway driving scenes, probably because not many vehicles jump in and out of the frame.

-- Jay

alexzzhu commented 1 year ago

Hi Jay,

Thanks for the in-depth analysis! The instance tracking is done by matching the various ground-truth bounding boxes in the dataset with the instance labels in each frame, so I can't guarantee that there aren't any ID switches. However, your first example looks a little egregious and we'll definitely look into it. One question here: are the two vehicles different classes? In that case, you should not associate the two together (see the response to your second example below).

For your second example, are you seeing that the vehicle ID maps to the pedestrian ID? Since they have different classes, they shouldn't be associated with each other (instance IDs may be the same for objects of different classes; the panoptic label is the main differentiator here).

For your third issue, there may be some unknown regions in non-obvious places. We're aware of this, but we think it should be low impact as that section is essentially unlabeled. Is this causing an issue in your pipeline?

Finally, thanks for pointing out the frame indices issue; we plan to update the paper soon with a number of fixes. We'll also release a list of all (context_name, timestamp) pairs with ground truth.
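In the meantime, those pairs can be read straight from the frame protos; a minimal sketch (the path is a placeholder):

```python
import tensorflow.compat.v1 as tf
from waymo_open_dataset import dataset_pb2 as open_dataset

tf.enable_eager_execution()

PATH = 'segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord'

# Print (context_name, timestamp) for every frame that has panoptic labels.
for data in tf.data.TFRecordDataset(PATH, compression_type=''):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    if frame.images[0].camera_segmentation_label.panoptic_label:
        print(frame.context.name, frame.timestamp_micros)
```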

jayhsu0627 commented 1 year ago

Hi Alex,

For the first case, I visualized id == 1 for the Car class only, so I believe two Car objects shared one ID, rather than one Car and one Truck. Also, since it's the first segment in the training folder, I believe there are more cases like this, given that your method is an automatic pipeline combining three domains (human, LiDAR, and camera).

For the second, my point is that vehicle ID == 1 is shared with pedestrian ID == 1. I don't know Waymo's labeling policy for this panoptic ID. My project aims to convert Waymo labels into Virtual KITTI style. Ideally, any instance should have a unique ID and its color shouldn't change; otherwise I can't recover the object's trajectory (x3d, y3d in KITTI style) from its ID. I don't know whether my understanding matches the widely accepted instance-level segmentation labeling standard (or maybe my code is wrong?).

Virtual KITTI samples:

[images: frame 00000, frame 00005]

Terrain 210 0 200
Sky 90 200 255
Tree 0 199 0
Vegetation 90 240 0
Building 140 140 140
Road 100 60 100
GuardRail 255 100 255
TrafficSign 255 255 0
TrafficLight 200 200 0
Pole 255 130 0
Misc 80 80 80
Truck 160 60 60
Car:0 200 200 200
Car:1 232 227 239
Car:2 215 203 229
Car:3 247 230 218
Car:4 230 207 208
Car:5 212 234 247
Car:6 244 210 237
Car:7 227 237 226
Car:8 209 214 215
Car:9 241 241 205
Car:10 224 217 244
Car:11 206 244 234
Car:12 239 221 223
Car:13 221 248 212
Car:14 203 224 202
Car:16 218 228 231
Car:17 200 205 220
Car:18 233 231 210
Car:19 215 208 249
Car:20 248 235 238
Car:21 230 212 228
Car:22 212 238 217
Car:23 245 215 207
Car:24 227 242 246
Car:25 210 218 236
Car:26 242 245 225
Car:27 224 222 214
Car:28 207 249 204
Car:29 239 225 243
Car:30 221 202 233
Car:31 204 229 222
Car:32 236 206 211
Car:33 219 232 201
Car:34 201 209 240
Car:35 233 236 230
Car:36 216 213 219
Car:37 248 239 209
Car:38 230 216 248
Car:39 213 243 237
Car:40 245 220 227
Car:41 228 246 216
Car:42 210 223 206
Car:43 242 200 245
Car:46 239 230 213
Car:47 222 207 203
Car:48 204 234 242
Car:49 237 210 232
Car:51 214 210 230
Car:52 240 200 230
Car:53 217 239 231
Car:54 244 229 231
Car:55 221 218 232
Car:56 247 208 232
Car:57 224 247 233
Car:58 201 236 234
Car:59 228 226 234
Car:60 204 215 235
Car:61 231 205 235
Car:62 208 244 236
Car:63 235 233 237
Car:64 211 223 237
Car:65 238 212 238
Car:66 215 202 238
Car:67 242 241 239
Car:68 218 231 239
Car:69 245 220 240
Car:70 222 209 241
Car:71 249 249 241
Car:72 225 238 242
Car:73 202 228 242
Car:74 229 217 243
Car:75 206 206 244
Car:76 232 246 244
Car:77 209 235 245
Car:78 236 225 245
Car:79 212 214 246
Car:80 239 204 246
Car:81 216 243 247
Car:83 219 222 248
Car:84 246 211 249
Car:85 223 201 249
Car:86 200 240 200
Car:88 203 219 201
Van:89 230 208 202
Van:90 207 248 202
Car:91 233 237 203
Van:92 210 227 203
Car:93 237 216 204
Car:94 214 205 205
Car:95 240 245 205
Car:96 217 234 206
Car:97 244 224 206

The other problems are trivial ones, which I'm sure you will solve.

alexzzhu commented 1 year ago

Hmm, I can't seem to reproduce the error you're seeing for the first example; let me take a closer look in a bit and get back to you.

The way that our panoptic labels are arranged is that every instance has a unique panoptic label, which is semantic_label * panoptic_label_divisor + instance_label. Instance labels may be the same for objects of different classes, but the semantic label will differentiate them. If you need instance labels that are unique across classes, you can use the panoptic labels directly for this purpose. Not sure if this answers your question.
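As a worked example (the divisor value is assumed here, consistent with the 2001/2002-style labels quoted later in this thread):

```python
# Illustrative only; read the real divisor from the decoded labels.
panoptic_label_divisor = 1000

# Car (semantic class 2) and Pedestrian (class 9) may share instance ID 1,
# but their composed panoptic labels differ.
car_panoptic = 2 * panoptic_label_divisor + 1         # 2001
pedestrian_panoptic = 9 * panoptic_label_divisor + 1  # 9001

# Decomposition recovers both parts.
assert car_panoptic // panoptic_label_divisor == 2    # semantic label
assert car_panoptic % panoptic_label_divisor == 1     # instance label
```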

jayhsu0627 commented 1 year ago

You made my day! So maybe unique panoptic label = semantic_label * panoptic_label_divisor + instance_label is equivalent to the tracking ID in Virtual KITTI. So the labels for this image are:

[image]

semantic_label: [2, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
panoptic_labels: [28000, 24000, 0, 20000, 27000, 23000, 25000, 17000, 21000, 15000, 26000, 2001, 2002, 2003, 2004, 22000, 19000]
instance_label: [1, 2, 3, 4]

I'll post my visualization result later to check whether my post above is wrong.

jayhsu0627 commented 1 year ago

The panoptic_label is unique within a single frame. I think the panoptic_labels I obtained above, such as 2001, 2002, 2003, and 2004, are still frame-independent, which means objects with the same ID may not belong to the same object (considering the front camera only). I would expect the ID to be unique across all frames within the same sequence. Therefore, should I use instance_id_to_global_id_mapping to extract a truly unique ID?

A mapping between each panoptic label with an instance_id and a globally unique ID 
across all frames within the same sequence. This can be used to match instances 
across cameras and over time. i.e. instances belonging to the same object will map 
to the same global ID across all frames in the same sequence.

and how?

alexzzhu commented 1 year ago

Yes, the instance_id_to_global_id_mapping will allow you to map to a globally unique ID. To do this in your own code, you can do something like: panoptic_label[panoptic_label == instance_id_to_global_mapping.local_instance_id] = instance_id_to_global_mapping.global_instance_id. Note that we also provide an is_tracked bool for each mapping. If this is false, the global ID will not be tracked correctly.
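Expanded into a small helper, that remapping could look like this (a sketch based on the snippet above; `label_proto` is one CameraSegmentationLabel and `instance_label` the decoded per-frame instance array):

```python
import numpy as np

def remap_to_global_ids(instance_label, label_proto):
    # Replace each local instance ID with its sequence-global ID.
    global_label = instance_label.copy()
    for mapping in label_proto.instance_id_to_global_id_mapping:
        # Skip instances whose global ID is not tracked reliably.
        if not mapping.is_tracked:
            continue
        global_label[instance_label == mapping.local_instance_id] = (
            mapping.global_instance_id)
    return global_label
```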

We also provide a util to do this decoding, although in your use case you will need to pass all of the frames for the sequence in at once: https://github.com/waymo-research/waymo-open-dataset/blob/bae19fa0a36664da18b691349955b95b29402713/waymo_open_dataset/utils/camera_segmentation_utils.py#L183 I plan to refactor this to make it simpler in an upcoming release.
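Applied to this use case, that would mean collecting the segmentation protos for the whole sequence first and decoding them in one call, e.g. (a minimal sketch; the path is a placeholder):

```python
import tensorflow.compat.v1 as tf
from waymo_open_dataset import dataset_pb2 as open_dataset
from waymo_open_dataset.utils import camera_segmentation_utils

tf.enable_eager_execution()

PATH = 'segment-10017090168044687777_6380_000_6400_000_with_camera_labels.tfrecord'

# Gather the FRONT-camera segmentation protos for the entire sequence.
segmentation_protos = []
for data in tf.data.TFRecordDataset(PATH, compression_type=''):
    frame = open_dataset.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    label = frame.images[0].camera_segmentation_label
    if label.panoptic_label:
        segmentation_protos.append(label)

# With all protos passed at once and remap_values=True, the same object keeps
# the same instance ID across all frames of the sequence.
panoptic_labels, is_tracked_masks, panoptic_label_divisor = (
    camera_segmentation_utils.decode_multi_frame_panoptic_labels_from_protos(
        segmentation_protos, remap_values=True))
```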

jayhsu0627 commented 1 year ago

Hi Alex, I've used this mapped ID to visualize everything correctly. I'll also check the is_tracked bool where necessary.

[images: visualizations with remapped global instance IDs]

Thank you!

xcyan commented 1 year ago

@jayhsu0627 Hey, I believe Alex has addressed your question. Can we close this issue?

jayhsu0627 commented 1 year ago

@xcyan Sure, thanks!