waymo-research / waymo-open-dataset

Waymo Open Dataset
https://www.waymo.com/open

Problem about wSTQ metric in PVPS #575

Open leaf1170124460 opened 1 year ago

leaf1170124460 commented 1 year ago

Hi, @rezama, @google-admin, @alexzzhu, @patzm and @charlesq34. Thanks for your work on the Waymo Open Dataset.

In Waymo Open Dataset: Panoramic Video Panoptic Segmentation, wSTQ was proposed to evaluate PVPS, but I cannot find it in this repository or in deeplab2. Is there any code implementation of wSTQ? If not, how can I get the weight of each pixel with respect to the number of cameras covering it?

alexzzhu commented 1 year ago

Sorry for the late response. I was hoping to get these into our next release, but that looks unlikely at this point, so we will have to bundle them in with a subsequent release sometime next year. In the meantime, you can generate the masks using the provided camera calibrations, following the method in the paper. Please let me know if you run into any issues.

leaf1170124460 commented 1 year ago

Thanks for your reply! But I'm still not quite sure how to get the masks of the overlapping regions from the provided camera calibrations. Could you specify how to do that, or provide a tutorial / some sample code?

alexzzhu commented 1 year ago

Hi, sorry again; this got lost during the holiday season. A high-level overview of the panorama generation can be found in our paper (https://arxiv.org/pdf/2206.07704.pdf), in Section 3.1 under Equirectangular Panorama. Short pseudocode to generate the masks via the panorama is as follows:

  1. Compute the pose of the virtual camera, defined as the geometric mean of the 5 camera poses, using the provided extrinsic calibrations.
  2. Unproject every pixel in each image into the virtual camera coordinate frame, assuming some large fixed depth (we use 100 m).
  3. Project each unprojected 3D point into an equirectangular panorama image centered at the virtual camera. For this case, a resolution of 1280x1920 should be sufficient. Whenever a pixel in the panorama image receives pixels from multiple cameras, we set the weight of the pixel in each camera to 1 / num_overlaps, where num_overlaps is the number of cameras that project to this pixel.

Each step is relatively involved and requires a bit of camera geometry. Please let me know which parts you'd like me to expand on. We're also working on releasing the masks and are looking into the possibility of releasing the code to generate them, and they should be out early this year.
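
To make the geometry concrete, here is a rough numpy sketch of those three steps. It is not our actual implementation: it assumes simple pinhole intrinsics with OpenCV-style axes (x right, y down, z forward), ignores rolling shutter and lens distortion (so the dataset extrinsics would need converting into that convention first), and approximates the "geometric mean" pose by averaging translations and orthogonalizing the mean rotation.

```python
import numpy as np

def virtual_camera_pose(extrinsics):
  """Average pose of the cameras (a stand-in for the 'geometric mean' in the paper).

  extrinsics: list of 4x4 camera-to-world matrices.
  """
  translations = np.stack([e[:3, 3] for e in extrinsics])
  rotations = np.stack([e[:3, :3] for e in extrinsics])
  u, _, vt = np.linalg.svd(rotations.mean(axis=0))
  pose = np.eye(4)
  pose[:3, :3] = u @ vt              # nearest rotation to the element-wise mean
  pose[:3, 3] = translations.mean(axis=0)
  return pose

def overlap_weights(image_shapes, intrinsics, extrinsics,
                    pano_hw=(1280, 1920), depth=100.0):
  """Per-camera, per-pixel weights = 1 / num_overlaps in an equirectangular panorama.

  intrinsics: list of 3x3 K matrices (simple pinhole, distortion ignored).
  extrinsics: list of 4x4 camera-to-world matrices in OpenCV-style axes.
  """
  pano_h, pano_w = pano_hw
  world_to_virtual = np.linalg.inv(virtual_camera_pose(extrinsics))
  counts = np.zeros((pano_h, pano_w), dtype=np.int32)
  pano_coords = []  # per camera: (rows, cols) into the panorama for every pixel

  for (h, w), K, cam_to_world in zip(image_shapes, intrinsics, extrinsics):
    # Step 2: unproject every pixel at a fixed depth of `depth` metres.
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x N
    rays = np.linalg.inv(K) @ pix
    pts_cam = rays / np.linalg.norm(rays, axis=0) * depth
    pts_world = cam_to_world[:3, :3] @ pts_cam + cam_to_world[:3, 3:4]
    pts_virt = world_to_virtual[:3, :3] @ pts_world + world_to_virtual[:3, 3:4]

    # Step 3: equirectangular projection around the virtual camera centre.
    x, y, z = pts_virt
    lon = np.arctan2(x, z)                                   # azimuth
    lat = np.arcsin(y / np.linalg.norm(pts_virt, axis=0))    # elevation
    col = ((lon + np.pi) / (2 * np.pi) * (pano_w - 1)).round().astype(int)
    row = ((lat + np.pi / 2) / np.pi * (pano_h - 1)).round().astype(int)

    # Count each camera at most once per panorama pixel.
    hit = np.zeros((pano_h, pano_w), dtype=bool)
    hit[row, col] = True
    counts += hit.astype(np.int32)
    pano_coords.append((row.reshape(h, w), col.reshape(h, w)))

  # Weight of a source pixel = 1 / (number of cameras landing on its pano pixel).
  return [1.0 / counts[rows, cols] for rows, cols in pano_coords]
```

Pixels covered by a single camera get weight 1 and pixels covered by k cameras get 1/k, which is the per-pixel weighting described in the paper.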

sunnyHelen commented 1 year ago

Hi, I am very interested in generating a panoramic image from multi-camera images, and thanks for giving us the workflow. For step 2, I think the unprojection operation needs a depth value for each pixel. Do you mean using a fixed depth (e.g. 100 m) for every pixel to get the 3D point cloud? Will that cause problems?

alexzzhu commented 1 year ago

Yes that's correct. It does cause some issues when objects are very close (i.e. < 10m), but in practice we found that this doesn't affect the metric very much. You can also visualize the virtual image using the above steps to see the impact of these errors.
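
As a rough back-of-envelope: for a camera about b metres from the virtual center, a point at true depth d that we unproject at the assumed depth D lands in the panorama with an angular error of roughly b * (1/d - 1/D) radians. With b = 1 m, d = 10 m and D = 100 m that is ~0.09 rad (about 5 degrees), which is why only very close objects show noticeable misalignment.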

As a note, there are definitely better panorama methods out there, especially if you perform feature matching. We chose this one only for simplicity.

sunnyHelen commented 1 year ago

Thanks a lot for your reply. Could you please give some details about the feature matching method?

sunnyHelen commented 1 year ago

[attached image: panorama]

You mentioned that performing feature matching is better. Do you mean using features like SIFT to do image stitching?

xcyan commented 1 year ago

@sunnyHelen Yes, SIFT features can be used for image stitching, but keep in mind that objects are moving.
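
For example, a minimal pairwise OpenCV sketch of that kind of stitching (the function and thresholds below are just illustrative; a single homography is only a rough model, so distant, mostly static scenes work best, and moving objects will ghost):

```python
import cv2
import numpy as np

def stitch_pair(img_left, img_right):
  """Warp img_right onto img_left via SIFT matches and a RANSAC homography."""
  gray_l = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)
  gray_r = cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY)

  sift = cv2.SIFT_create()
  kp_l, des_l = sift.detectAndCompute(gray_l, None)
  kp_r, des_r = sift.detectAndCompute(gray_r, None)

  # Lowe's ratio test to keep only distinctive matches.
  matches = cv2.BFMatcher().knnMatch(des_r, des_l, k=2)
  good = [m for m, n in matches if m.distance < 0.75 * n.distance]

  src = np.float32([kp_r[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
  dst = np.float32([kp_l[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
  H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)

  # Place img_left on a wider canvas and warp img_right into its frame.
  h, w = img_left.shape[:2]
  canvas = cv2.warpPerspective(img_right, H, (2 * w, h))
  canvas[:h, :w] = img_left  # naive overwrite; no blending for brevity
  return canvas
```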

I assume your question has been addressed thoroughly. Can we close this issue?

miangoleh commented 1 year ago

Can you provide the script you've used for panorama generation? I am projecting all images to camera space using the intrinsics and then using the extrinsics to project them into the world frame, but with the fixed-depth assumption I could not get reasonable results.

Lutyyyy commented 1 year ago

> A high-level overview of the panorama generation can be found in our paper (https://arxiv.org/pdf/2206.07704.pdf), in Section 3.1 under Equirectangular Panorama. [...] We're also working on releasing the masks and are looking into the possibility of releasing the code to generate them, and they should be out early this year.

@alexzzhu Hi, do you have any plans to release the 360-degree panorama generation code for the project now? I would be grateful for any code, even if it's still in development or not yet optimized.

alexzzhu commented 1 year ago

Unfortunately we don't have any short-term plans to release this code. However, I'd be happy to help debug your implementation if you try to follow the steps above or use external libraries.