Question about evaluation on KITTI-360 dataset.

sherwinbahmani / cc3d

CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

https://sherwinbahmani.github.io/cc3d


Question about evaluation on KITTI-360 dataset. #8

Closed FanLu97 closed 1 year ago

FanLu97 commented 1 year ago

Hi, thanks for the great work. I have some questions about the evaluation on the KITTI-360 dataset. You mentioned that you generated 37691 scenes and used a maximum of 37691 images for evaluation. How are the rendering camera poses selected: are they sampled randomly, or do you just use the center pose of each scene?

Look forward to your reply! Thanks!

sherwinbahmani commented 1 year ago

Hi, the camera poses are randomly sampled from the training data distribution. Since FID measures similarity to the training distribution, we follow common practice here.
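To make the sampling described above concrete, here is a minimal sketch. It assumes the training poses are available as a plain list and that they are drawn uniformly with replacement; the thread does not say how the repository stores poses or whether it samples with or without replacement, so `sample_training_poses` and the toy pose tuples are hypothetical.

```python
import random


def sample_training_poses(training_poses, num_samples, seed=None):
    """Draw poses uniformly at random, with replacement, from the training set."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return [rng.choice(training_poses) for _ in range(num_samples)]


# Hypothetical training poses as (camera_position, look_at_target) pairs.
training_poses = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((1.0, 0.0, 0.0), (1.0, 0.0, 1.0)),
    ((2.0, 0.0, 0.5), (2.0, 0.0, 1.5)),
]

# One sampled pose per generated scene; here 5 scenes for illustration.
eval_poses = sample_training_poses(training_poses, num_samples=5, seed=0)
```

Because every evaluation pose comes from the training set, the rendered views share the viewpoint statistics of the real images that FID compares against.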

FanLu97 commented 1 year ago

Thanks for your quick reply! Got it! Sorry to bother you again. You mentioned that "we discard scenes where the car is turning either left of right". Is there any special reason for this? Or have you observed limitations of the method under such conditions?

FanLu97 commented 1 year ago

I have generated data for the KITTI-360 dataset using https://github.com/QhelDIV/kitti360_renderer (BTW, it contains more than 70k images; I think the reason may be that you filtered out the turning scenes?). However, when I run generate.sh with this data and the provided pretrained weights, the results are nearly blank, as in the attached samples (Image: 0000, Depth: 0000). Have I missed anything?

Thanks in advance!

sherwinbahmani commented 1 year ago

Hi,

Generally, our method should also handle cars turning left and right, but the task is much easier without those cases. Handling them might need smarter camera sampling. Hence, we filtered them out with the strict filter. This leads to fewer than 70k images, yes.

About the result: How did you preprocess the dataset? Can you print the input data you are using for this specific sample, i.e. the boxes.npz? Are you using the provided KITTI checkpoint?
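One way to print such a file is sketched below. It only relies on NumPy's standard `.npz` loading and makes no assumption about which arrays boxes.npz actually contains; the helper names `describe_npz` and `print_npz` are hypothetical.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:  # closes the underlying zip file on exit
        return {
            name: (data[name].shape, str(data[name].dtype))
            for name in data.files
        }


def print_npz(path):
    """Print one summary line per array in the file."""
    for name, (shape, dtype) in describe_npz(path).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```

Calling `print_npz("boxes.npz")` on the sample in question lists the array names with their shapes and dtypes, which is usually enough to spot a missing key or an unexpected layout. If the file stores object arrays, `np.load` would additionally need `allow_pickle=True`.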

FanLu97 commented 1 year ago

Thanks for the reply! I can now generate results with your provided pretrained weights after a minor change. Previously I used the camera_coords and target_coords generated by https://github.com/QhelDIV/kitti360_renderer and got blank results. However, I found that these coords may not follow the camera convention in EG3D (x-right, y-down, z-forward). So I changed the coords by adding the following code in generator.py (after line 383), and it now works.

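# Remap the renderer's coordinates: (x, y, z) -> (x, -z, -y).
# Work on copies so each right-hand side still reads the original values.
camera_coords_ = camera_coords.clone()
target_coords_ = target_coords.clone()
camera_coords_[:,1] = -camera_coords[:,2]
camera_coords_[:,2] = -camera_coords[:,1]
target_coords_[:,1] = -target_coords[:,2]
target_coords_[:,2] = -target_coords[:,1]
# Replace the originals with the remapped coordinates.
camera_coords = camera_coords_.clone()
target_coords = target_coords_.clone()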
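
For reference, the same remapping can be written as a single matrix product. This is a standalone NumPy sketch, not code from the repository; `AXIS_REMAP` and `remap_coords` are hypothetical names, and the coordinates are assumed to be an (N, 3) array of points.

```python
import numpy as np

# Maps (x, y, z) -> (x, -z, -y), the same remapping as the snippet above.
AXIS_REMAP = np.array([
    [1.0,  0.0,  0.0],
    [0.0,  0.0, -1.0],
    [0.0, -1.0,  0.0],
])


def remap_coords(coords):
    """Apply the axis remapping to an (N, 3) array of points; returns a new array."""
    coords = np.asarray(coords, dtype=float)
    return coords @ AXIS_REMAP.T  # row-vector form of AXIS_REMAP @ point


camera_coords = np.array([[1.0, 2.0, 3.0]])
remapped = remap_coords(camera_coords)  # x stays, y becomes -z, z becomes -y
```

The matrix is its own inverse, so applying `remap_coords` twice returns the original points. Its determinant is -1, meaning the mapping is a reflection rather than a pure rotation; whether that matters depends on the renderer's source convention, which the thread does not specify.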