Hi,
The camera poses are randomly sampled from the training data distribution. Since FID measures similarity to the training distribution, we follow the common practice of sampling evaluation poses from it.
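For concreteness, here is a minimal sketch of that sampling, assuming the training poses are stored as (N, 4, 4) camera-to-world matrices in a poses.npy file (the file name and layout are assumptions for illustration, not the repo's actual format):

```python
import numpy as np

# Assumed layout: (N, 4, 4) camera-to-world matrices from the training split.
train_poses = np.load("poses.npy")

# Sample as many evaluation poses as generated images (37691 in our setup).
rng = np.random.default_rng(seed=0)
idx = rng.choice(len(train_poses), size=37691, replace=True)
eval_poses = train_poses[idx]  # render one FID sample per pose
```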
Thanks for your quick reply! Got it! Sorry to bother you again, but you mentioned that "we discard scenes where the car is turning either left or right". Is there a particular reason for this? Or have you observed limitations of the method under such conditions?
I have generated data for the KITTI-360 dataset using https://github.com/QhelDIV/kitti360_renderer (BTW, it contains more than 70k images; I suspect the difference is because you filtered out the turning scenes?).
However, when I run generate.sh with this data and the provided pretrained weights, the results are nearly blank, as shown below:
Image: (attached screenshot)
Depth: (attached screenshot)
Have I missed anything?
Thanks in advance!
Hi,
Generally, our method should also handle cars turning left or right, but excluding those scenes makes the task much easier; handling them would likely require smarter camera sampling. Hence, we filtered them out with the strict filter, which is why there are fewer than 70k images, yes.
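For illustration, a hypothetical version of such a turning filter; the yaw threshold and the (N, 4, 4) camera-to-world pose format are assumptions, not the exact filter we used:

```python
import numpy as np

def is_straight(poses: np.ndarray, max_yaw_deg: float = 5.0) -> bool:
    """Keep a scene only if the heading barely changes along the trajectory."""
    # Camera z-axis per frame, assuming z is the viewing direction.
    forward = poses[:, :3, 2]
    # Heading angle in the ground plane.
    yaw = np.arctan2(forward[:, 0], forward[:, 2])
    # Total yaw span over the drive, with wrap-around handled by unwrap.
    yaw_span = np.degrees(np.ptp(np.unwrap(yaw)))
    return yaw_span <= max_yaw_deg
```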
About the result: how did you preprocess the dataset? Can you print the input data you are using for this specific sample, i.e. the boxes.npz file? And are you using the provided KITTI checkpoint?
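For reference, dumping the contents of a boxes.npz file only needs standard NumPy:

```python
import numpy as np

data = np.load("boxes.npz")  # path to the sample in question
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
    print(data[key])
```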
Thanks for the reply!
I can now successfully generate results using your provided pretrained weights after a minor change. Previously, I used the camera_coords and target_coords generated by https://github.com/QhelDIV/kitti360_renderer and got blank results. It turns out those coordinates do not follow the camera convention in EG3D (x-right, y-down, z-forward), so I convert them by adding the following code in generator.py (after line 383), and it works now:
```python
# Convert from the kitti360_renderer convention to the EG3D camera
# convention (x-right, y-down, z-forward): swap the y and z axes and
# negate them. Reading from the untouched originals avoids aliasing.
camera_coords_ = camera_coords.clone()
target_coords_ = target_coords.clone()
camera_coords_[:, 1] = -camera_coords[:, 2]
camera_coords_[:, 2] = -camera_coords[:, 1]
target_coords_[:, 1] = -target_coords[:, 2]
target_coords_[:, 2] = -target_coords[:, 1]
camera_coords = camera_coords_
target_coords = target_coords_
```
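Equivalently (just a sketch, not part of the original fix), the axis swap is a fixed linear map and can be written as a single matrix multiplication, which makes the convention change explicit:

```python
import torch

# new_x = old_x, new_y = -old_z, new_z = -old_y
R = torch.tensor([[1.0,  0.0,  0.0],
                  [0.0,  0.0, -1.0],
                  [0.0, -1.0,  0.0]])

coords = torch.tensor([[1.0, 2.0, 3.0]])  # dummy (N, 3) coordinates
converted = coords @ R.T                  # -> [[1., -3., -2.]]
```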
Hi, thanks for the great work! I have some questions about the evaluation on the KITTI-360 dataset. You mentioned that you generated 37691 scenes and use up to 37691 images for evaluation. I wonder how the rendering camera poses are selected, e.g., randomly sampled, or just the center pose of each scene?
Looking forward to your reply! Thanks!