vincentfung13 / MINE

Code and models for our ICCV 2021 paper "MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis"
MIT License

KITTI split and LPIPS computation #4

Closed zzyunzhi closed 2 years ago

zzyunzhi commented 2 years ago

Hi,

Thank you for the fantastic work! I have two small questions regarding model evaluation.

  1. KITTI raw data split: Section 4.1 mentions that 20 city sequences from KITTI Raw are used for training and 4 sequences for testing. However, there are 28 city sequences in KITTI Raw in total. Do you use the remaining 4 sequences anywhere in the pipeline? Are the 20 training sequences and 4 test sequences exactly the same as those used in Tulsiani 2018, as implemented here?

  2. LPIPS computation: You compute LPIPS here. According to the dataloader implemented here, your inputs to LPIPS are in the range [0, 1], while LPIPS expects inputs in the range [-1, 1], as mentioned in its documentation. Am I missing something, or should the input indeed be normalized to get the correct LPIPS score?
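
To illustrate the range issue in question 2, here is a minimal sketch of the normalization I mean (`to_lpips_range` is a hypothetical helper name, not something from the repo):

```python
import numpy as np

def to_lpips_range(img):
    """Map an image from [0, 1] (dataloader output) to [-1, 1] (LPIPS input)."""
    return img * 2.0 - 1.0

# The endpoints and midpoint map as expected:
x = np.array([0.0, 0.5, 1.0])
print(to_lpips_range(x))  # -> [-1.  0.  1.]
```

If I recall the `lpips` package's API correctly, passing `normalize=True` to the metric's forward call performs this same [0, 1] → [-1, 1] mapping internally.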

Thank you in advance for the time.

vincentfung13 commented 2 years ago

Hi,

Thank you for your interest in our work!

  1. Yes, the sequences are exactly the same as those used in Tulsiani et al. 2018 and Tucker et al. 2020 (as mentioned in their papers), so that the numbers are comparable.
  2. Yes, you are correct, this does indeed look like a bug on our side. We will try to fix it and update the numbers in the paper. Thank you for pointing this out!

Zijian

zzyunzhi commented 2 years ago

Hi Zijian,

Appreciate the prompt reply! For question 1, did you use the remaining 4 sequences for validation, or did you leave them unused?

Additionally, is the code for ImageNet pre-training available, and is there some detailed description of the pre-training procedure?

vincentfung13 commented 2 years ago

Not sure if I understand question 1 correctly. Technically we didn't have a validation set, since Tulsiani et al. 2018 and Tucker et al. 2020 also used this split for training and testing. We train our models until the end and report the results of the last checkpoint.

For ImageNet pre-training: since we use a similar architecture to Monodepth2, the encoder is a ResNet-50 initialized with ImageNet pre-trained weights. We did not pre-train the decoder.

Hope this helps.

Zijian

zzyunzhi commented 2 years ago

Tulsiani et al. 2018 report in Section 4.2 that they used 30 sequences (22 training + 4 validation + 4 testing) from the city category. Tucker et al. 2020 Section 4.4 also mentions 22 training + 4 testing.

Based on the public code, however, it seems to me that there are only 28 sequences in total (20 training + 4 validation + 4 testing). Am I looking at the same split that you used in the paper? In Section 4.1 of your paper you mention 20 sequences for training and 4 for evaluation. If I understand correctly, this means the 4 validation sequences from the codebase of Tulsiani et al. 2018 are simply not used?

Thanks a lot!

vincentfung13 commented 2 years ago

Hi, I double-checked our data pre-processing code, and yes, you are correct: we did not use the validation set, and we used the exact same code you mentioned to generate our splits. We will try to release our testing pipelines for the other datasets later on. Sorry for the confusion.

zzyunzhi commented 2 years ago

It's very helpful, thank you Zijian!