jkulhanek / viewformer

ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers
MIT License

Difference in results with CO3Dv1 models and CO3Dv2 models #3

Closed by zhizdev 1 year ago

zhizdev commented 1 year ago

I am trying to generate some visuals with ViewFormer on CO3Dv2 and I would like to double check a few things.

The only change I know of from v1 to v2 is that the input image now has 4 channels: the first 3 are masked RGB with a black background, and the last is a binary mask.
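
For reference, the 4-channel layout described above could be assembled roughly as follows. This is a minimal NumPy sketch, not code from the ViewFormer repository: the function name `make_rgba_input`, the float value range [0, 1], and the 0.5 mask threshold are all assumptions made for illustration.

```python
import numpy as np


def make_rgba_input(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Build an (H, W, 4) array: masked RGB on black, plus a binary mask.

    Hypothetical helper. Assumes `rgb` is (H, W, 3) and `mask` is (H, W),
    both with values in [0, 1].
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if mask.shape != rgb.shape[:2]:
        raise ValueError("mask must have shape (H, W)")

    rgb = rgb.astype(np.float32)
    # Binarize the mask (the 0.5 threshold is an assumption).
    binary = (mask > 0.5).astype(np.float32)
    # Zero out background pixels so the background is black.
    masked_rgb = rgb * binary[..., None]
    # Append the binary mask as the fourth channel.
    return np.concatenate([masked_rgb, binary[..., None]], axis=-1)
```

Whether the v2 models expect exactly this value range and channel order is something I have not confirmed.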

However, I am getting very different results using the same code but with different models.

The first GIF is rendered using co3d-10cat-noloc-transformer-tf, while the second GIF is rendered using co3dv2-all-noloc-transformer-tf.

The first GIF looks reasonable, but the second GIF looks suspicious.

It would be great if you could provide some pointers to help me debug this. Thank you so much!

[Attached GIFs: hydrant_000_ 12, 31, 25 _v1 (first) and hydrant_000_ 12, 31, 25 _v2 (second)]

jkulhanek commented 1 year ago

V2 also uses different cropping. You cannot use V2 models on V1 data, or vice versa. Also, make sure to use the correct codebook model.
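
One way to guard against mixing versions is a small name check before loading anything. The sketch below is only an illustration: it assumes the `co3d-` and `co3dv2-` prefixes seen in the two model names above reliably indicate the dataset version, and the helpers `co3d_version` and `check_same_version` are hypothetical, not part of the ViewFormer codebase.

```python
def co3d_version(model_name: str) -> str:
    """Guess the CO3D version from a model name prefix (assumed convention)."""
    if model_name.startswith("co3dv2-"):
        return "v2"
    if model_name.startswith("co3d-"):
        return "v1"
    raise ValueError(f"cannot infer CO3D version from {model_name!r}")


def check_same_version(*model_names: str) -> str:
    """Raise if the given model names do not all share one CO3D version."""
    versions = {co3d_version(name) for name in model_names}
    if len(versions) != 1:
        raise ValueError(f"mixed CO3D versions in {model_names}")
    return versions.pop()


# Model names taken from this thread.
print(co3d_version("co3d-10cat-noloc-transformer-tf"))  # v1
print(co3d_version("co3dv2-all-noloc-transformer-tf"))  # v2
```

Passing the transformer name and the codebook name together to `check_same_version` would flag a V1/V2 mismatch early. It does not verify that the data itself was cropped the right way.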

zhizdev commented 1 year ago

Got it. We noticed the different cropping as well. Thanks!