You used an EG3D-based triplane feature grid. From the public EG3D code repo, I found that the mapping network takes two inputs: z, and a conditioning variable c (in EG3D, c is a 25-dimensional vector representing the camera parameters).
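For context, EG3D's 25-dimensional c is the flattened 4×4 camera-to-world extrinsics matrix (16 values) concatenated with the flattened 3×3 intrinsics matrix (9 values). A minimal sketch (the specific intrinsics values below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical camera: identity pose plus normalized intrinsics.
cam2world = np.eye(4, dtype=np.float32)           # 4x4 extrinsics -> 16 values
intrinsics = np.array([[4.26, 0.0, 0.5],
                       [0.0, 4.26, 0.5],
                       [0.0, 0.0, 1.0]], dtype=np.float32)  # 3x3 -> 9 values

# EG3D-style conditioning vector: 16 + 9 = 25 dimensions.
c = np.concatenate([cam2world.reshape(-1), intrinsics.reshape(-1)])
print(c.shape)  # (25,)
```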
I wonder whether, in your model, you feed only the globally averaged 1D feature vector to the mapping network, or whether you use another design (e.g., z is a random vector and the 1D feature vector is the conditioning variable c)?
How did you apply the LPIPS loss to the rendering outputs? Do you render image patches during training?
We use a pre-trained ResNet18 backbone to extract a 512-dimensional vector as z, and we do not use c in our case.
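So the image feature replaces the random latent entirely: the 512-D pooled ResNet18 feature goes through an unconditional StyleGAN2-style mapping MLP. A minimal NumPy sketch under that assumption (random weights stand in for trained ones; the layer count and leaky-ReLU slope follow StyleGAN2 conventions, not confirmed details of this model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 512-D globally average-pooled ResNet18 feature used as z.
z = rng.standard_normal(512).astype(np.float32)

def mapping_network(z, num_layers=8, dim=512):
    """Unconditional StyleGAN2-style mapping MLP: z -> w, no conditioning c."""
    # Pixel-norm on the input latent, as in StyleGAN2.
    w = z / np.linalg.norm(z) * np.sqrt(len(z))
    for _ in range(num_layers):
        W = rng.standard_normal((dim, dim)).astype(np.float32) * np.sqrt(2.0 / dim)
        h = W @ w
        w = np.maximum(0.2 * h, h)  # leaky ReLU, slope 0.2
    return w

w = mapping_network(z)
print(w.shape)  # (512,)
```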
We directly render the whole image by leveraging the human prior, and then apply LPIPS to it. Based on our previous experience with NeRF training, rendering image patches instead should achieve similar performance.
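For reference, the patch-based alternative usually means sampling a contiguous square of pixels each iteration, rendering only those rays, and applying LPIPS to the rendered patch against the matching ground-truth crop. A small sketch of the patch-coordinate sampling (hypothetical helper, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_patch_coords(H, W, patch=64):
    """Pick a random contiguous patch of pixel coordinates.
    Rendering only these rays keeps LPIPS applicable, since the
    perceptual loss needs a spatially coherent image region."""
    top = int(rng.integers(0, H - patch + 1))
    left = int(rng.integers(0, W - patch + 1))
    ys, xs = np.meshgrid(np.arange(top, top + patch),
                         np.arange(left, left + patch), indexing="ij")
    return ys, xs

ys, xs = sample_patch_coords(256, 256, patch=64)
print(ys.shape)  # (64, 64)
```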