tobias-kirschstein / gghead

Frozen Weights Prior to GGHead Training, Condition Spaces, and Inversion #5

Open Ryan-Vereque opened 1 week ago

Ryan-Vereque commented 1 week ago

Thank you for the fascinating paper. Looking forward to seeing repo contributions regarding training.

Question 1) Do any of the trained GGHead models in the paper (or the pretrained GGHead models linked in this repo) contain sub-modules whose weights were frozen, or whose weights were initialized from an existing pre-trained model of a different paper/architecture?

I assume the only place where this could possibly be the case is somewhere between the latent space (z) and the planes/Gaussian features. If so, were they perhaps weights from EG3D (e.g. eg3d.training.networks_stylegan2.MappingNetwork) or from an existing StyleGAN2 network?

I ask because I'm trying to ascertain to what extent

Question 2) If no to Q1, what are the channel counts (or tensor shapes) of z, c, and w/w+ for the aforementioned GGHead paper/pretrained models?

Question 3) If no to Q1, what specific conditions (semantically) do the condition tensors c used during training encode? Are they included in or derived from the AFHQ/FFHQ datasets, perhaps?

Question 4) Do you know whether PTI (Pivotal Tuning) or other such custom methods for recovering z/c/w/w+ (inversion) should work out of the box given a pretrained GGHead model and some or all of the planes/Gaussian UV maps (or the final image output)? Alternatively, can I assume that the universally applicable method of freezing the network and simply letting gradients flow back into the z/c/w/w+ space itself will still be possible (even if it does not perform very well and gets stuck in local minima)?
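
For concreteness, the gradient-based fallback mentioned in Q4 can be sketched as below. This is a toy under stated assumptions, not GGHead code: a small fixed linear map `A` stands in for the frozen pretrained generator, the squared error stands in for an image loss, and the names `generate`, `loss_and_grad`, and `invert` are hypothetical. With a real network the gradient would come from autograd rather than a hand-derived formula, and the loss would generally be non-convex.

```python
# Toy illustration only: a frozen linear "generator" G(w) = A w stands in
# for a pretrained network. Nothing here is GGHead's actual API.

def generate(A, w):
    """Frozen generator: A is never updated."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in A]

def loss_and_grad(A, w, target):
    """Squared error ||G(w) - target||^2 and its gradient w.r.t. w."""
    residual = [o - t for o, t in zip(generate(A, w), target)]
    loss = sum(r * r for r in residual)
    # d/dw ||A w - t||^2 = 2 A^T (A w - t)
    grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(w))]
    return loss, grad

def invert(A, target, steps=500, lr=0.05):
    """Gradient descent on the latent only; the generator weights stay fixed."""
    w = [0.0] * len(A[0])
    for _ in range(steps):
        _, grad = loss_and_grad(A, w, target)
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w

A = [[1.0, 0.5], [0.0, 1.0], [0.5, -1.0]]   # frozen weights (hypothetical)
w_true = [0.3, -0.7]
target = generate(A, w_true)                 # the "image" to invert
w_rec = invert(A, target)
final_loss, _ = loss_and_grad(A, w_rec, target)
```

In this convex toy case the latent is recovered essentially exactly; the local-minima concern above applies once the generator is a deep non-linear network.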

Question 5) Have the authors attempted, or do they plan to attempt, inversion similar to Q4 starting from the final image output, but instead of inferring all the way back to z/c/w/w+ space, inferring only back to the Gaussian map/UV space (which would be somewhat akin to traditional Gaussian splatting, I think)? Perhaps doing so while either the pose-related Gaussian attributes (position, scale, etc.) or the appearance-related Gaussian attributes (color, opacity) remain fixed to a known good approximation for the image in question?