yuval-alaluf / SAM

Official Implementation for "Only a Matter of Style: Age Transformation Using a Style-Based Regression Model" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02754
https://yuval-alaluf.github.io/SAM/
MIT License

Questions about target_ages. #4

Closed datbu178 closed 3 years ago

datbu178 commented 3 years ago

First of all, I want to thank you for the great project. The results are very impressive! However, while reading the code I noticed a few things that don't seem to match what is described in the paper. I hope you can clarify them for me. Thank you!

  1. In the following line of code: https://github.com/yuval-alaluf/SAM/blob/fb6699845bd50e9b6bf8520112c6a746456128f4/training/coach_aging.py#L266 you calculate the cosine weight based on target_ages. However, I think the cosine weight should be calculated based on abs(source_ages - target_ages). Is that correct? (I've sketched this, together with point 2, right after the list.)

  2. In the following line of code: https://github.com/yuval-alaluf/SAM/blob/fb6699845bd50e9b6bf8520112c6a746456128f4/training/coach_aging.py#L114 y_hat_inverse is generated by concatenating y_hat_clone with the age that the age predictor predicts for y_hat_clone. However, according to the paper, y_hat_inverse should be created by concatenating y_hat_clone with the source ages (of the original image). Is that correct?
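
To make both points concrete, here is a rough sketch of what I have in mind (the helper names, the variable names such as source_ages and y_hat_clone, and the exact cosine formula are just placeholders, not the repository's actual code):

```python
import math
import torch

def cosine_weight(source_ages, target_ages):
    # Weight computed from the magnitude of the requested age change,
    # i.e. |source - target|, rather than from target_ages alone.
    # With ages normalized to [0, 1], a zero age change maps to weight 1
    # and the maximal age change maps to weight 0.
    age_diff = torch.abs(source_ages - target_ages)
    return 0.5 * (torch.cos(math.pi * age_diff) + 1)

def build_reverse_input(y_hat_clone, source_ages):
    # For the reverse (cycle) pass, concatenate the generated image with the
    # ORIGINAL source ages rather than the ages predicted for y_hat_clone,
    # so the network is asked to map the aged face back to the input's age.
    n, _, h, w = y_hat_clone.shape
    age_channel = source_ages.view(n, 1, 1, 1).expand(n, 1, h, w)
    return torch.cat([y_hat_clone, age_channel], dim=1)  # 4-channel input
```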

yuval-alaluf commented 3 years ago

Hi @datbu178, it seems like you're correct on both points. When I refactored the code before publishing, I guess I missed a few small details. Thanks for pointing this out!
I'll make the fixes as soon as possible.

yuval-alaluf commented 3 years ago

Hi @datbu178 , I just pushed an update addressing the two points you posted above, so I hope the issues are now fixed. Please see https://github.com/yuval-alaluf/SAM/commit/30ce1af6ff402e924277d205cb6aefdd6eead6db If you think there is still an issue, please feel free to let me know. Thanks again for bringing this to my attention!

datbu178 commented 3 years ago

Dear @yuval-alaluf , thank you for the answer. Yes, I think the two issues are now fixed!

I just have another question about the following line of code: https://github.com/yuval-alaluf/SAM/blob/30ce1af6ff402e924277d205cb6aefdd6eead6db/models/psp.py#L72 The input of self.pretrained_encoder() is x[:, :-1, :, :]. However, I think the input of self.pretrained_encoder() should be the same as the input of self.encode(). So it should be x, not x[:, :-1, :, :]. Am I mistaken? Thanks.

yuval-alaluf commented 3 years ago

The input of self.pretrained_encoder() is x[:, :-1, :, :]. However, I think the input of self.pretrained_encoder() should be the same as the input of self.encode(). So it should be x, not x[:, :-1, :, :]. Am I mistaken? Thanks.

This part is a bit tricky.
In the following line, https://github.com/yuval-alaluf/SAM/blob/30ce1af6ff402e924277d205cb6aefdd6eead6db/models/psp.py#L65 we pass the input x into our SAM encoder. Here, x is the 4-channel input (the RGB image and the constant channel representing the target age). In the line you linked, we actually want to pass only the RGB channels to the pre-trained pSp encoder (in order to extract the input's latent code). Therefore, I take only the first three channels of x when calling self.pretrained_encoder(...). Does that make sense?
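
To illustrate the channel handling (the tensor shapes and the 0.7 target age below are made up for the example, not the actual model code):

```python
import torch

# Illustrative shapes only: a 256x256 image and an arbitrary normalized target age of 0.7.
rgb = torch.rand(1, 3, 256, 256)         # input face image (RGB)
age = torch.full((1, 1, 256, 256), 0.7)  # constant channel encoding the target age
x = torch.cat([rgb, age], dim=1)         # 4-channel input, shape (1, 4, 256, 256)

sam_encoder_input = x                    # the SAM encoder sees RGB + age channel
psp_encoder_input = x[:, :-1, :, :]      # the pre-trained pSp encoder sees only the 3 RGB channels
assert psp_encoder_input.shape == (1, 3, 256, 256)
```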

datbu178 commented 3 years ago

Oh, my mistake. I understand now. Thank you very much for explaining it!