datbu178 closed this issue 3 years ago
Hi @datbu178 ,
It seems like you're correct on both points. When I refactored the code before publishing I guess I missed a few small points. Thanks for pointing this out!
I'll make the fixes as soon as possible.
Hi @datbu178 , I just pushed an update addressing the two points you posted above, so I hope the issues are now fixed. Please see https://github.com/yuval-alaluf/SAM/commit/30ce1af6ff402e924277d205cb6aefdd6eead6db If you think there is still an issue, please feel free to let me know. Thanks again for bringing this to my attention!
Dear @yuval-alaluf , Thank you for the answer. Yes, I think the two issues are now fixed!
I just have another question about the following line of code: https://github.com/yuval-alaluf/SAM/blob/30ce1af6ff402e924277d205cb6aefdd6eead6db/models/psp.py#L72 The input of self.pretrained_encoder() is x[:, :-1, :, :]. However, I think the input of self.pretrained_encoder() should be the same as the input of self.encode(). So it should be x, not x[:, :-1, :, :]. Am I mistaken? Thanks.
This part is a bit tricky.
So in the following line, https://github.com/yuval-alaluf/SAM/blob/30ce1af6ff402e924277d205cb6aefdd6eead6db/models/psp.py#L65 we pass the input x into our SAM encoder. Here, x is the 4-channel input (the RGB image and the constant channel representing the target age).
In the line you linked, we actually want to pass only the RGB channels to the pre-trained pSp encoder (in order to extract the input's latent code). Therefore, I take only the first three channels of x when calling self.pretrained_encoder(...).
Does that make sense?
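For readers following along, the slicing can be checked with a minimal sketch. It uses NumPy arrays as a stand-in for PyTorch tensors (basic slicing behaves the same way here); the batch size, image size, and age value 0.7 are made-up assumptions and are not taken from the repository.

```python
import numpy as np

# Hypothetical batch: 2 images, 3 RGB channels, 8x8 pixels (sizes are assumptions).
rgb = np.random.rand(2, 3, 8, 8).astype(np.float32)

# Constant channel carrying the target age; 0.7 is an arbitrary illustrative value.
target_age = np.full((2, 1, 8, 8), 0.7, dtype=np.float32)

# 4-channel input: RGB image followed by the age channel.
x = np.concatenate([rgb, target_age], axis=1)

# Dropping the last channel recovers only the RGB image.
x_rgb = x[:, :-1, :, :]

print(x.shape)      # (2, 4, 8, 8)
print(x_rgb.shape)  # (2, 3, 8, 8)
```

Since the age channel is concatenated last, x[:, :-1, :, :] and x[:, :3, :, :] select the same data in this sketch.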
Oh, I was mistaken. I understand now. Thank you very much for explaining it!
First of all, I want to thank you for the great project. The results are very impressive! However, when I read the code, I felt that a few things did not match what is described in the paper. I hope you can explain them to me. Thank you!
In the following line of code: https://github.com/yuval-alaluf/SAM/blob/fb6699845bd50e9b6bf8520112c6a746456128f4/training/coach_aging.py#L266 You calculate the cosine weight based on target_ages. But I think the cosine weight should be calculated based on abs(source ages - target_ages). Is that correct?
In the following line of code: https://github.com/yuval-alaluf/SAM/blob/fb6699845bd50e9b6bf8520112c6a746456128f4/training/coach_aging.py#L114 y_hat_inverse is generated by concatenating y_hat_clone and the age that the age predictor predicts for y_hat_clone. However, according to the paper, y_hat_inverse should be generated by concatenating y_hat_clone and the source ages (of the original image). Is that correct?
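The two points above can be made concrete with a small sketch. This is not the repository's code: NumPy stands in for PyTorch, the helper add_age_channel, the name input_cycle, and all age values are hypothetical, and the cosine weight formula itself is left out because it is not given in this thread. The sketch only shows which quantity each reading would feed into the weight, and what concatenating the source ages onto y_hat_clone would look like.

```python
import numpy as np

def add_age_channel(images, ages):
    """Append one constant channel per image holding that image's age value.

    Hypothetical helper for illustration only.
    """
    n, _, h, w = images.shape
    age_maps = np.broadcast_to(ages.reshape(n, 1, 1, 1), (n, 1, h, w))
    return np.concatenate([images, age_maps.astype(images.dtype)], axis=1)

# Made-up ages on an arbitrary scale, one value per image in the batch.
source_ages = np.array([0.25, 0.60], dtype=np.float32)
target_ages = np.array([0.70, 0.30], dtype=np.float32)

# Point 1: the argument of the weight under each reading.
by_target = target_ages                       # weight depends on the target age only
by_gap = np.abs(source_ages - target_ages)    # weight depends on the age gap

# Point 2: input for the reverse pass, conditioned on the source ages.
y_hat_clone = np.zeros((2, 3, 8, 8), dtype=np.float32)  # placeholder generated images
input_cycle = add_age_channel(y_hat_clone, source_ages)

print(by_gap)
print(input_cycle.shape)  # (2, 4, 8, 8)
```

Swapping source_ages for the age predictor's output on y_hat_clone in the last call would give the other variant described in the second point.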