hytseng0509 / CrossDomainFewShot

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (ICLR 2020 spotlight)

About ft_optim grad from ft_loss #13

Open remiMZ opened 4 years ago

remiMZ commented 4 years ago

Hi, I reproduced your code and found that ft_loss does not produce a gradient for the FiLM (feature-wise transformation) layers. So how does your learning-to-learn step update them through ft_optim?

hytseng0509 commented 4 years ago

You can refer to the implementation in methods/backbone.py and methods/LFTNet.py.

dori2063 commented 4 years ago

Hi, thank you for sharing your efforts. I tried to understand eq. (6) and (7) through your code, but I think I failed. As I understand it, ft_loss is calculated without the ft layers at methods/LFTNet.py line 137, and line 138 then computes and applies the gradients for the ft layers. However, as far as I know, that should not be possible because the ft layers were not used. How is it possible to calculate the gradients of the ft layers?

dori2063 commented 4 years ago

Thanks to your response on OpenReview, I think I understand now. Thank you!

lianglele185 commented 4 years ago

> Thanks to your response on OpenReview, I think I understand now. Thank you!

Can you explain why the ft layers can be updated without being used?

dori2063 commented 4 years ago

The ft layers are not applied directly on pu; they are applied on ps, and the model updated with the ps loss is then used on pu, so in the end the ft layers still affect the pu loss. In https://openreview.net/forum?id=SJl5Np4tPr, the author's sentence "the updated model used to calculate~" made it click for me. You can also look at how self.ft_optim is defined and used in methods/LFTNet.py.
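To make the gradient path concrete, here is a minimal PyTorch sketch of the idea, not the repo's actual code; names such as theta, ft_scale, ps_x, and pu_x are illustrative assumptions rather than identifiers from methods/LFTNet.py. It shows how a loss computed without the ft layers can still backpropagate to them through the inner update:

```python
# Minimal sketch: the pu loss is computed WITHOUT the FT layer, but the model
# it is computed with was updated using the ps loss WITH the FT layer, so the
# FT parameter still receives a gradient (second-order, via create_graph=True).
import torch

torch.manual_seed(0)

# stand-in for the metric-model weights
theta = torch.randn(3, requires_grad=True)
# stand-in for a feature-wise transformation parameter (FiLM-style scaling)
ft_scale = torch.ones(3, requires_grad=True)

ps_x, ps_y = torch.randn(3), torch.tensor(1.0)   # pseudo-seen sample
pu_x, pu_y = torch.randn(3), torch.tensor(-1.0)  # pseudo-unseen sample

# --- inner step: pseudo-seen loss WITH the FT layer applied ---
ps_pred = (theta * ft_scale * ps_x).sum()        # FT modulates the features
ps_loss = (ps_pred - ps_y) ** 2
# keep the graph so the updated weights stay connected to ft_scale
grad_theta, = torch.autograd.grad(ps_loss, theta, create_graph=True)
theta_updated = theta - 0.1 * grad_theta         # depends on ft_scale

# --- outer step: pseudo-unseen loss WITHOUT the FT layer ---
pu_pred = (theta_updated * pu_x).sum()           # no FT applied here
ft_loss = (pu_pred - pu_y) ** 2

ft_loss.backward()
print(ft_scale.grad)                             # non-zero gradient
```

In the real code the role of the manual update above is played by the model update on the pseudo-seen task, and ft_optim then steps the FT parameters with the gradient that flows back through that update.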

LavieLuo commented 3 years ago

Hi, thank you for sharing this information. It still confuses me a lot.

> The ft layers are not applied directly on pu; they are applied on ps, and the model updated with the ps loss is then used on pu, so in the end the ft layers still affect the pu loss.

After reading your reply, I guess it can be understood as: ps and pu share a single set of ft layers (i.e., the same parameters $\theta_f$), which are only optimized on the pu loss?