keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Double gradient calculation for the projectors in Contrastive Trainer #1887

Closed ariG23498 closed 1 year ago

ariG23498 commented 1 year ago

https://github.com/keras-team/keras-cv/blob/b3798d8140605b651ac46f40fe7072ea08175720/keras_cv/training/contrastive/contrastive_trainer.py#L233-L238

https://github.com/keras-team/keras-cv/blob/b3798d8140605b651ac46f40fe7072ea08175720/keras_cv/training/contrastive/contrastive_trainer.py#L240-L247

The two code snippets above assume that the two projectors are distinct. What if we provide only one projector, which is then replicated to fill a tuple, as shown below? 👇

https://github.com/keras-team/keras-cv/blob/b3798d8140605b651ac46f40fe7072ea08175720/keras_cv/training/contrastive/contrastive_trainer.py#L112-L114

In that case, the gradients of the projector are calculated twice, and the projector is also updated twice.
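
To make the concern concrete, here is a minimal toy sketch in plain Python. It is not the keras-cv code: `ToyProjector` and `apply_sgd` are hypothetical stand-ins, and the sketch assumes that the train step handles each tuple entry separately and that the optimizer does not deduplicate shared weights.

```python
class ToyProjector:
    """Hypothetical stand-in for a projector with a single scalar weight."""

    def __init__(self, weight=1.0):
        self.weight = weight


def apply_sgd(grads_and_projectors, lr=0.25):
    """Toy SGD step: subtract lr * grad from each projector's weight."""
    for grad, projector in grads_and_projectors:
        projector.weight -= lr * grad


# Case 1: a single projector replicated to fill a tuple.
projector = ToyProjector()
projectors = (projector, projector)
same_object = projectors[0] is projectors[1]  # both entries alias one object

# Assumption: the step applies a gradient of 1.0 once per tuple entry.
apply_sgd([(1.0, projectors[0])])
apply_sgd([(1.0, projectors[1])])
shared_weight = projector.weight  # the one shared weight moved twice

# Case 2: two distinct projectors, each updated exactly once.
distinct = (ToyProjector(), ToyProjector())
apply_sgd([(1.0, distinct[0])])
apply_sgd([(1.0, distinct[1])])
distinct_weights = (distinct[0].weight, distinct[1].weight)

print(same_object, shared_weight, distinct_weights)
```

Under these assumptions the shared projector takes two steps per batch while each distinct projector takes one, which is the behavior this issue is asking about.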

If I have misread this code and am mistaken, please feel free to close this issue.

CC: @ID6109

ianstenbit commented 1 year ago

Hey @ariG23498 -- it looks like you're correct that this logic is wrong. I'm not certain whether optimizer.apply_gradients will realize that these are the same weights, but I expect not.

Please feel free to send a fix, and thank you for catching this! I'd propose that we just add a field in the constructor, called self._shared_projector or similar, which tracks whether or not there are two unique projectors.
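
One way the proposed flag might look, as a rough sketch in plain Python rather than the actual ContrastiveTrainer: `ToyContrastiveTrainer` and `unique_projectors` are hypothetical names, and detecting sharing with an identity check is an assumption, not something settled in this thread.

```python
class ToyContrastiveTrainer:
    """Hypothetical sketch; not the keras-cv ContrastiveTrainer."""

    def __init__(self, projectors):
        if isinstance(projectors, (tuple, list)):
            if len(projectors) != 2:
                raise ValueError("Expected one projector or a pair of projectors.")
            self.projectors = tuple(projectors)
        else:
            # A single projector is replicated to fill the tuple.
            self.projectors = (projectors, projectors)
        # Track whether both tuple entries are the very same object.
        self._shared_projector = self.projectors[0] is self.projectors[1]

    def unique_projectors(self):
        """Return each projector once, so its weights are only updated once."""
        if self._shared_projector:
            return [self.projectors[0]]
        return list(self.projectors)


# Plain objects stand in for projector models in this sketch.
shared_trainer = ToyContrastiveTrainer(object())
distinct_trainer = ToyContrastiveTrainer((object(), object()))

print(shared_trainer._shared_projector, len(shared_trainer.unique_projectors()))
print(distinct_trainer._shared_projector, len(distinct_trainer.unique_projectors()))
```

A train step could then collect gradients and apply updates over `unique_projectors()` instead of over both tuple entries. The identity check also covers the case where a caller passes the same projector twice in a tuple.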