ariG23498 closed this issue 1 year ago
Hey @ariG23498 -- it looks like you're correct that this logic is wrong. I'm not certain whether optimizer.apply_gradients
will realize that these are the same weights, but I expect it won't.
Please feel free to send a fix and thank you for catching this!
I'd propose that we just add a field in the constructor called self._shared_projector
(or similar) that tracks whether or not the two projectors are distinct objects.
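A rough sketch of that proposal, assuming the constructor's current behavior of replicating a single projector into a pair (the class and attribute names here are illustrative, not the actual keras-cv API):

```python
class ContrastiveTrainerSketch:
    """Toy stand-in for keras_cv's ContrastiveTrainer constructor logic."""

    def __init__(self, projector):
        # Mirror the existing behavior: a single projector is replicated
        # to fill a (projector_0, projector_1) tuple.
        if not isinstance(projector, (list, tuple)):
            projector = (projector, projector)
        self.projectors = tuple(projector)
        # Proposed flag: remember whether both entries are the same object,
        # so train_step can skip the second gradient/update pass for it.
        self._shared_projector = self.projectors[0] is self.projectors[1]


# One projector passed in: the flag is set.
shared = ContrastiveTrainerSketch("projector")
print(shared._shared_projector)   # True

# Two distinct projectors: the flag is unset.
distinct = ContrastiveTrainerSketch(("projector_0", "projector_1"))
print(distinct._shared_projector)  # False
```

train_step would then branch on this flag when collecting trainable weights and when calling apply_gradients, so each shared weight appears exactly once.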
https://github.com/keras-team/keras-cv/blob/b3798d8140605b651ac46f40fe7072ea08175720/keras_cv/training/contrastive/contrastive_trainer.py#L233-L238
https://github.com/keras-team/keras-cv/blob/b3798d8140605b651ac46f40fe7072ea08175720/keras_cv/training/contrastive/contrastive_trainer.py#L240-L247
In the two code snippets above, we assume that the two projectors are distinct. What if we provide only one projector, which is then replicated to fill a tuple, as shown below 👇
https://github.com/keras-team/keras-cv/blob/b3798d8140605b651ac46f40fe7072ea08175720/keras_cv/training/contrastive/contrastive_trainer.py#L112-L114
In that case the gradients of the projector are computed twice, and the projector's weights are also updated twice.
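A minimal sketch of why this matters (plain Python, no TensorFlow needed; the toy apply_gradients below is an assumption about the optimizer's behavior of applying each (gradient, variable) pair in order):

```python
def apply_gradients(grads_and_vars, lr=0.5):
    """Toy SGD step: each (grad, var) pair is applied independently."""
    for grad, var in grads_and_vars:
        var["value"] -= lr * grad


# One shared "projector weight", referenced twice -- as happens when a
# single projector is replicated into (projector, projector).
shared_weight = {"value": 1.0}

# Each augmented view contributes a gradient; with a shared projector both
# gradients point at the *same* underlying variable.
grads_and_vars = [(1.0, shared_weight), (1.0, shared_weight)]
apply_gradients(grads_and_vars)

# The weight takes two SGD steps (1.0 -> 0.5 -> 0.0) instead of one.
print(shared_weight["value"])  # 0.0, not the intended 0.5
```

So unless the optimizer deduplicates variables (which we suspect it does not), the shared projector effectively trains at double the learning rate.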
If I have misread this code and am mistaken, please feel free to close this issue.
CC: @ID6109