mlfoundations / open_clip

An open source implementation of CLIP.

How does the logit_scale vary during training? I noticed that in my case it starts from 14.28 (1/0.07), then just goes down, and towards the end of training it reaches 1 #815

Open Akshay1-6180 opened 5 months ago

Akshay1-6180 commented 5 months ago
[Two screenshots (2024-02-13): training curves for logit_scale and loss]

I am running CLIP on my own dataset and noticed that the logit_scale converges to 1. Is this expected behavior? I noticed that the loss becomes constant during this time. I know that a higher logit_scale amplifies differences between the logits, making the softmax output distribution sharper and thus making the model more confident in its most likely predictions. Does the logit_scale decreasing mean the model is becoming less confident, or that it is getting confused? Reducing the learning rate resolves this issue, but then the logit_scale converges towards a value lower than 14 (mostly between 6 and 8). Not sure what conclusion I can draw from this. I use the AdamW optimizer with a ViT-B vision model and a BERT text encoder; weight decay is 0.1, eps is 1e-8, betas=[0.9, 0.999].
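For context, here is a minimal sketch of how logit_scale usually enters the CLIP contrastive loss, following the OpenAI CLIP convention of storing the log of the scale as a learnable parameter (the function and variable names are illustrative, not open_clip's exact code):

```python
import math

import torch
import torch.nn.functional as F

# CLIP stores the *log* of the scale as a learnable parameter;
# math.log(1 / 0.07) ~= 2.659, and exp() of that gives the familiar
# starting value of ~14.28 seen in the plots above.
logit_scale = torch.nn.Parameter(torch.tensor(math.log(1 / 0.07)))

def clip_loss(image_features, text_features, logit_scale):
    # Features are L2-normalized, so cosine similarities lie in [-1, 1];
    # the exponentiated scale stretches them before the softmax, so a
    # larger scale means a sharper (more confident) output distribution.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale.exp() * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2
```

On this reading, a scale that drifts down towards 1 means the softmax is being flattened, i.e. the model is hedging across candidates rather than sharpening its predictions.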

rom1504 commented 5 months ago

Find here one fairly normal CLIP run: https://wandb.ai/rom1504/open-clip/reports/xlm-roberta-base-B-32--VmlldzoyOTQ5OTE2

You should see logit scale going to 1, loss decreasing, lr decreasing and accuracy increasing, all fairly in sync.
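Note also that open_clip clamps the learned log-scale during training so that exp(logit_scale) cannot exceed 100, which is why the scale in healthy runs typically climbs towards 100 and plateaus there. A rough sketch of that clamp (assuming a model object exposing a logit_scale parameter; not the repo's verbatim training loop):

```python
import math

import torch

# After each optimizer step, clamp the log-scale so the effective
# temperature exp(logit_scale) stays within [1, 100].
with torch.no_grad():
    model.logit_scale.clamp_(0, math.log(100))
```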

Akshay1-6180 commented 5 months ago

Thanks @rom1504 for the logs. Isn't the logit_scale going towards 100 as the loss decreases in this case, not to 1?