Open Akshay1-6180 opened 5 months ago
https://wandb.ai/rom1504/open-clip/reports/xlm-roberta-base-B-32--VmlldzoyOTQ5OTE2 — here is a fairly normal CLIP run.
You should see logit scale going to 1, loss decreasing, lr decreasing and accuracy increasing, all fairly in sync.
Thanks @rom1504 for the logs. Isn't the logit_scale going towards 100 as the loss decreases in this case, not to 1?
I am running CLIP on my own dataset and noticed that the logit_scale converges to 1. Is this expected behavior? I noticed that the loss becomes constant during this time. I know that a higher logit_scale amplifies the differences between the logits, making the softmax output distribution sharper and thus making the model more confident in its most likely predictions. Does the model lowering the logit_scale mean it is becoming less confident, or that it is getting confused? Reducing the learning rate resolves the issue, but the logit_scale then starts converging towards a value lower than 14 (mostly between 6 and 8). I am not sure what conclusion to draw from this. I use the AdamW optimizer with a ViT-B vision model and a BERT text encoder; weight decay is 0.1, eps is 1e-8, betas=[0.9, 0.999].
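For reference, here is a minimal sketch of the effect being discussed. In CLIP, the learnable `logit_scale` is initialized to ln(1/0.07) ≈ 2.659 (so exp(logit_scale) ≈ 14.3, which is why the value 14 shows up above), and the exponentiated scale is typically clamped at 100. The similarity values below are made up for illustration; the point is how a scale of 1 versus 100 changes the sharpness of the softmax over image-text similarities:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical cosine similarities between one image and three candidate texts.
sims = [0.30, 0.25, 0.20]

# With an effective scale of 1, the logits barely differ, so the
# softmax distribution is close to uniform (low confidence).
p_low = softmax([1.0 * s for s in sims])

# With a scale of 100 (the usual clamp value), the same similarity
# gaps are amplified and the softmax becomes very sharp (high confidence).
p_high = softmax([100.0 * s for s in sims])

print(p_low)   # near-uniform, max probability well under 0.4
print(p_high)  # dominated by the highest-similarity pair, over 0.99
```

So a logit_scale stuck near 1 means the model is effectively training at a very soft temperature: the contrastive loss receives almost uniform probabilities regardless of the similarities, which is consistent with the loss plateauing.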