Closed ericyoshida closed 2 years ago
Hi @ericyoshida, thank you for your question. Have you had a look at the correlation maps produced with your features, from which the LOST boxes are extracted? You can create those visualizations with the option --visualize fms. I assume that the maps fluctuate when the model is fine-tuned and might have different properties. LOST assumes that background pixels are more correlated with each other than pixels corresponding to foreground objects (a property observed with DINO features). Also, did you observe an improvement of the attention maps after fine-tuning?
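For intuition, the correlation maps and LOST's seed-selection heuristic can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not LOST's actual code: it assumes patch features of shape (N, d) (e.g. DINO ViT keys), and the random features here are placeholders.

```python
import numpy as np

def correlation_map(feats):
    """Patch-level correlation map in the spirit of LOST.
    feats: (N, d) array of patch features (e.g. DINO ViT keys)."""
    # Normalize so the Gram matrix gives cosine similarities.
    f = feats / np.linalg.norm(feats, axis=-1, keepdims=True)
    A = f @ f.T  # (N, N) patch-to-patch correlation map
    # LOST's assumption: background patches correlate positively with
    # many patches, foreground patches with few. The seed is the patch
    # with the fewest positive correlations.
    degrees = (A >= 0).sum(axis=-1)
    seed = int(degrees.argmin())
    return A, seed

# Hypothetical example with random features (16 patches, 64-dim).
rng = np.random.default_rng(0)
A, seed = correlation_map(rng.standard_normal((16, 64)))
```

If fine-tuning degrades this background-vs-foreground correlation structure, the extracted boxes worsen even while the training loss keeps decreasing, which is consistent with the behavior described in the question below.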
I've trained DINO's model with my own dataset, fine-tuning DINO's pre-trained ViT models. After a few experiments I noticed that, after each epoch of DINO fine-tuning, the training loss decreases; however, the IoU (the validation metric we are using) of the bounding boxes generated by the LOST algorithm gets worse. Can anyone explain to me why this is happening and how I can fix it?