qq283215389 opened this issue 5 years ago
That's very common in the first several epochs. Try training it a bit longer, or just restart the training.
OK, thanks a lot. What about the other VSE model (VSEAttModel) and the "pair loss", whose results aren't shown in your CVPR 2018 paper "Discriminability Objective for Training Descriptive Captions"?
Pair loss is worse, and VSEAttModel gives worse results too.
Thanks! If the retrieval model performs better (like the one in the paper "Stacked Cross Attention for Image-Text Matching"), can we get a better result for the captioning model?
I think it's very likely.
Hello, luo. These are my results from pre-training the retrieval model after running "run_fc_con.sh"; there is still a gap from the retrieval results reported in your paper.
Average i2t Recall: 53.9
Image to text: 29.9 59.2 72.6 4.0 19.6
Average t2i Recall: 42.3
Text to image: 20.6 46.5 59.8 7.0 40.8
Did you download my pretrained model? Does it perform better, i.e. match what's reported in the paper? https://drive.google.com/open?id=1oQ_O-O2KoSQv1xdBPKaIOGt-VW0gS-42 These are my training curves, to give you a hint.
I might have found the problem: I used a size of 7x7 for the COCO fc features. I think you used 14x14 for the COCO fc features?
The fc feature doesn't have spatial dimensions; it's a vector.
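For context, here is a minimal sketch of the difference between the spatial "att" features and the pooled "fc" feature vector. The backbone, input size, and variable names below are illustrative assumptions, not the repo's exact preprocessing script.

```python
import torch
import torchvision.models as models

# Illustrative only: how a spatial "att" feature map and a pooled "fc" vector
# typically relate for a ResNet backbone (not the repo's exact prepro script).
resnet = models.resnet101()
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc head

img = torch.randn(1, 3, 448, 448)        # a 448x448 input gives a 14x14 grid (stride 32)
att_feat = backbone(img)                 # (1, 2048, 14, 14) spatial "att" features
fc_feat = att_feat.mean(dim=(2, 3))      # (1, 2048) pooled "fc" feature: just a vector

print(att_feat.shape)  # torch.Size([1, 2048, 14, 14])
print(fc_feat.shape)   # torch.Size([1, 2048])
```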
I found that other papers use Karpathy's split for COCO, while your paper uses Rama's split. Are the test data the same? Why can you compare your results with the results in self-critical?
The splits are different. The self-critical number is from my implementation on Rama's split. We use Rama's split because we need to compare our results to Rama's.
Hello, luo. When I pretrain the VSEFCmodel, the vse_loss doesn't converge well; it stays around 51.2. Is there some mistake in my experiments? What was your vse_loss when you pretrained the VSEFCmodel?
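For reference, the vse_loss in VSE-style retrieval models is usually a sum-of-hinges contrastive loss over the image-caption similarity matrix, so its starting value scales with the margin and the batch size. The sketch below is a generic version of that loss; the margin value and the per-image normalization are illustrative assumptions, not necessarily identical to this repo's implementation.

```python
import torch
import torch.nn as nn

class ContrastiveLoss(nn.Module):
    """Generic VSE-style sum-of-hinges contrastive loss (illustrative, not the
    repo's exact code). With random embeddings, every off-diagonal hinge term is
    roughly equal to the margin, so the untrained loss sits near
    2 * (batch_size - 1) * margin per image and should drop as training converges."""

    def __init__(self, margin=0.2):
        super().__init__()
        self.margin = margin

    def forward(self, im, s):
        # im, s: (batch, dim) L2-normalized image / sentence embeddings
        scores = im @ s.t()                       # (batch, batch) cosine similarities
        pos = scores.diag().view(-1, 1)

        # hinge over negative captions (rows) and negative images (columns)
        cost_s = (self.margin + scores - pos).clamp(min=0)
        cost_im = (self.margin + scores - pos.t()).clamp(min=0)

        # zero out the positive pairs on the diagonal
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        cost_s = cost_s.masked_fill(mask, 0)
        cost_im = cost_im.masked_fill(mask, 0)

        # average per image so the value is comparable across batch sizes
        return (cost_s.sum() + cost_im.sum()) / scores.size(0)
```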