hassanhub / MultiGrounding

This is the repo for Multi-level textual grounding
GNU General Public License v3.0

Pointing Game accuracy doesn't match the paper #3

Closed: BelaChakraborty closed this issue 4 years ago

BelaChakraborty commented 4 years ago

Hello,

My name is Bela. I am very interested in grounding, and your work caught my attention. I managed to pre-process the Flickr30k data and run train.py; training appears to complete.

But when I run the evaluator code and check the performance, it doesn't match the reported 69.19%. I trained on MS-COCO and tested on Flickr30k (ELMo + PNASNet; I believe this is the default, as I haven't changed anything).

I have attached some screenshots below.

The highest accuracy is 50%: [Screenshot from 2019-09-10 12-19-59]

It ends with this: [Screenshot from 2019-09-10 12-20-22]
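
For reference, the pointing-game number I'm reading off follows the standard definition: a phrase counts as a hit when the peak of the predicted heatmap falls inside the ground-truth box. A minimal sketch of the metric itself (my own illustration, not the repo's code):

```python
import numpy as np

def pointing_game_hit(heatmap, gt_box):
    """One trial: hit if the heatmap's peak lies inside the
    ground-truth box, given as (x1, y1, x2, y2) in pixels."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    x1, y1, x2, y2 = gt_box
    return x1 <= x <= x2 and y1 <= y <= y2
```

Accuracy is then the number of hits divided by the number of query phrases over the whole test set.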

hassanhub commented 4 years ago

Thank you very much for your interest in our work. We re-implemented the entire codebase (including the pre-processing code) for the final release, so slight differences are normal. But the gap you're reporting is not. Could you please try re-evaluating with the pre-trained models we've provided? If the problem doesn't appear there, something is wrong in training and I will take a look at it.

BelaChakraborty commented 4 years ago

I want to lay out a few things about the results above, which you have already seen, and outline the steps I followed. If I am not mistaken, according to Table 1 in the paper, you trained on MSCOCO 2014 and tested on VG, Flickr30k, and ReferIt.

I just wanted to reproduce the Flickr30k results. Following the instructions:

1) Pre-processing:

[Screenshot from 2019-09-11 12-09-19]

2) Pre-trained model downloaded: I downloaded the pre-trained model and have it here, as you can see below: [Screenshot from 2019-09-11 12-16-15]

3) Execution:

[Screenshot from 2019-09-11 12-17-49]

Executing as described in evaluators.pynb gave the results you already saw in my last post, which fall roughly 15% short of the paper's numbers.

On further examination, I found this setting in the code (you can see it in the screenshot above):
epochs = 10, LR = 0.0001

The paper, however, specifies:
total epochs = 20, optimizer = Adam

with the schedule:
epochs 1-10: LR = 0.001
epochs 11-15: LR = 0.0005
epochs 16-20: LR = 0.00025

I don't see the paper's setting anywhere in the code, so I wanted to ask where I am going wrong. I would be really grateful for your guidance.

hassanhub commented 4 years ago

All your pre-processing steps look fine. What I meant was: try evaluating the models we have already provided (not the ones you trained) and see if the problem still exists. As mentioned, we re-implemented the entire pipeline, and it is possible that some part of the training procedure is no longer consistent with the original. As for the LR and number of epochs: you can set them via decay steps and decay rate, together with the number of epochs. Please also note that TensorFlow's distributed training gives slightly different results than single-GPU training (as it aggregates gradients and weights).
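
For example, the paper's schedule maps onto a piecewise-constant decay in the TF 1.x API. A minimal sketch (steps_per_epoch is a placeholder that depends on dataset size and batch size; this is not the exact training code in the repo):

```python
import tensorflow as tf  # TF 1.x

steps_per_epoch = 1000  # placeholder: dataset_size // batch_size

global_step = tf.compat.v1.train.get_or_create_global_step()
# Paper: epochs 1-10 -> 1e-3, epochs 11-15 -> 5e-4, epochs 16-20 -> 2.5e-4
boundaries = [10 * steps_per_epoch, 15 * steps_per_epoch]
values = [1e-3, 5e-4, 2.5e-4]
lr = tf.compat.v1.train.piecewise_constant(global_step, boundaries, values)
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=lr)
```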

BelaChakraborty commented 4 years ago

Hello. Forgive my naive question; I am new to this and haven't used TensorFlow before. When you say "try evaluating the models that we have already provided (not the ones you trained) and see if the problem still exists", do you mean I should just evaluate (but not train) using the evaluation code below? [Screenshot from 2019-09-12 12-43-59]

If so, I am a bit confused about what "use your trained model" means in practice. What should go in:

1) ckpt_path = './models/groundnet_pnas_elmo_1x1_vg-10': this doesn't exist in ./models, as you can see from the screenshot below. What should go here if I want to test PNAS+ELMo, which gives 69.19% PG accuracy? I want to test this configuration since it gives the highest accuracy.

[Screenshot from 2019-09-12 12-55-29]

2) gnet_config = './configs/pnas_elmo_1x1.yml': I assume this stays the same for PNAS+ELMo.

Secondly, I did evaluate per your instruction, using the evaluation code with the models you provided:

1) BiLSTM+VGG: ckpt_path = './models/groundnet_vgg_bilstm_1x1_coco_final', gnet_config = './configs/vgg_bilstm_1x1.yml'

With your provided model, the highest PG accuracy is 48.32%, but the paper says 53.29%: [Screenshot from 2019-09-12 18-10-52]

2) ELMo+VGG: ckpt_path = './models/groundnet_vgg_elmo_1x1_coco_final', gnet_config = './configs/vgg_elmo_1x1.yml'

With your provided model, the highest PG accuracy is 37.02%, but the paper says 61.66%: [Screenshot from 2019-09-12 16-45-16]
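
For clarity, the loading pattern I'm following looks roughly like this (a sketch only; evaluators.pynb builds the full GroundNet graph first, which I've omitted, and I'm assuming a .meta file sits next to each checkpoint):

```python
import tensorflow as tf  # TF 1.x
import yaml

ckpt_path = './models/groundnet_vgg_elmo_1x1_coco_final'
gnet_config = './configs/vgg_elmo_1x1.yml'

with open(gnet_config) as f:
    config = yaml.safe_load(f)  # model hyper-parameters from the YAML config

with tf.compat.v1.Session() as sess:
    # Restore the provided checkpoint into the current graph.
    saver = tf.compat.v1.train.import_meta_graph(ckpt_path + '.meta')
    saver.restore(sess, ckpt_path)
```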

Would it be possible for you to share the trained models for PNAS+ELMo (accuracy 69.19%) and ELMo+VGG (accuracy 61.66%) that produced those numbers? The 4.97% gap for BiLSTM+VGG seems acceptable. I would be grateful for your advice.

Thank you very much,
Bela

qinzzz commented 4 years ago

> Would it be possible for you to share the trained models for PNAS+ELMo (accuracy 69.19%) and ELMo+VGG (accuracy 61.66%) that produced those numbers? [...]

I had the same problem. Did you figure it out?

BelaChakraborty commented 4 years ago

@qinzzz: Hello

No, I couldn't solve it. I did what the author suggested, but the results still didn't match, so I had no option but to give up. I am not that well versed with TensorFlow either.

aurooj commented 4 years ago

@qinzzz @BelaChakraborty Regarding the package deprecation and version-compatibility issues: did you have to resolve them before the network trained successfully, or did you just ignore the warnings and training eventually started? I am trying to run train.py, but it shows tons of deprecation warnings and then no sign of training starting; GPU utilization is also 0% when I run the scripts. Also, how long did training take for you? I have tried Python 3.6/3.7 with tensorflow-gpu 1.14.0 and 1.15.0; the CUDA version is 10.0.
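
For what it's worth, this is how I'm checking whether TensorFlow sees the GPU at all (plain TF 1.x calls, nothing specific to this repo):

```python
import tensorflow as tf  # tensorflow-gpu 1.14/1.15
from tensorflow.python.client import device_lib

print(tf.test.is_built_with_cuda())   # True only for the GPU build
print(tf.test.is_gpu_available())     # True if a CUDA device is visible
print([d.name for d in device_lib.list_local_devices()])  # expect a /device:GPU:0 entry
```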

BelaChakraborty commented 4 years ago

@aurooj

1) About the package deprecation and version-compatibility issues: I tried reproducing the results quite a while back, and as far as I remember, I did resolve the version-compatibility issues; the packages were marked for deprecation but still worked at the time. I also got warnings, but I turned them off (see the sketch after this list).

2) About no sign of training starting and 0% GPU utilization: I am not sure what's wrong, since it's on your system, but I did run training from scratch, and I was also running the PG script, as you can see from the screenshot; it is not their model being tested, I trained from scratch. I wasn't sure why the results didn't match, as I am not well versed with TensorFlow.

3) About training time: I believe it took 1-2 days, not more than that.
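
If it helps, this is roughly how the warnings can be silenced; note that _PRINT_DEPRECATION_WARNINGS is an internal TensorFlow flag, not a public API, so it may differ between versions:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide INFO/WARNING logs from the C++ backend

import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

try:
    # Internal flag; works on TF 1.14/1.15 but may change in other versions.
    from tensorflow.python.util import deprecation
    deprecation._PRINT_DEPRECATION_WARNINGS = False
except ImportError:
    pass
```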

I did contact the author about such a large difference in results; it seems they changed the pipeline shortly before uploading to GitHub, and he said it works fine on his system, so I can't comment much now. But I also could not reproduce the claimed results. As far as I remember from last year, I built an environment with the exact package versions, and small differences would not produce a contrasting gap of 20% or so.

Regards,
Bela

aurooj commented 4 years ago

@BelaChakraborty thanks for your prompt reply, appreciate it! Yeah, in my case nothing happens after it finishes showing those warnings. Can you recall which version-compatibility issues you were facing, or the environment you were using? Thanks a lot!

BelaChakraborty commented 4 years ago

Hey,

I had created an Anaconda environment. I can try sending it to you, but I am not well right now and not going to uni; I will send it as soon as I can.

It was almost 6-9 months ago, so I can't recall the issues, sorry.

Regards,
Bela


aurooj commented 4 years ago

Hi, sure, thanks for the help. I hope you get well soon. Take good care!

hassanhub commented 4 years ago

Sorry for the very late reply to all these concerns. I have added the original code that I used to generate the numbers in the paper. Please find it under This Page, and let me know if the problems still exist. Please note that, due to different random seeds when the numbers for the original paper were generated, you might see slightly different numbers when re-generating them. Also note that these old codes require different pre-processed data, which I have already uploaded and linked in the README.md (under original_codes_stable).