xiaoyuan1996 / GaLR

Source code of paper "Remote Sensing Cross-Modal Image-Text Retrieval Based on Global and Local Information"
MIT License
59 stars, 15 forks

Processed Annotations #2

Closed: usmanxia closed this issue 1 year ago

usmanxia commented 1 year ago

Hi! I have been trying to reproduce the results reported in the paper. However, I don't think that is possible without the processed annotations. Could you please provide the processed annotations for the RSITMD and RSICD datasets? I really appreciate your help.

xiaoyuan1996 commented 1 year ago

The annotations are already processed. Please see: data/rsicd_precomp/train_caps.txt

usmanxia commented 1 year ago

Thank you, got it. Could you please also point me to the lines of code that implement the DREA method of the proposed technique?

xiaoyuan1996 commented 1 year ago

DREA is used to generate the detection boxes, which we saved at https://github.com/xiaoyuan1996/GaLR/tree/main/detection/representation for RSITMD and RSICD.

If you want to apply it to a new dataset, follow https://github.com/xiaoyuan1996/GaLR/issues/1#issuecomment-1261979964.

Best regards.

usmanxia commented 1 year ago

Thank you @xiaoyuan1996. I have been trying to reproduce the results reported in the paper. However, the evaluation metrics I get are much lower than the ones reported there.

I kept the options the same as in RSTMTD_Galr.yaml, and the results I get after 20 epochs with 3 folds are:

Best: r1i:7.1012805587892895 r5i:18.62630966239814 r10i:29.452852153667056 medri:31.0 meanri:147.86495925494762 r1t:4.516880093131548 r5t:18.649592549476136 r10t:31.967403958090802 medrt:20.0 meanrt:79.87613504074505 sum:18.38571982925883

Could you please help me understand whether I am missing something?

I really appreciate it.
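For anyone comparing logs like the one above against the paper, here is a minimal sketch that parses a `Best:` line and checks how the `sum` field relates to the recall fields. In the two result lines posted in this thread, `sum` equals the arithmetic mean of the six recall values (`r1i`, `r5i`, `r10i`, `r1t`, `r5t`, `r10t`), so it appears to be a mean recall rather than a total. The helper names `parse_best_line` and `mean_recall` are made up for illustration and are not part of the GaLR code.

```python
import re

# The six recall fields that appear in the "Best:" log line.
RECALL_KEYS = ("r1i", "r5i", "r10i", "r1t", "r5t", "r10t")


def parse_best_line(line):
    """Turn 'key:value' pairs from a 'Best: ...' log line into a dict of floats."""
    pairs = re.findall(r"(\w+):([-+0-9.eE]+)", line)
    return {key: float(value) for key, value in pairs}


def mean_recall(metrics):
    """Arithmetic mean of the six recall values."""
    return sum(metrics[key] for key in RECALL_KEYS) / len(RECALL_KEYS)


# Result line copied from the 20-epoch run reported in this thread.
line = (
    "Best: r1i:7.1012805587892895 r5i:18.62630966239814 "
    "r10i:29.452852153667056 medri:31.0 meanri:147.86495925494762 "
    "r1t:4.516880093131548 r5t:18.649592549476136 "
    "r10t:31.967403958090802 medrt:20.0 meanrt:79.87613504074505 "
    "sum:18.38571982925883"
)

metrics = parse_best_line(line)
print("mean of six recalls:", mean_recall(metrics))
print("logged 'sum' field: ", metrics["sum"])
```

If that reading is right, the `sum` value should be compared against a mean-recall column in the paper, not against a total of the recall values.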

xiaoyuan1996 commented 1 year ago

For RSITMD, try increasing the number of training epochs in the config to 50; the model may not have converged at 20.

usmanxia commented 1 year ago

Hi @xiaoyuan1996, I ran the model for 70 epochs with k-folds set to 3 on the RSITMD dataset. All options are identical to the ones in the yaml file except that the number of epochs is 70. The results are:

Best: r1i:8.847497089639115 r5i:20.954598370197903 r10i:30.267753201396975 medri:29.0 meanri:160.0651920838184 r1t:5.960419091967404 r5t:23.63213038416764 r10t:35.90221187427241 medrt:20.0 meanrt:88.63096623981373 sum:20.927435001940243

Could you please tell me whether any option needs to be changed for the model to produce the results reported in the publication?

I really appreciate your help and response.

Regards

xiaoyuan1996 commented 1 year ago

Sorry for the confusion. If you see this gap without having changed any of the configs, the pre-trained weights may not have been loaded correctly. This code uses ResNet-18 as the image backbone and the GRU from skipthoughts as the text backbone. Please check that the corresponding pre-trained parameters are loaded correctly. As a controlled experiment, you could also run RSICD with your code; I'm guessing your results will be much worse there as well. Feel free to post your training loss and we can analyze it together.
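One way to check whether pre-trained parameters were actually loaded is to compare the parameter names the model expects with the names stored in the checkpoint. The sketch below does this with plain Python collections so it runs without any framework. In PyTorch the two inputs would typically come from `model.state_dict().keys()` and from the keys of the dict returned by `torch.load(...)`, and `load_state_dict(strict=False)` reports the same information as `missing_keys` and `unexpected_keys`. The function names and the example parameter names are hypothetical, and how GaLR itself loads its weights is not shown in this thread.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names.

    missing:    names the model expects but the checkpoint lacks
                (these parameters keep their random initialization).
    unexpected: names in the checkpoint that the model does not use.
    """
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)


def strip_prefix(keys, prefix="module."):
    """Drop a leading prefix such as the 'module.' added by nn.DataParallel."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]


# Hypothetical parameter names, for illustration only.
model_keys = ["conv1.weight", "bn1.weight", "fc.weight"]
checkpoint_keys = ["module.conv1.weight", "module.bn1.weight"]

# A prefix mismatch makes every name look missing.
missing, unexpected = compare_keys(model_keys, checkpoint_keys)
print("before fixing prefix:", missing, unexpected)

# After stripping the prefix, only the truly absent parameter remains.
missing, unexpected = compare_keys(model_keys, strip_prefix(checkpoint_keys))
print("after fixing prefix: ", missing, unexpected)
```

A non-empty `missing` list for backbone layers would mean those layers are training from scratch, which could explain recall values far below the published ones.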