pinakinathc / fscoco

Code and Dataset for FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context.
http://pinakinathc.me/fscoco

Difficulty reproducing SBIR results #2

Open jabader97 opened 1 year ago

jabader97 commented 1 year ago

Hello, thank you for your work. I'm having some difficulty using your sbir_baseline to reproduce the Siam.-VGG16 results from your paper when training and evaluating on FS-COCO for the FG-SBIR task: my runs reach closer to 12 (35) for R@1 (R@10), versus the reported 23.3 (52.6). Are there specific settings, or additional changes, that I would need to reproduce your results?
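For reference, I compute R@K roughly as below over the paired test set; this is my own sketch, not necessarily identical to your evaluation code.

```python
# Rough Recall@K over paired sketch/photo features (sketch i pairs with
# photo i); my own sketch, not necessarily the repo's evaluation code.
import torch

def recall_at_k(sketch_feats, photo_feats, k):
    dists = torch.cdist(sketch_feats, photo_feats)  # (N, N) pairwise distances
    ranks = dists.argsort(dim=1)                    # closest photo first
    targets = torch.arange(len(sketch_feats)).unsqueeze(1)
    hits = (ranks[:, :k] == targets).any(dim=1)     # true photo in top-k?
    return hits.float().mean().item()
```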

Thanks in advance, Jessie

pinakinathc commented 1 year ago

Hey Jessie, before diving deeper, can you confirm the batch size? (You can use gradient accumulation to simulate a larger batch size.)
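For reference, this is the pattern I mean; an illustrative sketch assuming the PyTorch Lightning setup in sbir_baseline, with placeholder model/dataloader names:

```python
# Illustrative gradient-accumulation sketch in PyTorch Lightning;
# model / train_loader are placeholders, not the repo's exact names.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accumulate_grad_batches=8,  # sum gradients over 8 micro-batches per
                                # optimizer step, so a per-step batch of 16
                                # behaves like an effective batch of 128
)
# trainer.fit(model, train_loader)
```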

jabader97 commented 1 year ago

Thank you for your fast response!

The original results I quoted were with the default parameters in the codebase (batch size 16 and accumulate_grad_batches 8, for an effective batch size of 128). Your paper mentions that for CLIP you used a batch size of 256, so I also tried doubling the gradient accumulation. That did slightly better, but still topped out around 13 (36).

I did need to put it into 'train mode' by removing load_from_checkpoint and commenting back in the line for trainer.fit; perhaps there was something else I needed to put back in?

jabader97 commented 1 year ago

Hello, thank you again for your work.

I wanted to follow up on my questions from before. Specifically: (1) are there any other settings or changes I would need to reproduce your work, and (2) is there anything else required to put sbir_baseline/main.py into 'train mode', other than putting back trainer.fit on line 87 and not loading from the checkpoint on line 50 (roughly as in the sketch below)?
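For clarity, my two edits look roughly like this (a reconstruction from memory; the surrounding code and class name in main.py may differ):

```python
# Rough reconstruction of my edits to sbir_baseline/main.py;
# TripletNetwork is a placeholder name for the repo's LightningModule.

# Around line 50: train from scratch instead of loading a checkpoint.
# model = TripletNetwork.load_from_checkpoint('<path-to-checkpoint>')
model = TripletNetwork()

# Around line 87: comment trainer.fit back in to enable training.
trainer.fit(model)
```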

Additionally, I wanted to check how many epochs you used. In the codebase it is set to 100000, but I was unable to find the value reported in the paper.

pinakinathc commented 1 year ago

Hi @jabader97, sorry for the late response; I was sick for a few days :)

I am adding a checkpoint to help resolve your issue: https://drive.google.com/file/d/1ug_Yemql2PrCh_8YHBeR3r3SxX5wuVGE/view?usp=share_link Please download the checkpoint soon, as I might delete it from Google Drive after a few weeks.
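To evaluate with it, something like the following should work; an illustrative sketch, where SBIRModel stands in for the repo's LightningModule class:

```python
# Illustrative evaluation sketch; SBIRModel is a placeholder for the
# repo's LightningModule, and the checkpoint path is wherever you saved it.
import pytorch_lightning as pl

model = SBIRModel.load_from_checkpoint('<path-to-downloaded-checkpoint>')
trainer = pl.Trainer(devices=1)
trainer.test(model)  # assumes test_step and a test dataloader are defined
```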

Also, the 100000 epochs is just to make sure training doesn't stop prematurely before converging.
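With a checkpoint callback keeping the best model, a very large max_epochs is harmless; roughly like this (the monitored metric key is illustrative):

```python
# Sketch: pair a very large max_epochs with a ModelCheckpoint callback so
# the best-validation model is kept; 'top1' is an illustrative metric key.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(monitor='top1', mode='max', save_last=True)
trainer = pl.Trainer(max_epochs=100000, callbacks=[checkpoint_cb])
```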