salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Questions about Visual Grounding checkpoint and visualization #50

Closed: zzzzzigzag closed this issue 2 years ago

zzzzzigzag commented 2 years ago

Thank you for your outstanding work! I'm having some problems with the Visual Grounding task:

  1. How did you get refcoco.pth? Fine-tuning with the provided procedure produces a 3.3G checkpoint_best.pth file, as large as the pre-trained model, yet the refcoco.pth you released is only 800M. Could you explain how you shrank the model? I tried both `distill: True` and `distill: False` in the config file, and neither made a difference to the final size.

     ```bash
     python -m torch.distributed.launch --nproc_per_node=8 --use_env Grounding.py \
         --config ./configs/Grounding.yaml \
         --output_dir output/RefCOCO \
         --gradcam_mode itm \
         --block_num 8 \
         --checkpoint [Pretrained checkpoint, size 3.3G]
     ```

  2. How do I evaluate refcoco.pth? Setting `distill: False` in the config file does not work for me:

     ```bash
     python -m torch.distributed.launch --nproc_per_node=8 --use_env Grounding.py \
         --config ./configs/Grounding.yaml \
         --output_dir output/RefCOCO_albefpth \
         --gradcam_mode itm \
         --block_num 8 \
         --evaluate \
         --checkpoint refcoco.pth
     ```

     This raises the following KeyError:

     ```
     Traceback (most recent call last):
       File "Grounding.py", line 295, in <module>
         main(args, config)
       File "Grounding.py", line 187, in main
         state_dict = checkpoint['model']
     KeyError: 'model'
     ```

  3. How do I visualize the 3.3G checkpoint_best.pth file generated by fine-tuning? The [val, test_A, test_B] metrics printed during fine-tuning look fine. However, visualization.ipynb only works for refcoco.pth, not for the 3.3G checkpoint_best.pth generated by fine-tuning: the heat map is a total mess, not as expected. There seems to be a gap between checkpoint_best.pth and refcoco.pth.

LiJunnan1992 commented 2 years ago

Hi, thanks for your question!

refcoco.pth contains the model's state_dict from `checkpoint['model']`, with the momentum model's state_dict removed to reduce file size. Hence, in order to load refcoco.pth, you can directly use `state_dict = checkpoint` and set `distill=False`.
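In code, loading it looks roughly like this (a minimal sketch, not verbatim repo code; it assumes `model` is the grounding model already constructed as in Grounding.py):

```python
import torch

def load_refcoco_checkpoint(model, path='refcoco.pth'):
    # refcoco.pth is a bare state_dict (no {'model': ...} wrapper),
    # with all momentum-model parameters stripped out
    checkpoint = torch.load(path, map_location='cpu')
    state_dict = checkpoint  # NOT checkpoint['model']
    # strict=False because the momentum ('*_m') parameters are missing;
    # also set distill: False in configs/Grounding.yaml
    msg = model.load_state_dict(state_dict, strict=False)
    print('missing keys:', msg.missing_keys)
    return model
```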

zzzzzigzag commented 2 years ago

> refcoco.pth contains the model's state_dict from `checkpoint['model']`, with the momentum model's state_dict removed to reduce file size. Hence, in order to load refcoco.pth, you can directly use `state_dict = checkpoint` and set `distill=False`.

Thank you for your kind reply! Two of the questions above are now clear to me:

For the 2nd question: I changed `state_dict = checkpoint['model']` to `state_dict = checkpoint` at line 187 and deleted https://github.com/salesforce/ALBEF/blob/main/Grounding.py#L188-L191; after that, the [val, test_A, test_B] metrics match the paper. (A sketch of the change is below.)
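Concretely, the loading code around Grounding.py line 187 now looks roughly like this (an abridged sketch, not the verbatim file; as far as I can tell, the deleted lines 188-191 interpolate the position embeddings and also index the momentum encoder's weights, which refcoco.pth no longer contains):

```python
# Grounding.py, main(), abridged: loading refcoco.pth for evaluation
checkpoint = torch.load(args.checkpoint, map_location='cpu')
state_dict = checkpoint  # was: state_dict = checkpoint['model']
# lines 188-191 (position-embedding interpolation) are deleted
msg = model.load_state_dict(state_dict, strict=False)
```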

For the 3rd question: in the "5. Load model and tokenizer" part of visualization.ipynb, I made the following change and the visualization results look right:

```python
# for the 3.3G checkpoint_best.pth from fine-tuning ({'model': ...} wrapper):
msg = model.load_state_dict(checkpoint['model'], strict=False)
# for the released refcoco.pth (bare state_dict, momentum model removed):
# msg = model.load_state_dict(checkpoint, strict=False)
```

However, I am still confused about how you shrank the model. What should I modify in Grounding.py to get an 800M checkpoint file? If I just run the original code, the checkpoint is 3.3G.

LiJunnan1992 commented 2 years ago

I used another script to delete the momentum model's parameters from checkpoint['model'] to shrink the size.
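Not the exact script, but the idea is something like this (it assumes, as in this repo's model definitions, that the momentum model's parameters live under top-level modules whose names end in `_m`, e.g. `visual_encoder_m.*`; the output filename is just an example):

```python
import torch

# load the full fine-tuned checkpoint (~3.3G)
checkpoint = torch.load('output/RefCOCO/checkpoint_best.pth', map_location='cpu')
state_dict = checkpoint['model']

# keep only non-momentum parameters: drop keys whose top-level
# module name ends with '_m' (visual_encoder_m, text_encoder_m, ...)
slim_state_dict = {k: v for k, v in state_dict.items()
                   if not k.split('.')[0].endswith('_m')}

# save the bare state_dict (no {'model': ...} wrapper), like refcoco.pth
torch.save(slim_state_dict, 'refcoco_slim.pth')
```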

zzzzzigzag commented 2 years ago

> I used another script to delete the momentum model's parameters from checkpoint['model'] to shrink the size.

Thank you, I can export an 800M .pth checkpoint file now.

yuese1234 commented 8 months ago

> For the 3rd question: in the "5. Load model and tokenizer" part of visualization.ipynb, I made the following change and the visualization results look right: `msg = model.load_state_dict(checkpoint['model'], strict=False)`

Hi, I also hit the problem of the heat map being a total mess when I use checkpoint_best.pth. I followed the steps you gave (`msg = model.load_state_dict(checkpoint['model'], strict=False)`), but the problem is still not solved. Can you share whether any additional processing is needed? Also, can you share how you exported the 800M file? Thank you very much!