salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

Questions about Visual Grounding checkpoint and visualization #50

Closed: zzzzzigzag closed this issue 2 years ago

zzzzzigzag commented 2 years ago

Thank you for your outstanding work! I'm having some problems with the Visual Grounding task:

  1. How did you get refcoco.pth? Fine-tuning with the provided procedure produces a 3.3G checkpoint_best.pth file, as large as the pre-trained model, yet the refcoco.pth you released is only 800M. Could you explain how you shrank the model? I tried both `distill: True` and `distill: False` in the config file, and neither made a difference to the final size.

     ```bash
     python -m torch.distributed.launch --nproc_per_node=8 --use_env Grounding.py \
         --config ./configs/Grounding.yaml \
         --output_dir output/RefCOCO \
         --gradcam_mode itm \
         --block_num 8 \
         --checkpoint [Pretrained checkpoint, size 3.3G]
     ```

  2. How do I evaluate refcoco.pth? Setting `distill: False` in the config file does not work for me:

     ```bash
     python -m torch.distributed.launch --nproc_per_node=8 --use_env Grounding.py \
         --config ./configs/Grounding.yaml \
         --output_dir output/RefCOCO_albefpth \
         --gradcam_mode itm \
         --block_num 8 \
         --evaluate \
         --checkpoint refcoco.pth
     ```

     This raises the following KeyError:

     ```
     Traceback (most recent call last):
       File "Grounding.py", line 295, in <module>
         main(args, config)
       File "Grounding.py", line 187, in main
         state_dict = checkpoint['model']
     KeyError: 'model'
     ```

  3. How do I visualize the 3.3G checkpoint_best.pth file generated by fine-tuning? The [val, test_A, test_B] metrics printed during fine-tuning look fine. However, visualization.ipynb only works for refcoco.pth, not for the 3.3G checkpoint_best.pth generated by fine-tuning: the heat map is a total mess, not as expected. There seems to be a gap between checkpoint_best.pth and refcoco.pth.

LiJunnan1992 commented 2 years ago

Hi, thanks for your question!

refcoco.pth contains the model's state_dict from `checkpoint['model']`, with the momentum model's state_dict removed to reduce file size. Hence, in order to load refcoco.pth, you can directly use `state_dict = checkpoint` and set `distill=False`.
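In code, loading it looks roughly like this (a minimal sketch, not verbatim repo code; it assumes `model` is the grounding model already constructed as in Grounding.py):

```python
import torch

def load_refcoco_checkpoint(model, path='refcoco.pth'):
    # refcoco.pth is a bare state_dict (no {'model': ...} wrapper),
    # with all momentum-model parameters stripped out
    checkpoint = torch.load(path, map_location='cpu')
    state_dict = checkpoint  # NOT checkpoint['model']
    # strict=False because the momentum ('*_m') parameters are missing;
    # also set distill: False in configs/Grounding.yaml
    msg = model.load_state_dict(state_dict, strict=False)
    print('missing keys:', msg.missing_keys)
    return model
```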

zzzzzigzag commented 2 years ago

> refcoco.pth contains the model's state_dict from `checkpoint['model']`, with the momentum model's state_dict removed to reduce file size. Hence, in order to load refcoco.pth, you can directly use `state_dict = checkpoint` and set `distill=False`.

Thank you for your kind reply! Two of the questions above are now clear to me:

For the 2nd question: I changed `state_dict = checkpoint['model']` to `state_dict = checkpoint` at line 187 and deleted https://github.com/salesforce/ALBEF/blob/main/Grounding.py#L188-L191; after that, the [val, test_A, test_B] metrics match the paper. (A sketch of the change is below.)
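Concretely, the loading code around Grounding.py line 187 now looks roughly like this (an abridged sketch, not the verbatim file; as far as I can tell, the deleted lines 188-191 interpolate the position embeddings and also index the momentum encoder's weights, which refcoco.pth no longer contains):

```python
# Grounding.py, main(), abridged: loading refcoco.pth for evaluation
checkpoint = torch.load(args.checkpoint, map_location='cpu')
state_dict = checkpoint  # was: state_dict = checkpoint['model']
# lines 188-191 (position-embedding interpolation) are deleted
msg = model.load_state_dict(state_dict, strict=False)
```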

For the 3rd question: in the "5. Load model and tokenizer" part of visualization.ipynb, I made the following change and the visualization results look right:

```python
# for the 3.3G checkpoint_best.pth from fine-tuning ({'model': ...} wrapper):
msg = model.load_state_dict(checkpoint['model'], strict=False)
# for the released refcoco.pth (bare state_dict, momentum model removed):
# msg = model.load_state_dict(checkpoint, strict=False)
```

However, I am still confused about how you shrank the model. What should I modify in Grounding.py to get an 800M checkpoint file? If I just run the original code, the checkpoint is 3.3G.

LiJunnan1992 commented 2 years ago

I used another script to delete the momentum model's parameters from checkpoint['model'] to shrink the size.
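Not the exact script, but the idea is something like this (it assumes, as in this repo's model definitions, that the momentum model's parameters live under top-level modules whose names end in `_m`, e.g. `visual_encoder_m.*`; the output filename is just an example):

```python
import torch

# load the full fine-tuned checkpoint (~3.3G)
checkpoint = torch.load('output/RefCOCO/checkpoint_best.pth', map_location='cpu')
state_dict = checkpoint['model']

# keep only non-momentum parameters: drop keys whose top-level
# module name ends with '_m' (visual_encoder_m, text_encoder_m, ...)
slim_state_dict = {k: v for k, v in state_dict.items()
                   if not k.split('.')[0].endswith('_m')}

# save the bare state_dict (no {'model': ...} wrapper), like refcoco.pth
torch.save(slim_state_dict, 'refcoco_slim.pth')
```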

zzzzzigzag commented 2 years ago

> I used another script to delete the momentum model's parameters from checkpoint['model'] to shrink the size.

Thank you, I can export an 800M .pth checkpoint file now.

yuese1234 commented 8 months ago

> For the 3rd question: in the "5. Load model and tokenizer" part of visualization.ipynb, I made the following change and the visualization results look right: `msg = model.load_state_dict(checkpoint['model'], strict=False)`

Hi, I also hit the problem of the heat map being a total mess when I use checkpoint_best.pth. I followed the steps you gave (`msg = model.load_state_dict(checkpoint['model'], strict=False)`), but the problem is still not solved. Can you share whether any additional processing is needed? Also, can you share how you exported the 800M file? Thank you very much!