seanzhuh / SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding
https://arxiv.org/abs/2203.16265

size mismatch for head.transformer.seq_positional_encoding.embedding.weight: #20

Closed CCYChongyanChen closed 1 year ago

CCYChongyanChen commented 1 year ago

Dear Author, I am trying to use the model for Refcocog (pre-trained + fine-tuned SeqTR segmentation) and test it on Refcoco dataset and visualize the results.

The command I run is:

```shell
python tools/inference.py /home/chch3470/SeqTR/configs/seqtr/segmentation/seqtr_segm_refcoco-unc.py "/home/chch3470/SeqTR/work_dir/segm_best.pth" --output-dir="/home/chch3470/SeqTR/attention_map_output" --with-gt --which-set="testA"
```

I get the error below. Do you have any idea why it happens? Is the RefCOCOg model (pre-trained + fine-tuned SeqTR segmentation) based on YOLO or darknet? If it is based on YOLO, which configs should we use? Also, do we need to change the vis_encs (currently the codebase only provides darknet.py for vis_encs)?

I can visualize the provided models for detection tasks so I guess I know the basic setups...

```
RuntimeError: Error(s) in loading state_dict for SeqTR:
    size mismatch for lan_enc.embedding.weight: copying a param with shape torch.Size([12692, 300]) from checkpoint, the shape in current model is torch.Size([10344, 300]).
    size mismatch for head.transformer.seq_positional_encoding.embedding.weight: copying a param with shape torch.Size([25, 256]) from checkpoint, the shape in current model is torch.Size([37, 256]).
```

seanzhuh commented 1 year ago

For RefCOCOg we sample 12 points instead of the 18 points used for RefCOCO, so you should use configs/seqtr/segmentation/seqtr_segm_refcocog-umd.py when testing a model trained on RefCOCOg. 12692 and 10344 are the numbers of distinct words in each dataset's vocabulary.
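For anyone hitting this: the second mismatch can be reproduced in isolation. A minimal PyTorch sketch (not SeqTR's actual modules), assuming the sequence positional table has 2 * num_ray + 1 rows, which matches the shapes in the error (12 → 25, 18 → 37):

```python
import torch.nn as nn

# Sketch only: a checkpoint trained with num_ray=12 stores a 25x256
# positional-encoding table, while a config with num_ray=18 builds a
# 37x256 one, so strict state_dict loading fails.
ckpt_embedding = nn.Embedding(2 * 12 + 1, 256)   # 25 x 256, as in the checkpoint
model_embedding = nn.Embedding(2 * 18 + 1, 256)  # 37 x 256, as built from the RefCOCO config

error_message = ""
try:
    model_embedding.load_state_dict(ckpt_embedding.state_dict())
except RuntimeError as err:
    error_message = str(err)

print(error_message)  # reports the same 25-vs-37 size mismatch as above
```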

CCYChongyanChen commented 1 year ago

Thank you so much for the quick reply. Is it possible to test it on a customized dataset without fine-tuning? Our experiment has two settings: (1) use the pretrained model and test directly on our dataset; (2) pretrain + fine-tune on our dataset.

For example, if I use the model pretrained on RefCOCOg and test it directly on RefCOCO without fine-tuning, could I just replace RefCOCO's two pkl files and word_emb with RefCOCOg's two pkl files and word_emb? Would that work?

> In RefCOCOg, we sample 12 points instead of 18 points as in RefCOCO dataset, you should use configs/seqtr/segmentation/seqtr_segm_refcocog-umd.py if you are testing the model trained on refcocog dataset. 12692 and 10344 are the distinctive number of words in each dataset.

seanzhuh commented 1 year ago

Yes, that'll work, and you also need to change the sampled number of points in the configuration to match the checkpoint model.

CCYChongyanChen commented 1 year ago

> yes, that'll work, and you also need to change the sampled number of points in configuration to align with the checkpoint model.

Thank you so much! Just to confirm: to run the RefCOCOg model on another dataset (e.g., RefCOCO) I need to (1) change num_ray=18 to num_ray=12 in refcoco-unc.py, and (2) set num_ray to 12 at line 1 and model.head.shuffle_fraction to 0.2 at line 35 of configs/seqtr/segmentation/seqtr_mask_darknet.py.

Do I need to change the max_token from 15 to 20?

---- update: I changed (1) and (2) and that worked. I didn't change the max_token.
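For anyone else following along, a sketch of the two edits above as they would look in the Python configs (field names taken from this thread; exact line positions may differ across versions of the repo):

```python
# configs/seqtr/segmentation/seqtr_segm_refcoco-unc.py (sketch):
# match the RefCOCOg checkpoint's contour sampling.
num_ray = 12  # was 18 for RefCOCO

# configs/seqtr/segmentation/seqtr_mask_darknet.py (sketch):
num_ray = 12  # line 1 in the thread above
model = dict(
    head=dict(
        shuffle_fraction=0.2,  # RefCOCOg value, per step (2) above
    )
)
```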

CCYChongyanChen commented 1 year ago

Another question: do we need to disable LSJ and EMA for pre-training/fine-tuning on the segmentation tasks? Are LSJ and EMA only for training from scratch?