Closed: violet-sto closed this issue 1 year ago
Hi @violet-sto,
The head for masked image modeling is stored under the keys "mim_head.weight" and "mim_head.bias". They are in the BEiT3-Large and BEiT3-Base checkpoints.
Thank you so much for your reply! Unfortunately, I didn't find where 'mim_head' was defined in the code of Torchscale.
Hi @violet-sto, the architecture in Torchscale does not contain that part. You can add the following code:
mim_head = nn.Linear(embed_dim, 8192)
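For anyone wiring this up, here is a rough sketch of how the extra head could be attached and filled from a checkpoint. It is not the official implementation: the `MIMHead` wrapper and the `load_mim_head` helper are hypothetical names, and it assumes the checkpoint is a flat state dict that exposes the "mim_head.weight" and "mim_head.bias" keys mentioned above (a real checkpoint may nest them under another key). The 768 hidden size matches the Base output_projection discussed in this thread, and random tensors stand in for the real weights.

```python
import torch
import torch.nn as nn


class MIMHead(nn.Module):
    """Hypothetical wrapper: maps encoder hidden states to visual-token logits."""

    def __init__(self, embed_dim: int = 768, vocab_size: int = 8192):
        super().__init__()
        # Same layer as suggested in the thread: nn.Linear(embed_dim, 8192).
        self.mim_head = nn.Linear(embed_dim, vocab_size)

    def forward(self, hidden_states: torch.Tensor, masked_positions: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, num_patches, embed_dim)
        # masked_positions: bool tensor of shape (batch, num_patches)
        # Only the masked patches are projected, giving (num_masked, vocab_size).
        return self.mim_head(hidden_states[masked_positions])


def load_mim_head(module: MIMHead, state_dict: dict) -> None:
    """Copy the two MIM head tensors out of a (assumed flat) checkpoint state dict."""
    head_state = {
        "weight": state_dict["mim_head.weight"],
        "bias": state_dict["mim_head.bias"],
    }
    module.mim_head.load_state_dict(head_state)


# Stand-in for a real checkpoint: random tensors with the expected shapes.
torch.manual_seed(0)
fake_ckpt = {
    "mim_head.weight": torch.randn(8192, 768),
    "mim_head.bias": torch.zeros(8192),
}

model = MIMHead()
load_mim_head(model, fake_ckpt)

hidden = torch.randn(2, 4, 768)  # dummy encoder output for 2 images, 4 patches each
mask = torch.tensor([[True, False, False, True],
                     [False, True, False, False]])
logits = model(hidden, mask)
print(logits.shape)  # torch.Size([3, 8192])
```

In practice `hidden` would be the image-patch portion of the encoder output, and the logits would be trained with cross-entropy against the visual tokens of the masked patches.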
Describe Beit3:
Hi!
I found that there is only one output_projection (nn.Linear(768, 64000)), which is used for masked language modeling. However, since BEiT-3 is a multimodal model, shouldn't there also be an output head for masked image modeling?