microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License

question about the number of output_projection #29

Closed by violet-sto 1 year ago

violet-sto commented 1 year ago

Hello!

I found that there is only one output_projection (nn.Linear(768, 64000)), which is used for masked language modeling. However, since BEiT-3 is a multimodal model, shouldn't there also be an output head for masked image modeling?

donglixp commented 1 year ago

Yes, we used separate heads.
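
For readers wondering what "separate heads" could look like in practice, here is a minimal PyTorch sketch. It is not the torchscale or BEiT-3 code: the class name SeparateMaskedHeads, the modality argument, and the image vocabulary size of 8192 are all assumptions made for illustration. Only the text head shape, nn.Linear(768, 64000), comes from the question above.

```python
import torch
import torch.nn as nn


class SeparateMaskedHeads(nn.Module):
    """Hypothetical sketch: one output projection per modality."""

    def __init__(self, embed_dim=768, text_vocab_size=64000, image_vocab_size=8192):
        super().__init__()
        # Text head, same shape as the nn.Linear(768, 64000) in the question.
        self.text_head = nn.Linear(embed_dim, text_vocab_size)
        # Image head over discrete visual tokens (codebook size is an assumption).
        self.image_head = nn.Linear(embed_dim, image_vocab_size)

    def forward(self, hidden, masked_positions, modality):
        # hidden: (batch, seq_len, embed_dim)
        # masked_positions: bool tensor of shape (batch, seq_len)
        masked_hidden = hidden[masked_positions]  # (num_masked, embed_dim)
        if modality == "text":
            return self.text_head(masked_hidden)
        if modality == "image":
            return self.image_head(masked_hidden)
        raise ValueError(f"unknown modality: {modality!r}")


# Small sizes here only to keep the demo light.
heads = SeparateMaskedHeads(embed_dim=16, text_vocab_size=100, image_vocab_size=50)
hidden = torch.randn(2, 5, 16)
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[0, 1] = True
mask[1, 3] = True

text_logits = heads(hidden, mask, "text")    # logits over the text vocabulary
image_logits = heads(hidden, mask, "image")  # logits over the visual tokens
print(text_logits.shape, image_logits.shape)
```

Each head only scores the masked positions of its own modality, so the two vocabularies never share an output matrix. Whether the actual pretraining code routes tokens this way is not stated in this thread.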