vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: BERT models for embeddings #5179

Open mevince opened 3 months ago

mevince commented 3 months ago

Now with the introduction of embeddings: https://github.com/vllm-project/vllm/pull/3734, are there plans on the roadmap to support BERT models?

robertgshaw2-neuralmagic commented 3 months ago

Yep - we would welcome a PR

Etelis commented 3 months ago

Sounds fun! I'm on this. Any notes or hints on that one? @DarkLight1337 @robertgshaw2-neuralmagic

Thanks!

laishzh commented 3 months ago

I'm also interested in this task. Referring to #3734, I used transformers.BertModel to implement a BertEmbeddingModel class (https://github.com/vllm-project/vllm/compare/main...laishzh:vllm:feat/bert). The code is at a very early stage; it does output embeddings, but I think they are wrong, probably because the weights are not loaded correctly. This is my first contribution, so I'm not sure whether this is the right way to implement it or whether BertModel needs to be reimplemented. Suggestions or collaboration are welcome.

@Etelis Also hope it helps.
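
For reference, the wrapper approach looks roughly like this (a minimal sketch only; the mean pooling and the checkpoint-name mapping are assumptions, not necessarily what the branch currently does):

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class BertEmbeddingModel(nn.Module):
    """Wrap transformers.BertModel and mean-pool the last hidden state."""

    def __init__(self, config: BertConfig):
        super().__init__()
        self.model = BertModel(config)

    def forward(self, input_ids: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.model(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        # Mean pooling over non-padding tokens (pooling strategy is an assumption).
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

    def load_weights(self, weights):
        # Checkpoint names like "bert.encoder.layer.0...." must be mapped onto
        # this module's names ("model.encoder.layer.0...."); a mismatch here is
        # a likely cause of wrong embeddings.
        params = dict(self.named_parameters())
        for name, loaded_weight in weights:
            mapped = "model." + name.removeprefix("bert.")
            if mapped in params:
                params[mapped].data.copy_(loaded_weight)
```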

robertgshaw2-neuralmagic commented 3 months ago

The main thing you have to do is implement BertModel or XLMRobertaModel in the vllm/model_executor/models directory using the layers in vllm/model_executor/layers, and then register the model in the registry.

You can look at how Llama and the other models are implemented in that directory for inspiration.
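
For concreteness, a skeletal sketch of that layout (file path as suggested above; the imported layers exist in vLLM, but the class body and the registration note are only illustrative, not the eventual implementation):

```python
# vllm/model_executor/models/bert.py
import torch
from torch import nn
from transformers import BertConfig

from vllm.model_executor.layers.vocab_parallel_embedding import (
    VocabParallelEmbedding)


class BertEmbeddingModel(nn.Module):

    def __init__(self, config: BertConfig):
        super().__init__()
        # Token embeddings built from vLLM's tensor-parallel embedding layer.
        self.word_embeddings = VocabParallelEmbedding(config.vocab_size,
                                                      config.hidden_size)
        # Encoder blocks would be assembled here from the layers in
        # vllm/model_executor/layers (QKVParallelLinear, RowParallelLinear,
        # get_act_fn, ...), mirroring how llama.py builds its decoder layers.

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.word_embeddings(input_ids)
        # ... run the encoder, then pool to one embedding per sequence.
        return hidden_states

    def load_weights(self, weights):
        # Map HF checkpoint names onto the vLLM parameter names, as the other
        # models in this directory do.
        ...


# Registration (mechanism assumed): either add an entry such as
#   "BertModel": ("bert", "BertEmbeddingModel")
# to the model table in vllm/model_executor/models/__init__.py, or call
# ModelRegistry.register_model("BertModel", BertEmbeddingModel).
```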