lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Does FastChat support embedding models such as gte-base? #3181

Open ruifengma opened 7 months ago

ruifengma commented 7 months ago

I tried to use model_worker to deploy an embedding model, gte-base:

CUDA_VISIBLE_DEVICES=1 python -m fastchat.serve.model_worker --model-name 'gte-base' --model-path /model/gte-base --debug True --worker-address http://0.0.0.0:21009/ --port 21009 --host 0.0.0.0 --controller-address http://0.0.0.0:21001/

and I got the following error:

Some weights of BertLMHeadModel were not initialized from the model checkpoint at /model/gte-base and are newly initialized: ['cls.predictions.bias', 'cls.predictions.decoder.bias']

In the same environment, the code from the model card, which uses transformers and PyTorch directly, works normally:

import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
    "sorting algorithms"
]

tokenizer = AutoTokenizer.from_pretrained("/model/gte-base")
model = AutoModel.from_pretrained("/model/gte-base")

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')

outputs = model(**batch_dict)
embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# (Optionally) normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())
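To make the pooling step concrete, here is a minimal sketch of the same masked mean on plain Python lists, with no torch dependency. The helper name `average_pool_lists` and the toy values are assumptions for illustration only; they are not part of the model card or of FastChat.

```python
from typing import List


def average_pool_lists(
    last_hidden: List[List[List[float]]],
    attention_mask: List[List[int]],
) -> List[List[float]]:
    """Mean of the token vectors whose mask is 1, per sequence."""
    pooled = []
    for tokens, mask in zip(last_hidden, attention_mask):
        dim = len(tokens[0])
        n_real = sum(mask)  # number of non-padding tokens
        summed = [0.0] * dim
        for vec, keep in zip(tokens, mask):
            if keep:  # padding tokens contribute nothing to the sum
                for i, value in enumerate(vec):
                    summed[i] += value
        pooled.append([s / n_real for s in summed])
    return pooled


# One sequence: two real tokens followed by one padding token (toy values).
hidden = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
mask = [[1, 1, 0]]
print(average_pool_lists(hidden, mask))  # [[2.0, 3.0]]
```

The padding vector is ignored, so the result is the mean of the first two token vectors only. This is what `masked_fill` followed by the division by `attention_mask.sum(dim=1)` does in the torch version.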

So I would like to ask whether FastChat supports this embedding model, and whether model_worker is the correct way to deploy an embedding model. Thanks in advance.

surak commented 7 months ago

I managed to use GritLM as an embedding model, but the values are not right. Maybe you have more success than I do...

ruifengma commented 7 months ago

Hi @surak, thanks for the reply. May I ask how you deployed your embedding model?

surak commented 7 months ago

I use the normal model_worker; the sglang and vllm workers don't work for this.

I run it on Slurm, but it's the same as anywhere else:

srun python3 $BLABLADOR_DIR/fastchat/serve/model_worker.py \
     --controller $BLABLADOR_CONTROLLER:$BLABLADOR_CONTROLLER_PORT \
     --port 31041 --worker-address http://$(hostname).fz-juelich.de:31041 \
     --num-gpus 1 \
     --host $BLABLADOR_CONTROLLER \
     --model-path models/GritLM-7B \
     --model-name "alias-embeddings,gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002,GritLM-7B"

I alias it to the OpenAI model names so that LangChain works.
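For anyone who wants to check the returned values against the model card output, here is a rough sketch of a client for the OpenAI-compatible `/v1/embeddings` endpoint. It assumes that `fastchat.serve.openai_api_server` is running in front of the controller and reachable at `http://localhost:8000`; the base URL, the function names, and the choice of the `text-embedding-ada-002` alias are assumptions for illustration, so adjust them to your deployment.

```python
import json
import math
import urllib.request
from typing import List

# Assumption: placeholder address of the OpenAI-compatible API server.
API_BASE = "http://localhost:8000/v1"


def build_embedding_request(texts: List[str], model: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI embeddings format."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body: bytes) -> List[List[float]]:
    """Extract the vectors from the response, ordered by their index field."""
    data = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fetch_embeddings(texts: List[str], model: str = "text-embedding-ada-002") -> List[List[float]]:
    """Send the request to the running server and return one vector per text."""
    with urllib.request.urlopen(build_embedding_request(texts, model)) as resp:
        return parse_embeddings(resp.read())
```

With the server up, `vecs = fetch_embeddings(["what is the capital of China?", "Beijing"])` followed by `cosine(vecs[0], vecs[1])` gives a similarity that can be compared with what the same model produces when run directly through transformers. A large gap would suggest the worker is pooling or loading the model differently.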