Does FastChat support embedding models such as gte-base?

ruifengma commented 7 months ago

I tried to use model_worker to deploy the embedding model gte-base:

```shell
CUDA_VISIBLE_DEVICES=1 python -m fastchat.serve.model_worker --model-name 'gte-base' --model-path /model/gte-base --debug True --worker-address http://0.0.0.0:21009/ --port 21009 --host 0.0.0.0 --controller-address http://0.0.0.0:21001/
```

and got the following error:

```
Some weights of BertLMHeadModel were not initialized from the model checkpoint at /model/gte-base and are newly initialized: ['cls.predictions.bias', 'cls.predictions.decoder.bias']
```

In the same environment, the example code from the model card (using transformers and PyTorch directly) works normally:

```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
    "sorting algorithms"
]

tokenizer = AutoTokenizer.from_pretrained("/model/gte-base")
model = AutoModel.from_pretrained("/model/gte-base")

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')

outputs = model(**batch_dict)
embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# (Optionally) normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())
```
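The `average_pool` helper in the model card code takes a masked mean over the token axis: hidden states at padding positions are zeroed out, and the sum is divided by the number of real tokens in each sequence. Below is a dependency-free sketch of the same arithmetic, using plain Python lists instead of tensors and made-up toy numbers purely for illustration. It shows that a padding token does not change the pooled vector. This is only a restatement of the model card helper, not FastChat's own pooling code.

```python
from typing import List


def average_pool(last_hidden_states: List[List[List[float]]],
                 attention_mask: List[List[int]]) -> List[List[float]]:
    """Masked mean over tokens (plain-list illustration of the model card helper).

    Positions where the mask is 0 (padding) are skipped, and each sum is divided
    by the number of real tokens, matching the tensor version's arithmetic.
    """
    pooled = []
    for seq, mask in zip(last_hidden_states, attention_mask):
        total = [0.0] * len(seq[0])
        count = 0
        for token_vec, m in zip(seq, mask):
            if m:  # only real tokens contribute
                count += 1
                for i, v in enumerate(token_vec):
                    total[i] += v
        pooled.append([t / count for t in total])
    return pooled


# Toy values (made up): 2 sequences, 3 tokens each, hidden size 2.
# The second sequence's last token is padding, so its large values are ignored.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 4.0], [100.0, 100.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
print(average_pool(hidden, mask))  # [[3.0, 4.0], [3.0, 3.0]]
```

A server that pools differently (for example, taking a single token's hidden state instead of the masked mean) would return different vectors for the same input, so the pooling rule is worth checking when served embeddings disagree with the model card output.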

So I would like to ask: does FastChat support this embedding model, and is model_worker the correct way to deploy an embedding model? Thanks in advance.
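For reference, FastChat's OpenAI-compatible API server (`fastchat.serve.openai_api_server`) provides a `/v1/embeddings` endpoint, which is one way to query an embedding model once its worker is registered with the controller. The sketch below assumes that server is running at `http://localhost:8000/v1` and that the worker was registered under the name `gte-base`; the address and the helper names (`build_embeddings_request`, `parse_embeddings_response`, `embed`) are hypothetical. It only builds and parses requests in the OpenAI embeddings format, using the standard library.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: started separately, e.g.
#   python -m fastchat.serve.openai_api_server --host localhost --port 8000
# and pointed at the same controller as the model worker.
API_BASE = "http://localhost:8000/v1"  # hypothetical address; adjust to your setup


def build_embeddings_request(model: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request with an OpenAI-style embeddings payload."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(body: Dict) -> List[List[float]]:
    """Return the vectors from an OpenAI-style response, ordered by 'index'."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(model: str, texts: List[str]) -> List[List[float]]:
    """Send the request and parse the reply (needs a running server)."""
    req = build_embeddings_request(model, texts)
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings_response(json.load(resp))


# Usage, with the server running:
#   vectors = embed("gte-base", ["what is the capital of China?", "Beijing"])
```

Sorting by `index` is a defensive choice so that vector order always matches input order, regardless of how the server lists the entries.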

surak commented 7 months ago

I managed to use GritLM as an embedding model, but the values are not right. Maybe you have more success than I do...
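To make "the values are not right" measurable, one option is to compare a served embedding against one computed locally for the same text (for example with the model card code above) by cosine similarity. Cosine similarity ignores vector length, so it still works if one side is L2-normalized and the other is not. This is a minimal sketch; the helper names and the tolerance of `1e-3` are arbitrary choices, not values from FastChat or the model card.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        # A dimension mismatch usually means a different model or pooling.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embeddings_match(served: Sequence[float], reference: Sequence[float],
                     tol: float = 1e-3) -> bool:
    """Hypothetical check: treat vectors as equivalent if cosine >= 1 - tol."""
    return cosine_similarity(served, reference) >= 1.0 - tol


# Same direction, different length: counts as a match.
print(embeddings_match([0.6, 0.8], [3.0, 4.0]))  # True
```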

ruifengma commented 7 months ago

Hi @surak, thanks for the reply. May I ask how you deploy your embedding model?

surak commented 7 months ago

Using the normal model_worker; the sglang and vllm workers don't work.

I run it on Slurm, but it's the same as anywhere else:

```shell
srun python3 $BLABLADOR_DIR/fastchat/serve/model_worker.py \
     --controller $BLABLADOR_CONTROLLER:$BLABLADOR_CONTROLLER_PORT \
     --port 31041 --worker-address http://$(hostname).fz-juelich.de:31041 \
     --num-gpus 1 \
     --host $BLABLADOR_CONTROLLER \
     --model-path models/GritLM-7B \
     --model-name "alias-embeddings,gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002,GritLM-7B"
```

I alias it to the OpenAI model names so that LangChain can work with it.

(lm-sys/FastChat, issue #3181)