mistralai / mistral-inference

Official inference library for Mistral models
Apache License 2.0

Embedding model and Engine?? #62

Open muhtalhakhan opened 8 months ago

muhtalhakhan commented 8 months ago

Hey guys,

I am switching from GPT to Mistral, and I have one problem: I can't find an embedding model and engine for Mistral yet.

I am using DeepInfra's service.

Here's the code snippet I wrote for GPT:

import openai  # legacy openai<1.0 SDK, which provides openai.Embedding

# Return the embedding vector for one piece of text
def get_embedding(text, model="text-embedding-ada-002"):
  text = text.replace("\n", " ")
  if not text:
    text = "this is blank"
  return openai.Embedding.create(
          input=[text], model=model)['data'][0]['embedding']

if __name__ == '__main__':
#   gpt_parameter = {"engine": "text-davinci-003", "max_tokens": 50,
#                    "temperature": 0, "top_p": 1, "stream": False,
#                    "frequency_penalty": 0, "presence_penalty": 0,
#                    "stop": ['"']}
  gpt_parameter = {"max_tokens": 50,
                   "temperature": 0, "top_p": 1, "stream": False,
                   "frequency_penalty": 0, "presence_penalty": 0,
                   "stop": ['"']}

All I want to know is: which embedding model and engine should I use?

Thank you 🙂

praveen555 commented 8 months ago

There is no dedicated embedding model as such.

For each input sentence, you tokenize it with the tokenizer provided by Mistral and then pass those tokens to the model.

Check out the example below, taken from Mistral's tutorial:

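import numpy as np
import torch
import tqdm

# Assumes `model`, `tokenizer` and `data` (a list of (sentence, label) pairs)
# are already set up as in the tutorial.
with torch.no_grad():
    featurized_x = []
    # compute an embedding for each sentence
    for i, (x, y) in tqdm.tqdm(enumerate(data)):
        tokens = tokenizer.encode(x, bos=True)
        tensor = torch.tensor(tokens).to(model.device)
        features = model.forward_partial(tensor, [len(tokens)])  # (n_tokens, model_dim)
        # mean-pool the token features into one sentence vector
        featurized_x.append(features.float().mean(0).cpu().numpy())

# concatenate sentence embeddings
X = np.concatenate([x[None] for x in featurized_x], axis=0)  # (n_points, model_dim)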
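The concatenation step expects one fixed-size vector per sentence, while `forward_partial` returns a `(n_tokens, model_dim)` array for each sentence. Below is a minimal numpy sketch of the reduce-then-stack step, assuming mean pooling over tokens as the reduction. `mean_pool` and `stack_embeddings` are hypothetical helper names, and the random arrays are stand-ins for real model features.

```python
from typing import List

import numpy as np


def mean_pool(features: np.ndarray) -> np.ndarray:
    """Average per-token features of shape (n_tokens, model_dim) into one (model_dim,) vector."""
    return features.astype(np.float32).mean(axis=0)


def stack_embeddings(per_sentence_features: List[np.ndarray]) -> np.ndarray:
    """Pool each sentence and stack the results into an (n_points, model_dim) matrix."""
    pooled = [mean_pool(f) for f in per_sentence_features]
    # x[None] adds a leading axis so each vector becomes a (1, model_dim) row
    return np.concatenate([x[None] for x in pooled], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_dim = 8  # small hypothetical size for the demo
    # Stand-ins for forward_partial outputs: sentences of 3, 5 and 2 tokens
    fake_features = [rng.standard_normal((n, model_dim)) for n in (3, 5, 2)]
    X = stack_embeddings(fake_features)
    print(X.shape)  # (3, 8)
```

Sentences of different lengths all end up as rows of the same width, which is what lets them be concatenated into a single matrix.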

muhtalhakhan commented 8 months ago

Is there a working example that could help me understand the code better?

Mistral returns some lines in response to my prompt, and I want to embed them.
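One way to embed lines returned by the model is to keep the same `get_embedding(text)` shape as the GPT helper in the question, but back it with local per-token features instead of an API call. This is a minimal sketch under stated assumptions: `encode` and `featurize` are hypothetical callables standing in for `tokenizer.encode(x, bos=True)` and `model.forward_partial`, and mean pooling over tokens is an assumed reduction. The toy stand-ins in the demo only exist so the sketch runs without model weights.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures: `Encode` maps text to token ids (standing in for
# tokenizer.encode(x, bos=True)); `Featurize` maps token ids to one feature
# row per token (standing in for model.forward_partial).
Encode = Callable[[str], List[int]]
Featurize = Callable[[List[int]], Sequence[Sequence[float]]]


def get_embedding(text: str, encode: Encode, featurize: Featurize) -> List[float]:
    """Analogue of the OpenAI-based get_embedding that pools local per-token features."""
    # Same preprocessing as the GPT helper in the question
    text = text.replace("\n", " ")
    if not text:
        text = "this is blank"
    # Assumes encode() returns at least one token (e.g. a BOS token)
    token_features = featurize(encode(text))
    n_tokens = len(token_features)
    dim = len(token_features[0])
    # Mean pooling (an assumed choice) reduces n_tokens rows to one vector
    return [sum(row[j] for row in token_features) / n_tokens for j in range(dim)]


def embed_lines(lines: Sequence[str], encode: Encode, featurize: Featurize) -> List[List[float]]:
    """Embed each line (e.g. each line of a model response) separately."""
    return [get_embedding(line, encode, featurize) for line in lines]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without model weights
    def toy_encode(s: str) -> List[int]:
        return [len(w) for w in s.split()]

    def toy_featurize(ids: List[int]) -> List[List[float]]:
        return [[float(i), 1.0] for i in ids]

    print(embed_lines(["hello world", "a bc"], toy_encode, toy_featurize))
    # prints [[5.0, 1.0], [1.5, 1.0]]
```

Passing the tokenizer and model in as callables keeps the helper independent of how they are loaded, so the rest of the code that consumed the GPT embeddings can stay the same.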

praveen555 commented 8 months ago

Check the tutorial example that Mistral provides in the repository. The code I gave earlier is taken from it.

muhtalhakhan commented 8 months ago

Thanks, but I didn't find anything useful. I was just experimenting with prompts and then passing the outputs to another function to embed them.

zhzfight commented 8 months ago

Hi dude, have you solved the problem?

muhtalhakhan commented 8 months ago

Hey, I tried, but I didn't get good enough responses from the model.