michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models, and CLIP.
https://michaelfeil.github.io/infinity/
MIT License

Get null embedding #328

Open MLlove0402 opened 1 month ago

MLlove0402 commented 1 month ago

System Info

infinity version: 0.0.53
OS version: Linux
Model being used: dunzhang/stella_en_1.5B_v5
Hardware used: NVIDIA A100

Reproduction

I run the following command:

    infinity_emb v2 --model-id dunzhang/stella_en_1.5B_v5 --port 3002 --trust-remote-code --served-model-name embedding

Then I call the /embeddings API with:

    { "input": ["5.2"], "model": "embedding" }

The embedding comes back as a list of null values. I tried some other models and they do not return null values like this one.
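For reference, the same request as a short Python script (a sketch: the `requests` dependency and the OpenAI-style field names in the response are assumptions about the server's schema; host, port and served model name follow the command above):

```python
# Reproduces the /embeddings call above; assumes the server is reachable on
# localhost:3002 and serves the model under the name "embedding".
import requests

resp = requests.post(
    "http://localhost:3002/embeddings",
    json={"input": ["5.2"], "model": "embedding"},
)
resp.raise_for_status()
# Field names assume the OpenAI-compatible response layout.
values = resp.json()["data"][0]["embedding"]
print(values[:8])  # expected: floats; observed with this model: nulls (None)
```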

Expected behavior

The API should return a list of float values.
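For comparison, a healthy response looks roughly like this (illustrative values; the field names assume the OpenAI-compatible schema, so treat the exact layout as an approximation):

```python
# Rough shape of a correct /embeddings response (values made up for illustration).
expected_response = {
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.0123, -0.0456, 0.0789]},
    ],
    "model": "embedding",
    "usage": {"prompt_tokens": 3, "total_tokens": 3},
    # ...plus the remaining OpenAI-style metadata fields
}
```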

michaelfeil commented 1 month ago

Can you post the full logs?

MLlove0402 commented 1 month ago

> Can you post the full logs?

Here is my full log:

INFO 2024-08-01 01:49:48,998 datasets INFO: PyTorch version 2.3.1 available. config.py:58
INFO: Started server process [116201]
INFO: Waiting for application startup.
INFO 2024-08-01 01:49:50,366 infinity_emb INFO: model=/cloudata/thainq/models/models/stella_en_1.5B_v5/ selected, using engine=torch and device=None select_model.py:57
INFO 2024-08-01 01:49:50,369 sentence_transformers.SentenceTransformer INFO: Use pytorch device_name: cuda SentenceTransformer.py:189
INFO 2024-08-01 01:49:50,369 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: /cloudata/thainq/models/models/stella_en_1.5B_v5/ SentenceTransformer.py:197
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 2024-08-01 01:49:56,722 sentence_transformers.SentenceTransformer INFO: 2 prompts are loaded, with the keys: ['s2p_query', 's2s_query'] SentenceTransformer.py:326
INFO 2024-08-01 01:49:56,925 infinity_emb INFO: Adding optimizations via Huggingface optimum. acceleration.py:46
WARNING 2024-08-01 01:49:56,927 infinity_emb WARNING: BetterTransformer is not available for model: <class 'transformers_modules.modeling_qwen.Qwen2Model'> Continue without bettertransformer modeling code. acceleration.py:57
INFO 2024-08-01 01:49:56,928 infinity_emb INFO: Switching to half() precision (cuda: fp16). sentence_transformer.py:81
INFO 2024-08-01 01:49:57,812 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=2 select_model.py:80
    7.07 ms tokenization
    27.29 ms inference
    0.13 ms post-processing
    34.49 ms total
    embeddings/sec: 927.83
INFO 2024-08-01 01:49:58,583 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=512 select_model.py:86
    16.83 ms tokenization
    338.24 ms inference
    0.39 ms post-processing
    355.46 ms total
    embeddings/sec: 90.03
INFO 2024-08-01 01:49:58,586 infinity_emb INFO: model warmed up, between 90.03-927.83 embeddings/sec at batch_size=32 select_model.py:87
INFO 2024-08-01 01:49:58,588 infinity_emb INFO: creating batching engine batch_handler.py:321
INFO 2024-08-01 01:49:58,589 infinity_emb INFO: ready to batch requests. batch_handler.py:384
INFO 2024-08-01 01:49:58,593 infinity_emb INFO: infinity_server.py:63

     ♾️  Infinity - Embedding Inference Server                                                                                                                                                                                           
     MIT License; Copyright (c) 2023 Michael Feil
     Version 0.0.53

     Open the Docs via Swagger UI:
     http://0.0.0.0:3002/docs

     Access model via 'GET':
     curl http://0.0.0.0:3002/models

INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:3002 (Press CTRL+C to quit)
INFO: 172.16.250.26:50758 - "POST /embeddings HTTP/1.1" 200 OK

michaelfeil commented 1 month ago

To use this package, you need to ensure that your model is compatible with https://github.com/UKPLab/sentence-transformers. Is it compatible?
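For example, a minimal compatibility check would be something like this (just a sketch; `trust_remote_code=True` mirrors the flag used in the serving command above):

```python
# Quick sanity check that the checkpoint loads and encodes via sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dunzhang/stella_en_1.5B_v5", trust_remote_code=True)
embeddings = model.encode(["5.2"])
print(embeddings.shape)   # expect (1, hidden_dim)
print(embeddings[0][:5])  # expect finite floats, no NaNs
```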

MLlove0402 commented 1 month ago

> To use this package, you need to ensure that your model is compatible with https://github.com/UKPLab/sentence-transformers. Is it compatible?

Yes, I also ran this model with sentence-transformers and it works well.

michaelfeil commented 1 month ago

Did you try it with the docker container?

MLlove0402 commented 1 month ago

> Did you try it with the docker container?

I haven't tried Docker yet, but I've noticed that every Qwen2-based model returns a list of null embeddings with my input above (gte-qwen2-1.5, gte-qwen2-7). Is this a problem?
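A quick way to count how many of the returned values come back as null or non-finite (host, port and served model name follow the reproduction above; whether the nulls originate from NaNs is only a guess, not confirmed):

```python
# Counts null / non-finite entries in the embedding returned by the server.
import math
import requests

resp = requests.post(
    "http://localhost:3002/embeddings",
    json={"input": ["5.2"], "model": "embedding"},
)
values = resp.json()["data"][0]["embedding"]
bad = [v for v in values
       if v is None or (isinstance(v, float) and not math.isfinite(v))]
print(f"{len(bad)} of {len(values)} values are null or non-finite")
```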