Open MLlove0402 opened 1 month ago
Can you post the full logs?
Here is my full log:
INFO 2024-08-01 01:49:48,998 datasets INFO: PyTorch version 2.3.1 available. config.py:58
INFO: Started server process [116201]
INFO: Waiting for application startup.
INFO 2024-08-01 01:49:50,366 infinity_emb INFO: model=/cloudata/thainq/models/models/stella_en_1.5B_v5/ selected, using engine=torch and device=None select_model.py:57
INFO 2024-08-01 01:49:50,369 sentence_transformers.SentenceTransformer INFO: Use pytorch device_name: cuda SentenceTransformer.py:189
INFO 2024-08-01 01:49:50,369 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: /cloudata/thainq/models/models/stella_en_1.5B_v5/ SentenceTransformer.py:197
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 2024-08-01 01:49:56,722 sentence_transformers.SentenceTransformer INFO: 2 prompts are loaded, with the keys: ['s2p_query', 's2s_query'] SentenceTransformer.py:326
INFO 2024-08-01 01:49:56,925 infinity_emb INFO: Adding optimizations via Huggingface optimum. acceleration.py:46
WARNING 2024-08-01 01:49:56,927 infinity_emb WARNING: BetterTransformer is not available for model: <class 'transformers_modules.modeling_qwen.Qwen2Model'> Continue without bettertransformer modeling code. acceleration.py:57
INFO 2024-08-01 01:49:56,928 infinity_emb INFO: Switching to half() precision (cuda: fp16). sentence_transformer.py:81
INFO 2024-08-01 01:49:57,812 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=2 select_model.py:80
7.07 ms tokenization
27.29 ms inference
0.13 ms post-processing
34.49 ms total
embeddings/sec: 927.83
INFO 2024-08-01 01:49:58,583 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=512 select_model.py:86
16.83 ms tokenization
338.24 ms inference
0.39 ms post-processing
355.46 ms total
embeddings/sec: 90.03
INFO 2024-08-01 01:49:58,586 infinity_emb INFO: model warmed up, between 90.03-927.83 embeddings/sec at batch_size=32 select_model.py:87
INFO 2024-08-01 01:49:58,588 infinity_emb INFO: creating batching engine batch_handler.py:321
INFO 2024-08-01 01:49:58,589 infinity_emb INFO: ready to batch requests. batch_handler.py:384
INFO 2024-08-01 01:49:58,593 infinity_emb INFO: infinity_server.py:63
♾️ Infinity - Embedding Inference Server
MIT License; Copyright (c) 2023 Michael Feil
Version 0.0.53
Open the Docs via Swagger UI:
http://0.0.0.0:3002/docs
Access model via 'GET':
curl http://0.0.0.0:3002/models
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:3002 (Press CTRL+C to quit)
INFO: 172.16.250.26:50758 - "POST /embeddings HTTP/1.1" 200 OK
To use this package, you need to ensure that your model is compatible with https://github.com/UKPLab/sentence-transformers. Is it compatible?
Yes, I have also run this model with sentence-transformers and it works well.
Did you try it with the docker container?
I haven't tried Docker yet, but I notice that every model based on Qwen2 (gte-qwen2-1.5, gte-qwen2-7) returns a list of null embeddings for the input above. Could that be the problem?
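One plausible explanation (an assumption on my side, not confirmed by the logs) is fp16 overflow: the log above shows `Switching to half() precision (cuda: fp16)`, and if a model produces intermediate activations outside the fp16 range (|x| > ~65504), they saturate to infinity, later operations can turn that into NaN, and NaN has no representation in strict JSON, so serializers typically emit null. A minimal stdlib sketch of that chain:

```python
import json
import math
import struct

# IEEE-754 half precision (fp16) represents magnitudes only up to ~65504.
FP16_MAX = 65504.0

def to_fp16(x: float) -> float:
    """Round-trip a float through fp16. Values beyond the fp16 range
    become +/-inf here, mimicking saturating-overflow behavior
    (Python's struct 'e' format would instead raise OverflowError)."""
    if abs(x) > FP16_MAX:
        return math.copysign(math.inf, x)
    return struct.unpack("<e", struct.pack("<e", x))[0]

# A large intermediate activation overflows in fp16...
a = to_fp16(1e5)        # becomes inf
# ...and a later inf - inf produces NaN:
nan_val = a - a

print(math.isinf(a), math.isnan(nan_val))

# Strict JSON has no NaN; a compliant serializer must either
# reject it or substitute null:
try:
    json.dumps([nan_val], allow_nan=False)
except ValueError:
    print("strict JSON rejects NaN")
```

If this is the cause, forcing fp32 (or bf16, which has a much wider exponent range) for these models should make the nulls disappear.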
System Info
infinity 0.0.53
OS version: linux
Model being used: dunzhang/stella_en_1.5B_v5
Hardware used: NVIDIA A100
Reproduction
I ran the following command:

infinity_emb v2 --model-id dunzhang/stella_en_1.5B_v5 --port 3002 --trust-remote-code --served-model-name embedding

Then I called the /embeddings API with:

{
  "input": ["5.2"],
  "model": "embedding"
}

I got a list of null values for the embedding. I tried some other models, and they do not return null values like this model does.
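To make the failure easy to check programmatically, here is a small stdlib helper (the function name `embedding_nulls` is mine, not part of infinity) that parses an OpenAI-style /embeddings response body and reports whether any returned vector contains null:

```python
import json

def embedding_nulls(response_body: str) -> bool:
    """Return True if any embedding vector in an OpenAI-style
    /embeddings JSON response contains null (None after parsing)."""
    data = json.loads(response_body)
    for item in data.get("data", []):
        if any(v is None for v in item.get("embedding", [])):
            return True
    return False

# Responses shaped like the broken and the expected case:
bad = '{"data": [{"embedding": [null, null], "index": 0}]}'
good = '{"data": [{"embedding": [0.12, -0.07], "index": 0}]}'
print(embedding_nulls(bad), embedding_nulls(good))
```

Running this against the actual response from `POST http://0.0.0.0:3002/embeddings` for the input above would confirm the nulls without eyeballing the JSON.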
Expected behavior
This should return a list of float values.