Closed Stealthwriter closed 1 month ago
Hey!
This model does not have a classification head or an id2label mapping in its config: https://huggingface.co/microsoft/deberta-v3-large/blob/main/config.json
Therefore you may be able to use it for embeddings, but it will likely not perform well. Essentially it is a model for mask-filling, which is not a downstream task that is interesting to support in infinity.
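One quick way to check this up front is to inspect the raw config.json for a classification architecture and a label mapping. A minimal sketch; the helper name and the abridged config dicts are illustrative, not part of infinity:

```python
def supports_classification(config: dict) -> bool:
    """Heuristic check on a raw config.json dict: a classification
    checkpoint declares a *ForSequenceClassification architecture
    and ships an id2label mapping."""
    architectures = config.get("architectures") or []
    has_head = any("ForSequenceClassification" in a for a in architectures)
    return has_head and "id2label" in config

# Abridged stand-in for microsoft/deberta-v3-large: no head, no labels.
deberta_cfg = {"model_type": "deberta-v2", "hidden_size": 1024}
print(supports_classification(deberta_cfg))  # False
```

Note that this inspects the raw config.json content on purpose: loading via transformers' AutoConfig fills in a default id2label ({0: "LABEL_0", 1: "LABEL_1"}) even when the file has none, which would make the check misleading.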
System Info
I started the server on RunPod with DeBERTa v3, but I got the following output and the model didn't download:
root@eb4c9177bc5e:/workspace# infinity_emb v2 --model-id microsoft/deberta-v3-large --port 8000
INFO 2024-06-01 20:01:40,608 datasets INFO: PyTorch version 2.3.0 available. config.py:58
INFO:     Started server process [1091]
INFO:     Waiting for application startup.
INFO 2024-06-01 20:01:41,678 infinity_emb INFO: model=microsoft/deberta-v3-large selected, using engine=torch and device=None select_model.py:54
INFO 2024-06-01 20:01:41,848 sentence_transformers.SentenceTransformer INFO: Use pytorch device_name: cuda SentenceTransformer.py:188
INFO 2024-06-01 20:01:41,849 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: microsoft/deberta-v3-large SentenceTransformer.py:196
WARNING 2024-06-01 20:01:41,934 sentence_transformers.SentenceTransformer WARNING: No sentence-transformers model found with name microsoft/deberta-v3-large. Creating a new one with mean pooling. SentenceTransformer.py:1298
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:560: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
INFO 2024-06-01 20:01:44,327 infinity_emb INFO: Adding optimizations via Huggingface optimum. acceleration.py:25
WARNING 2024-06-01 20:01:44,329 infinity_emb WARNING: BetterTransformer is not available for model: <class 'transformers.models.deberta_v2.modeling_deberta_v2.DebertaV2Model'> Continue without bettertransformer modeling code. acceleration.py:36
INFO 2024-06-01 20:01:44,330 infinity_emb INFO: Switching to half() precision (cuda: fp16). sentence_transformer.py:73
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
INFO 2024-06-01 20:01:44,944 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=1 select_model.py:77
    0.60 ms tokenization
    17.30 ms inference
    0.08 ms post-processing
    17.99 ms total
    embeddings/sec: 1778.96
INFO 2024-06-01 20:01:45,633 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=512 select_model.py:83
    14.28 ms tokenization
    317.46 ms inference
    0.26 ms post-processing
    332.00 ms total
    embeddings/sec: 96.39
INFO 2024-06-01 20:01:45,635 infinity_emb INFO: model warmed up, between 96.39-1778.96 embeddings/sec at batch_size=32 select_model.py:84
INFO 2024-06-01 20:01:45,636 infinity_emb INFO: creating batching engine batch_handler.py:291
INFO 2024-06-01 20:01:45,638 infinity_emb INFO: ready to batch requests. batch_handler.py:354
INFO 2024-06-01 20:01:45,640 infinity_emb INFO: infinity_server.py:56
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     100.64.0.28:42100 - "GET / HTTP/1.1" 307 Temporary Redirect
INFO:     100.64.0.28:42100 - "GET /docs HTTP/1.1" 200 OK
INFO:     100.64.0.23:49954 - "GET / HTTP/1.1" 307 Temporary Redirect
INFO:     100.64.0.23:49954 - "GET /docs HTTP/1.1" 200 OK
INFO:     100.64.0.23:49954 - "GET /openapi.json HTTP/1.1" 200 OK
INFO:     100.64.0.23:51264 - "POST /classify HTTP/1.1" 400 Bad Request
INFO:     100.64.0.23:34484 - "POST /classify HTTP/1.1" 400 Bad Request
INFO:     100.64.0.23:54440 - "POST /classify HTTP/1.1" 400 Bad Request
Information
Tasks
Reproduction
infinity_emb v2 --model-id microsoft/deberta-v3-large
Expected behavior
To work with the /classify endpoint, since it's a classification model.
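For reference, the failing /classify calls in the log above were sending a JSON body of roughly this shape. The field names follow infinity's OpenAI-style request schema and are assumptions here, not taken from the log:

```python
import json

def build_classify_payload(model_id: str, texts: list[str]) -> str:
    # Assumed request shape for POST /classify: {"model": ..., "input": [...]}.
    return json.dumps({"model": model_id, "input": texts})

payload = build_classify_payload("microsoft/deberta-v3-large", ["great product!"])
print(payload)
# The server answers 400 Bad Request regardless of payload shape here,
# because this deployment only exposes the embed capability, not classify.
```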