michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

BUG ERROR: Server stops accepting new requests after _core_batch(self) exceptions #242

Open vitteloil opened 1 month ago

vitteloil commented 1 month ago

System Info

Hi, I'm trying to run Infinity as the embeddings server for Dify. When one POST to /embeddings errors, the server stops processing further requests.

It seems https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/inference/batch_handler.py#L423 is not wrapped in a try clause, which may be the root cause of this issue?
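A minimal sketch of the fix being suggested here (hypothetical names, not the actual infinity_emb internals): catching exceptions per batch inside the core loop would fail only the affected requests while the server keeps consuming later ones.

```python
# Hypothetical sketch of a per-batch try clause around the encode call.
# `core_batch_loop` and `encode_core` are illustrative names, not the
# real batch_handler.py API.
def core_batch_loop(batches, encode_core):
    results = []
    for batch in batches:
        try:
            results.append(("ok", encode_core(batch)))
        except Exception as exc:
            # Fail only this batch; subsequent batches are still served.
            results.append(("error", str(exc)))
    return results
```

With this shape, a batch that raises (as in the traceback below) produces an error result for its own requests instead of killing the loop.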

Running : infinity_emb v2 --model-id BAAI/bge-small-en-v1.5

Info: WSL2, Python 3.11.9, infinity_emb==0.0.39

Reproduction

ERROR    2024-06-02 08:21:46,304 infinity_emb ERROR: shape '[2, 512]' is invalid for input of size 524288  batch_handler.py:434
         Traceback (most recent call last):
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/infinity_emb/inference/batch_handler.py", line 423, in _core_batch
             embed = self._model.encode_core(feat)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/infinity_emb/transformer/embedder/sentence_transformer.py", line 97, in encode_core
             out_features: "Tensor" = self.forward(features)["sentence_embedding"]
                                      ^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
             input = module(input)
                     ^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
             return self._call_impl(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
             return forward_call(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/sentence_transformers/models/Transformer.py", line 117, in forward
             output_states = self.auto_model(**trans_features, return_dict=False)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
             return self._call_impl(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
             return forward_call(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 1137, in forward
             encoder_outputs = self.encoder(
                               ^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
             return self._call_impl(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
             return forward_call(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 690, in forward
             layer_outputs = layer_module(
                             ^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
             return self._call_impl(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
             return forward_call(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/optimum/bettertransformer/models/encoder_models.py", line 300, in forward
             attention_mask = torch.reshape(attention_mask, (attention_mask.shape[0], attention_mask.shape[-1]))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         RuntimeError: shape '[2, 512]' is invalid for input of size 524288
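As a side note on the error itself: 524288 equals 2 × 512 × 512, which suggests the attention mask reaching the BetterTransformer reshape is 3-D (batch, seq, seq) rather than the expected 2-D (batch, seq). The shape arithmetic can be reproduced with numpy (which raises the analogous error to torch's RuntimeError):

```python
import numpy as np

# 524288 == 2 * 512 * 512: a 3-D mask of this size cannot be reshaped
# into the 2-D (batch, seq) shape the code asks for.
mask = np.zeros((2, 512, 512))
assert mask.size == 524288

try:
    mask.reshape(2, 512)  # mirrors torch.reshape(attention_mask, (2, 512))
except ValueError as exc:
    print("reshape failed:", exc)
```

This is only an inference from the numbers in the log; whatever produces the extra mask dimension upstream would be the actual trigger.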

Expected behavior

When a POST to /embeddings fails, I expect subsequent POSTs to be processed.

michaelfeil commented 1 month ago

Okay, that's concerning and should not happen. There is no way to "auto-recover", e.g. in case you run out of memory; I assume that is what happened here.

Hard to guess the cause without more information about how you are using infinity via pip. Also check the usage instructions; I updated the tutorials recently. @vitteloil