michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embedding, reranking, and CLIP models
https://michaelfeil.github.io/infinity/
MIT License

jinaai/jina-reranker-v1-*-en does not work with optimum #362

Open rawsh opened 2 weeks ago


System Info

Python 3.10, infinity-emb 0.0.55

INFO     2024-09-13 15:19:59,927 datasets INFO: PyTorch version 2.4.0 available.                                                            config.py:59
INFO:     Started server process [76898]
INFO:     Waiting for application startup.
INFO     2024-09-13 15:20:01,042 infinity_emb INFO: model=`jinaai/jina-reranker-v1-tiny-en` selected, using engine=`optimum` and      select_model.py:62
         device=`cpu`                                                                                                                                   
INFO     2024-09-13 15:20:01,393 infinity_emb INFO: Found 7 onnx files: [PosixPath('onnx/model.onnx'),                              utils_optimum.py:217
         PosixPath('onnx/model_bnb4.onnx'), PosixPath('onnx/model_fp16.onnx'), PosixPath('onnx/model_int8.onnx'),                                       
         PosixPath('onnx/model_q4.onnx'), PosixPath('onnx/model_quantized.onnx'), PosixPath('onnx/model_uint8.onnx')]                                   
INFO     2024-09-13 15:20:01,401 infinity_emb INFO: Using onnx/model_quantized.onnx as the model                                    utils_optimum.py:221
INFO     2024-09-13 15:20:01,412 infinity_emb INFO: Optimized model found at                                                        utils_optimum.py:120
         /Users/robert/.cache/huggingface/hub/infinity_onnx/CPUExecutionProvider/jinaai/jina-reranker-v1-tiny-en/model_quantized_op                     
         timized.onnx, skipping optimization                                                                                                            
The ONNX file model_quantized_optimized.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
ERROR:    Traceback (most recent call last):
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/infinity_server.py", line 63, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/engine.py", line 259, in from_args
    return cls(engines=tuple(engines))
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/engine.py", line 67, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/engine.py", line 53, in __init__
    self._model, self._min_inference_t, self._max_inference_t = select_model(
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/inference/select_model.py", line 76, in select_model
    loaded_engine.warmup(batch_size=engine_args.batch_size, n_tokens=1)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/abstract.py", line 86, in warmup
    return run_warmup(self, inp)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/abstract.py", line 180, in run_warmup
    model.encode_post(embed)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/quantization/interface.py", line 141, in wrapper
    embeddings = func(self, *args, **kwargs)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/embedder/optimum.py", line 105, in encode_post
    return normalize(embedding).astype(np.float32)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/infinity_emb/transformer/utils_optimum.py", line 47, in normalize
    norm = np.linalg.norm(input_array, ord=p, axis=dim, keepdims=True)
  File "/Users/robert/Library/Caches/pypoetry/virtualenvs/genai-toolbox-JUYepP8o-py3.10/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 2583, in norm
    return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
numpy.exceptions.AxisError: axis 1 is out of bounds for array of dimension 1

ERROR:    Application startup failed. Exiting.
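
The final frames of the traceback can be reproduced without infinity at all. The sketch below is only an illustration: `normalize` here is a hypothetical stand-in whose signature is guessed from the `np.linalg.norm(input_array, ord=p, axis=dim, keepdims=True)` line in the traceback, not the actual helper in `utils_optimum.py`. It assumes, as one plausible reading of the log, that the warmup produced a 1-D array (for example, one score per input) which was then sent through the embedding normalization step that expects a 2-D `(batch, dim)` array.

```python
import numpy as np


def normalize(input_array, p=2, dim=1, eps=1e-12):
    # Hypothetical stand-in for the helper named in the traceback.
    # It L2-normalizes along axis `dim`, so it needs at least dim + 1 dimensions.
    norm = np.linalg.norm(input_array, ord=p, axis=dim, keepdims=True)
    return input_array / np.maximum(norm, eps)


# A 2-D batch of embeddings, shape (batch, dim): normalization works.
embeddings_2d = np.ones((4, 8), dtype=np.float32)
normalized = normalize(embeddings_2d)

# A 1-D array, shape (batch,): there is no axis 1, so NumPy raises AxisError.
# AxisError subclasses ValueError, which keeps this portable across NumPy versions.
scores_1d = np.ones(4, dtype=np.float32)
try:
    normalize(scores_1d)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(normalized.shape)
print(error_message)
```

If that reading is right, the failure depends on the shape of the array reaching `encode_post` rather than on which ONNX file was selected, but that is an assumption and not something the log confirms.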

Reproduction

infinity_emb v2 --model-id jinaai/jina-reranker-v1-tiny-en --device cpu --engine optimum

Expected behavior

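The server should start and serve jinaai/jina-reranker-v1-tiny-en with the optimum (ONNX) engine instead of failing during warmup.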