michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License
971 stars, 72 forks

japanese-reranker-cross-encoder-large-v1 does not work with CLI #192

Closed: nassie256 closed this issue 2 months ago

nassie256 commented 3 months ago

System Info

OS version: Ubuntu 22.04.3 LTS
Model being used: hotchpotch/japanese-reranker-cross-encoder-large-v1
Hardware used (GPUs/CPU/Accelerator): NVIDIA GeForce RTX 3090
The current version being used: Python 3.11.7 (pyenv virtualenvs), torch==2.2.1, transformers==4.39.3, sentence-transformers==2.6.1, infinity_emb==0.0.31

Reproduction

I am trying to use the reranker model for Japanese, "hotchpotch/japanese-reranker-cross-encoder-large-v1".

This works fine from Python. For example, the following code runs without errors:

# Install the Japanese tokenizer and dictionary beforehand (shell command):
#   pip install "fugashi[unidic-lite]"
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

MODEL_NAME = "hotchpotch/japanese-reranker-cross-encoder-large-v1"
engine_args = EngineArgs(model_name_or_path=MODEL_NAME, engine="torch")

query = "感動的な映画について"
docs = [
    "深いテーマを持ちながらも、観る人の心を揺さぶる名作。登場人物の心情描写が秀逸で、ラストは涙なしでは見られない。",
    "重要なメッセージ性は評価できるが、暗い話が続くので気分が落ち込んでしまった。もう少し明るい要素があればよかった。",
    "どうにもリアリティに欠ける展開が気になった。もっと深みのある人間ドラマが見たかった。",
    "アクションシーンが楽しすぎる。見ていて飽きない。ストーリーはシンプルだが、それが逆に良い。",
]
engine = AsyncEmbeddingEngine.from_args(engine_args)

async def main():
    # Entering the context starts the engine; leaving it shuts the engine down.
    async with engine:
        ranking, usage = await engine.rerank(query=query, docs=docs)
        print(list(zip(ranking, docs)))

asyncio.run(main())

However, when I try to serve it from the CLI, I get the following error:

$ infinity_emb --model-name-or-path hotchpotch/japanese-reranker-cross-encoder-large-v1 --port 7996
INFO 2024-04-04 18:33:38,793 datasets INFO: PyTorch version 2.2.1 available. config.py:58
INFO: Started server process [12866]
INFO: Waiting for application startup.
INFO 2024-04-04 18:33:39,500 infinity_emb INFO: model=hotchpotch/japanese-reranker-cross-encoder-large-v1 selected, using engine=torch and device=None select_model.py:54
INFO 2024-04-04 18:33:40,947 sentence_transformers.cross_encoder.CrossEncoder INFO: Use pytorch device: cuda CrossEncoder.py:82
INFO 2024-04-04 18:33:41,858 infinity_emb INFO: Adding optimizations via Huggingface optimum. acceleration.py:17
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
INFO 2024-04-04 18:33:41,866 infinity_emb INFO: Switching to half() precision (cuda: fp16). Disable by the setting the env var INFINITY_DISABLE_HALF torch.py:59
/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/optimum/bettertransformer/models/encoder_models.py:301: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.)
  hidden_states = torch._nested_tensor_from_mask(hidden_states, ~attention_mask)
ERROR: Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/starlette/routing.py", line 566, in __aenter__
    await self._router.startup()
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/starlette/routing.py", line 654, in startup
    await handler()
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/infinity_server.py", line 62, in _startup
    app.model = AsyncEmbeddingEngine.from_args(engine_args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/engine.py", line 49, in from_args
    engine = cls(**asdict(engine_args), _show_deprecation_warning=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/engine.py", line 40, in __init__
    self._model, self._min_inference_t, self._max_inference_t = select_model(
                                                                ^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/inference/select_model.py", line 68, in select_model
    loaded_engine.warmup(batch_size=engine_args.batch_size, n_tokens=1)
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/transformer/abstract.py", line 97, in warmup
    return run_warmup(self, inp)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/transformer/abstract.py", line 113, in run_warmup
    f"{model.tokenize_lengths([i.content.str_repr() for i in inputs])[0]}\n"
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.pyenv/versions/3.11.7/envs/emb/lib/python3.11/site-packages/infinity_emb/transformer/crossencoder/torch.py", line 112, in tokenize_lengths
    return [len(t.tokens) for t in tks]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not iterable

ERROR: Application startup failed. Exiting.

Expected behavior

The model can be served from the CLI, just as it can be used from Python.

michaelfeil commented 3 months ago

Regarding "install the Japanese tokenizer and dictionary beforehand: pip install fugashi[unidic-lite]": the tokenization library installed this way does not implement the full Hugging Face tokenizer API.

Have you considered opening a feature request in that library?
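
To illustrate the incompatibility, here is a minimal sketch of a length computation that tolerates such tokenizers. It assumes, as documented for Hugging Face transformers, that the batch output of a pure-Python ("slow") tokenizer has its `encodings` attribute set to `None`, which would explain the `TypeError` in `tokenize_lengths`. The names `FakeBatch` and `token_lengths` are hypothetical and are not part of infinity_emb; this is not the fix applied in the library.

```python
from types import SimpleNamespace
from typing import Any, List


class FakeBatch(dict):
    """Hypothetical stand-in for a tokenizer's batch output (not a real library class)."""

    def __init__(self, input_ids, encodings=None):
        super().__init__(input_ids=input_ids)
        # Assumption: None when the tokenizer is a pure-Python ("slow") one.
        self.encodings = encodings


def token_lengths(batch: Any) -> List[int]:
    """Count tokens per input, falling back to input_ids when encodings is None."""
    encodings = getattr(batch, "encodings", None)
    if encodings is not None:
        # Fast tokenizers: one encoding object per input, each with a .tokens list.
        return [len(enc.tokens) for enc in encodings]
    # Slow tokenizers: iterating over None would raise TypeError, so count ids instead.
    return [len(ids) for ids in batch["input_ids"]]


slow_batch = FakeBatch(input_ids=[[2, 10, 11, 3], [2, 12, 3]])
fast_batch = FakeBatch(
    input_ids=[[2, 10, 3]],
    encodings=[SimpleNamespace(tokens=["[CLS]", "a", "[SEP]"])],
)
print(token_lengths(slow_batch))  # [4, 3]
print(token_lengths(fast_batch))  # [3]
```

With transformers installed, the `is_fast` attribute of a loaded tokenizer indicates which of the two cases applies to a given model.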

nassie256 commented 3 months ago

Thank you for your answer. I understand that it is a compatibility issue on the fugashi side. We will consider submitting a feature request to the library.

nassie256 commented 2 months ago

I am closing this issue because it has become clear that the problem needs to be addressed on the model side.

michaelfeil commented 2 months ago

Thanks for managing the issue and your reply! 😃