michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip
https://michaelfeil.github.io/infinity/
MIT License
1.36k stars 106 forks

Jina reranker (turbo/tiny) being classified as embedding models #325

Open John42506176Linux opened 2 months ago

John42506176Linux commented 2 months ago

System Info

AWS EC2 G4dn

Amazon Linux

Model: jinaai/jina-reranker-v1-tiny-en or jinaai/jina-reranker-v1-turbo-en

Hardware: Nvidia-smi

Using latest docker version

Command:

port=7997
rerank_model=jinaai/jina-reranker-v1-tiny-en
volume=$PWD/data

sudo docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --batch-size 256 \
  --model-id $rerank_model \
  --port $port

Reproduction

Run the following command:

port=7997
rerank_model=jinaai/jina-reranker-v1-tiny-en
volume=$PWD/data

sudo docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --batch-size 256 \
  --model-id $rerank_model \
  --port $port

Then call the /rerank endpoint with a simple body:

{
  "query": "test",
  "documents": ["test"],
  "return_documents": false,
  "model": "jinaai/jina-reranker-v1-tiny-en"
}

and you will get the following error:

{ "error": { "message": "ModelNotDeployedError: model=jinaai/jina-reranker-v1-tiny-en does not support rerank. Reason: the loaded moded cannot fullyfill rerank.options are {'embed'}.", "type": null, "param": null, "code": 400 } }

I've tested this with other inference servers such as Text Embeddings Inference and the same error occurs. It does not occur when loading the model directly with the Transformers library.
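The request from the reproduction steps can also be sent from a script. The following is a minimal sketch using only the Python standard library; the base URL `http://localhost:7997` is an assumption derived from `port=7997` above, and the helper names `build_rerank_request` and `send` are hypothetical, not part of Infinity.

```python
import json
import urllib.error
import urllib.request


def build_rerank_request(base_url, model, query, documents):
    """Build (but do not send) a POST request for the /rerank endpoint."""
    body = {
        "query": query,
        "documents": documents,
        "return_documents": False,
        "model": model,
    }
    return urllib.request.Request(
        url=f"{base_url}/rerank",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return (status_code, parsed_json_body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # A 400 such as ModelNotDeployedError lands here. The body is
        # assumed to be JSON, as in the error shown in this report.
        return err.code, json.loads(err.read())
```

With the container running, `send(build_rerank_request("http://localhost:7997", "jinaai/jina-reranker-v1-tiny-en", "test", ["test"]))` should return the status code together with either the rerank results or the error object.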

Expected behavior

Reranking should work with these models.

greptile-apps[bot] commented 2 months ago

Resolution Steps

  1. Update create_server function in /libs/infinity_emb/infinity_emb/infinity_server.py:

    • Modify _resolve_engine to include reranker models in its capabilities check.
      # Nested inside create_server; `app`, `errors` and `status` come from the enclosing module.
      def _resolve_engine(model: str) -> "AsyncEmbeddingEngine":
          try:
              engine: "AsyncEmbeddingEngine" = app.engine_array[model]  # type: ignore
          except IndexError as ex:
              raise errors.OpenAIException(
                  f"Invalid model: {ex}",
                  code=status.HTTP_400_BAD_REQUEST,
              )
          if engine.is_overloaded():
              raise errors.OpenAIException(
                  f"model {model} is currently overloaded",
                  code=status.HTTP_429_TOO_MANY_REQUESTS,
              )
          # Reject models that were not loaded with the rerank capability.
          if "rerank" not in engine.capabilities:
              raise errors.OpenAIException(
                  f"ModelNotDeployedError: model=`{model}` does not support `rerank`.",
                  code=status.HTTP_400_BAD_REQUEST,
              )
          return engine
  2. Update tests in /libs/infinity_emb/tests/end_to_end/test_torch_reranker.py:

    • Ensure reranker models are correctly identified and tested.

      # PREFIX and MODEL are module-level constants of the test file.
      @pytest.mark.anyio
      async def test_reranker(client, model_base, helpers):
          query = "Where is the Eiffel Tower located?"
          documents = [
              "The Eiffel Tower is located in Paris, France",
              "The Eiffel Tower is located in the United States.",
              "The Eiffel Tower is located in the United Kingdom.",
          ]
          response = await client.post(
              f"{PREFIX}/rerank",
              json={"model": MODEL, "query": query, "documents": documents},
          )
          assert response.status_code == 200
          rdata = response.json()
          assert "model" in rdata
          assert "usage" in rdata
          rdata_results = rdata["results"]

          # Reference scores computed directly with the base model.
          predictions = [
              model_base.predict({"text": query, "text_pair": doc}) for doc in documents
          ]

          assert len(rdata_results) == len(predictions)
          for i, pred in enumerate(predictions):
              assert abs(rdata_results[i]["relevance_score"] - pred["score"]) < 0.01

michaelfeil commented 2 months ago

Damn, Greptile is pretty useless.

michaelfeil commented 2 months ago

Does something like this work? https://huggingface.co/jinaai/jina-reranker-v1-turbo-en/discussions/10, i.e. --revision refs/pr/10 or similar? It seems Jina messed up their config. mixedbread does it better here: https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1/blob/main/config.json

michaelfeil commented 2 months ago

@John42506176Linux The detail is that you need to set "architectures": ["JinaBertForSequenceClassification"] in the config.json for the model to be recognized as a reranker. Would you be so kind as to open PRs on the Jina models for this? I appreciate your time.
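The config change described above can be sketched as a small check and patch on a loaded config.json dictionary. This is only an illustration: `looks_like_reranker` is a hypothetical heuristic based on the architecture name mentioned in this comment, not Infinity's actual detection logic, and `with_reranker_architecture` is likewise a made-up helper.

```python
def looks_like_reranker(config):
    """Hypothetical heuristic: treat a model as a reranker when one of its
    declared architectures is a sequence-classification head."""
    architectures = config.get("architectures") or []
    return any(name.endswith("ForSequenceClassification") for name in architectures)


def with_reranker_architecture(config, architecture="JinaBertForSequenceClassification"):
    """Return a copy of the config dict with the architecture set as
    suggested in this thread; the input dict is left untouched."""
    patched = dict(config)
    patched["architectures"] = [architecture]
    return patched
```

For example, a config that declares no classification architecture fails the check, while the patched copy passes it. The patched dictionary could then be written back with `json.dump` when preparing a PR against the model repository.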

John42506176Linux commented 2 months ago

I assumed it was something simple. Thanks for the quick response; I'm testing the suggestion from your first comment right now.

John42506176Linux commented 2 months ago

OK, looks good. Thanks for the quick fix and the quick response. I'll open a PR for the tiny model soon (I need to finish testing the turbo model first). Thanks, this saved me some time :).

John42506176Linux commented 2 months ago

@michaelfeil Tiny gives the following error after making the config.json change:

RuntimeError: Error(s) in loading state_dict for JinaBertForSequenceClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([1, 384]) from checkpoint, the shape in current model is torch.Size([2, 384]).
size mismatch for classifier.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([2]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

michaelfeil commented 2 months ago

@John42506176Linux Seems like the reranker model has 2 outputs. That is not how rerankers are trained. Rerankers usually have one and only one output class.

With all due respect, I don't have time to fix Jina's questionable training choices here. The config file is ambiguous and leaves a lot of room for how to load the model.
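For readers debugging the size mismatch reported earlier in the thread, the shapes in the error can be read as label counts. The message says the checkpoint's classifier.weight is [1, 384] while the model that was built expects [2, 384]. In Transformers, a config that carries no label information defaults to `num_labels=2`, which may explain the second shape. The sketch below only does that arithmetic and shows one possible config patch; whether setting `num_labels` in the Jina config actually makes the tiny model load is an untested assumption, and both helper names are hypothetical.

```python
def num_labels_from_classifier_shape(weight_shape):
    """classifier.weight has shape (num_labels, hidden_size)."""
    return weight_shape[0]


def with_num_labels(config, num_labels):
    """Return a copy of a config.json dict with an explicit label count."""
    patched = dict(config)
    patched["num_labels"] = num_labels
    return patched


# Shapes copied from the RuntimeError reported in this thread.
checkpoint_shape = (1, 384)  # what the saved weights contain
model_shape = (2, 384)       # what the loaded config caused to be built

saved_labels = num_labels_from_classifier_shape(checkpoint_shape)
built_labels = num_labels_from_classifier_shape(model_shape)
```

If the config already contains an `id2label` map, that map determines the label count instead, so it would need to be consistent with any `num_labels` value added this way.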

John42506176Linux commented 2 months ago

No worries, you already saved me time by helping with turbo. Thanks for the assistance.