Open · John42506176Linux opened this issue 3 months ago
Update the `_resolve_engine` function in `/libs/infinity_emb/infinity_emb/infinity_server.py` (part of `create_server`) to include reranker models in its capabilities check:
```python
def _resolve_engine(model: str) -> "AsyncEmbeddingEngine":
    try:
        engine: "AsyncEmbeddingEngine" = app.engine_array[model]  # type: ignore
    except IndexError as ex:
        raise errors.OpenAIException(
            f"Invalid model: {ex}",
            code=status.HTTP_400_BAD_REQUEST,
        )
    if engine.is_overloaded():
        raise errors.OpenAIException(
            f"model {model} is currently overloaded",
            code=status.HTTP_429_TOO_MANY_REQUESTS,
        )
    if "rerank" not in engine.capabilities:
        raise errors.OpenAIException(
            f"ModelNotDeployedError: model=`{model}` does not support `rerank`.",
            code=status.HTTP_400_BAD_REQUEST,
        )
    return engine
```
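To illustrate the capability gate in isolation, here is a minimal stand-alone sketch; `EngineStub` and `resolve` are hypothetical stand-ins for this example, not infinity_emb's real types, and plain `ValueError`s stand in for the HTTP error responses:

```python
# Minimal stand-alone sketch of the capability gate above.
# EngineStub and resolve are illustrative stand-ins, not infinity_emb's API.
from dataclasses import dataclass, field


@dataclass
class EngineStub:
    model: str
    capabilities: set = field(default_factory=set)


def resolve(engines: dict, model: str, needs: str = "rerank") -> EngineStub:
    try:
        engine = engines[model]
    except KeyError as ex:
        # The real server maps this to HTTP 400
        raise ValueError(f"Invalid model: {ex}")
    if needs not in engine.capabilities:
        # Mirrors the ModelNotDeployedError reported later in this issue:
        # a model loaded with only {'embed'} cannot serve /rerank.
        raise ValueError(f"model=`{model}` does not support `{needs}`")
    return engine
```

A model whose capabilities contain `"rerank"` resolves normally; one loaded with only `{"embed"}` is rejected, which is exactly the error this issue reports.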
Update the tests in `/libs/infinity_emb/tests/end_to_end/test_torch_reranker.py` to ensure reranker models are correctly identified and tested:
```python
@pytest.mark.anyio
async def test_reranker(client, model_base, helpers):
    query = "Where is the Eiffel Tower located?"
    documents = [
        "The Eiffel Tower is located in Paris, France",
        "The Eiffel Tower is located in the United States.",
        "The Eiffel Tower is located in the United Kingdom.",
    ]
    response = await client.post(
        f"{PREFIX}/rerank",
        json={"model": MODEL, "query": query, "documents": documents},
    )
    assert response.status_code == 200
    rdata = response.json()
    assert "model" in rdata
    assert "usage" in rdata
    rdata_results = rdata["results"]
    predictions = [
        model_base.predict({"text": query, "text_pair": doc}) for doc in documents
    ]
    assert len(rdata_results) == len(predictions)
    for i, pred in enumerate(predictions):
        assert abs(rdata_results[i]["relevance_score"] - pred["score"]) < 0.01
```
Damn, Greptile is pretty useless.
Does something like this work, e.g. https://huggingface.co/jinaai/jina-reranker-v1-turbo-en/discussions/10, aka `--revision refs/pr/10`? It seems like Jina messed up their config. mixedbread does it better here: https://huggingface.co/mixedbread-ai/mxbai-rerank-xsmall-v1/blob/main/config.json
@John42506176Linux The detail is that you need to set `"architectures": ["JinaBertForSequenceClassification"]` in the config.json for it to be recognized as a reranker model. Would you be so kind as to open PRs at the Jina models for this? I appreciate your time.
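As a sketch, the config.json change described above would look something like this; only the `architectures` key is the point here, and any other fields in the model's actual config stay as they are:

```json
{
  "architectures": ["JinaBertForSequenceClassification"]
}
```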
I assumed it was something simple; thanks for the quick response. I am testing your first comment right now.
K, looks good. Thanks for the quick fix, I appreciate the quick response. I'll open a PR for the tiny model soon (I need to finish testing the turbo model first). Thanks, you saved me some time :).
@michaelfeil Tiny gives the following error when making the config.json change:

```
RuntimeError: Error(s) in loading state_dict for JinaBertForSequenceClassification:
    size mismatch for classifier.weight: copying a param with shape torch.Size([1, 384]) from checkpoint, the shape in current model is torch.Size([2, 384]).
    size mismatch for classifier.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([2]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```
@John42506176Linux Seems like the reranker model is configured with 2 output classes. That is not how rerankers are trained; rerankers usually have one and only one output class.
With all respect, I don't have time to fix Jina's questionable training choice here. The config file is ambiguous and leaves a lot of room for how to load the model.
No worries, you already saved me time, by helping with turbo. Thanks for the assistance.
Hi @michaelfeil and @John42506176Linux,
additionally adding `num_labels: 1` to the config seems to do the trick here.
I tested it with:

```shell
infinity_emb v2 --model-id jinaai/jina-reranker-v1-turbo-en --revision refs/pr/11
```

and it correctly shows up as a rerank model: `"capabilities": ["rerank"]`
@wirthual Thanks so much! This is exactly the solution for this model!
System Info

- Platform: AWS EC2 G4dn, Amazon Linux
- Model: jinaai/jina-reranker-v1-tiny-en or jinaai/jina-reranker-v1-turbo-en
- Hardware: NVIDIA GPU (see nvidia-smi)
- Using the latest Docker image

Command:

```shell
port=7997 rerank_model=jinaai/jina-reranker-v1-tiny-en volume=$PWD/data
sudo docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --batch-size 256 \
  --model-id $rerank_model \
  --port $port
```
Reproduction
Run the following command:

```shell
port=7997 rerank_model=jinaai/jina-reranker-v1-tiny-en volume=$PWD/data
sudo docker run -it --gpus all \
  -v $volume:/app/.cache \
  -p $port:$port \
  michaelf34/infinity:latest \
  v2 \
  --batch-size 256 \
  --model-id $rerank_model \
  --port $port
```
Then attempt to use the `/rerank` endpoint with a simple body:

```json
{
  "query": "test",
  "documents": ["test"],
  "return_documents": false,
  "model": "jinaai/jina-reranker-v1-tiny-en"
}
```
and you will get the following error (the message text, including its typos, is quoted verbatim from the server):

```json
{
  "error": {
    "message": "ModelNotDeployedError: model=`jinaai/jina-reranker-v1-tiny-en` does not support `rerank`. Reason: the loaded moded cannot fullyfill `rerank`. options are {'embed'}.",
    "type": null,
    "param": null,
    "code": 400
  }
}
```

I've tested this with other inference servers like Text Embeddings Inference and the same error occurs; however, it does not occur with the standard transformers library.
Expected behavior
Should be able to rerank with these models.
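For reference, here is a minimal client sketch of the `/rerank` call from the reproduction above, using only the Python standard library. The base URL follows the port in the docker command (7997); `build_rerank_payload` and `rerank` are helper names invented for this sketch, not part of infinity_emb:

```python
# Sketch of a /rerank client for a locally running infinity server.
# Helper names here are illustrative, not part of infinity_emb's API.
import json
from urllib import request


def build_rerank_payload(query: str, documents: list, model: str) -> dict:
    # Shape matches the request body shown in the reproduction above.
    return {
        "query": query,
        "documents": documents,
        "return_documents": False,
        "model": model,
    }


def rerank(base_url: str, payload: dict) -> dict:
    # POST the JSON body to the /rerank endpoint and decode the response.
    req = request.Request(
        f"{base_url}/rerank",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a correctly configured reranker model (i.e. after the config.json fixes above), calling `rerank("http://localhost:7997", build_rerank_payload("test", ["test"], "jinaai/jina-reranker-v1-tiny-en"))` should return results instead of the 400 error.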