run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Text Embeddings Inference Reranker #9572

Closed: brucethemoose closed this issue 2 weeks ago

brucethemoose commented 8 months ago

Feature Description

Hello, could we get a reranking class in the vein of SentenceTransformerRerank or CohereRerank for a Text Embeddings Inference server?

Reason

We are running into performance/scaling issues with the SentenceTransformerRerank class, even with hacks like BF16 precision and BetterTransformer applied. The Hugging Face TEI server should yield much higher performance and can serve many reranking calls at once.

Value of Feature

This would allow calling out to a self-hosted reranking API, just like the TEI embeddings class does.
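For context, the TEI server exposes a `/rerank` route that takes a query plus a list of candidate texts and returns `{"index", "score"}` pairs. A minimal sketch of calling it over HTTP, assuming a server running at `localhost:8080`; the helper names `tei_rerank` and `parse_rerank_response` are hypothetical, not part of any library:

```python
import json
import urllib.request


def parse_rerank_response(texts, results):
    # TEI's /rerank returns [{"index": i, "score": s}, ...]; map the
    # indices back onto the original candidate texts.
    return [(texts[r["index"]], r["score"]) for r in results]


def tei_rerank(query, texts, base_url="http://localhost:8080"):
    # POST {"query": ..., "texts": [...]} to the server's /rerank route.
    req = urllib.request.Request(
        f"{base_url}/rerank",
        data=json.dumps({"query": query, "texts": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_rerank_response(texts, json.load(resp))
```

Because the server batches requests internally, many such calls can be in flight at once, which is where the scaling win over an in-process SentenceTransformerRerank comes from.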

dosubot[bot] commented 8 months ago

🤖

Hi @brucethemoose! 🦙

I'm Dosu, a friendly bot here to lend a hoof while we wait for a human maintainer. I can help you work out some bugs, answer questions, and even guide you on your way to becoming a contributor. Let's get started, shall we? 🚀

Thank you for your feature request. It's definitely possible to incorporate a reranking class similar to SentenceTransformerRerank or CohereRerank into the Text Embeddings Inference embeddings class to improve performance and scaling, and allow calling out to a self-hosted API for reranking.

You can create a new class that inherits from BaseNodePostprocessor and implement the _postprocess_nodes method. This method should take a list of nodes and a query bundle as input, and return a list of nodes with updated scores.

Here is a basic example of how you could structure this class:

# Imports assuming the llama_index 0.9.x module layout
from typing import Any, List, Optional

from llama_index.bridge.pydantic import Field, PrivateAttr
from llama_index.callbacks import CBEventType, EventPayload
from llama_index.postprocessor.types import BaseNodePostprocessor
from llama_index.schema import MetadataMode, NodeWithScore, QueryBundle
from llama_index.utils import infer_torch_device


class YourRerankClass(BaseNodePostprocessor):
    model: str = Field(description="Your model name.")
    top_n: int = Field(description="Number of nodes to return sorted by score.")
    device: str = Field(description="Device to run the model on.")
    _model: Any = PrivateAttr()

    def __init__(
        self,
        top_n: int = 2,
        model: str = "your-model",
        device: Optional[str] = None,
    ):
        # import your model here
        device = infer_torch_device() if device is None else device
        self._model = YourModel(
            model, device=device
        )
        super().__init__(top_n=top_n, model=model, device=device)

    @classmethod
    def class_name(cls) -> str:
        return "YourRerankClass"

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        if query_bundle is None:
            raise ValueError("Missing query bundle in extra info.")

        query_and_nodes = [
            (
                query_bundle.query_str,
                node.node.get_content(metadata_mode=MetadataMode.EMBED),
            )
            for node in nodes
        ]

        with self.callback_manager.event(
            CBEventType.RERANKING,
            payload={
                EventPayload.NODES: nodes,
                EventPayload.MODEL_NAME: self.model,
                EventPayload.QUERY_STR: query_bundle.query_str,
                EventPayload.TOP_K: self.top_n,
            },
        ) as event:
            scores = self._model.predict(query_and_nodes)

            assert len(scores) == len(nodes)
            for node, score in zip(nodes, scores):
                node.score = score

            new_nodes = sorted(nodes, key=lambda x: -x.score if x.score else 0)[
                : self.top_n
            ]
            event.on_end(payload={EventPayload.NODES: new_nodes})

        return new_nodes

In this example, YourModel should be replaced with the model you want to use for reranking. The predict method of this model should take a list of tuples, where each tuple contains a query string and a node's content, and return a list of scores. These scores are then used to update the scores of the nodes.
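To adapt the skeleton above to TEI specifically, the `predict` method could be a thin HTTP client around the server's `/rerank` route. A rough sketch, with all names hypothetical; it assumes a TEI server on `localhost:8080` whose `/rerank` route returns `{"index", "score"}` pairs:

```python
import json
import urllib.request
from typing import List, Tuple


class TEIRerankClient:
    """Hypothetical stand-in for YourModel: predict() scores (query, text) pairs."""

    def __init__(self, base_url: str = "http://localhost:8080"):
        self.base_url = base_url

    @staticmethod
    def scores_in_input_order(results, n):
        # /rerank may return results sorted by relevance; put the scores
        # back into the order the candidate texts were sent in.
        scores = [0.0] * n
        for r in results:
            scores[r["index"]] = r["score"]
        return scores

    def predict(self, query_and_nodes: List[Tuple[str, str]]) -> List[float]:
        # Every tuple built in _postprocess_nodes shares the same query
        # string, so one /rerank call covers all candidate texts at once.
        query = query_and_nodes[0][0]
        texts = [text for _, text in query_and_nodes]
        req = urllib.request.Request(
            f"{self.base_url}/rerank",
            data=json.dumps({"query": query, "texts": texts}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            results = json.load(resp)
        return self.scores_in_input_order(results, len(texts))
```

With a client like this, `self._model = TEIRerankClient(...)` slots into the skeleton above without further changes, since `_postprocess_nodes` only calls `predict`.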

You can refer to the implementation of the SentenceTransformerRerank and CohereRerank classes in the LlamaIndex codebase for more details.

Please let me know if you need further assistance or clarification.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

brucethemoose commented 8 months ago

Yeah I just wrote my own class. Good bot.

terilias commented 7 months ago

I'm encountering the same issue. I think a TEI implementation for reranker models would be useful, so maybe it's better to reopen the issue...

Bearsaerker commented 7 months ago

I'd love to see this as well!

dosubot[bot] commented 4 months ago

Hi, @brucethemoose,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, you opened this issue requesting a reranking class similar to SentenceTransformerRerank or CohereRerank for a Text Embeddings Inference server due to performance and scaling issues with the current classes. I provided a detailed response with an example of how to structure the new class, and you confirmed that you wrote your own class. There was also a suggestion to reopen the issue with support from other users.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or need assistance, feel free to reach out.

terilias commented 3 months ago

@dosubot @brucethemoose Hello, to my understanding such a class still does not exist in the newer versions of LlamaIndex. We can see that in the postprocessing modules, right?