run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.74k stars 5.27k forks source link

[Feature Request]: NodePostprocessors could benefit from an async equivalent to postprocess_nodes #16911

Open kzbao opened 4 days ago

kzbao commented 4 days ago

Feature Description

Under batch circumstances where async methods are used (e.g. RetrieverQueryEngine.aquery()), it would be nice if _apply_node_postprocessors() or equivalent could call upon an async equivalent to postprocess_nodes to maintain the async calls throughout. This can be an issue when using a node postprocessor like LLMRerank or RankGPTRerank.

Additionally, RankGPTRerank and LLMRerank seem a bit redundant though someone added a apostprocess_nodes in https://github.com/run-llama/llama_index/pull/12620. Overall, it seems like there's an opportunity to define a cleaner interface and consolidate some of this code.

Reason

There are existing workarounds (implementation through low level API) though this issue gets at some opportunity to improve both in terms of performance and code maintainability.

Value of Feature

There is an improvement both in terms of performance and code maintainability.

Current Workaround

As a workaround, I created a new class extending RetrieverQueryEngine:

class AsyncRetrieverQueryEngine(RetrieverQueryEngine):
    async def _a_apply_node_postprocessors(
        self, nodes: List[NodeWithScore], query_bundle: QueryBundle
    ) -> List[NodeWithScore]:
        for node_postprocessor in self._node_postprocessors:
            nodes = await node_postprocessor.apostprocess_nodes(nodes, query_bundle)
        return nodes

    async def aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = await self._retriever.aretrieve(query_bundle)
        return await self._a_apply_node_postprocessors(nodes, query_bundle)

This allows me to use apostprocess_nodes which RankGPTRerank currently implements. Presumably some other postprocessors may want to support this as well.

Proposed Solution

We can add an apostprocess_nodes in BaseNodePostprocessor intended to be overwritten but which wraps the synchronous equivalent as a fallback. Query engines would also need to be updated to account for this in a way similar to the workaround above.