Open theta-lin opened 3 weeks ago
The QueryPipeline only uses the query_str of the query transform output QueryBundle because the QueryTransformComponent is designed to return only the query_str from the transformed query bundle. This is evident in the _run_component method of the QueryTransformComponent class, which explicitly returns a dictionary containing only the query_str:
def _run_component(self, **kwargs: Any) -> Any:
    """Run component."""
    output = self._query_transform.run(
        kwargs["query_str"],
        metadata=kwargs["metadata"],
    )
    return {"query_str": output.query_str}
This method does not include the custom_embedding_strs or any other fields from the QueryBundle, effectively ignoring them. This design choice ensures that only the transformed query string is passed along the pipeline, simplifying the handling of query transformations [1][2].
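The effect can be reproduced without llama_index installed. The following is only a minimal sketch: FakeQueryBundle, fake_hyde_transform, and run_component_like_pipeline are hypothetical stand-ins written for illustration, not the library's classes. They mirror the shape of the _run_component snippet above and assume, as the bug description states, that a HyDE-style transform leaves query_str unchanged and attaches its generated text as custom embedding strings.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeQueryBundle:
    """Hypothetical stand-in for QueryBundle; not the llama_index class."""

    query_str: str
    custom_embedding_strs: Optional[List[str]] = None


def fake_hyde_transform(query_str: str) -> FakeQueryBundle:
    """Stand-in for a HyDE-style transform.

    Keeps the query string as-is and attaches a canned "hypothetical
    document" as a custom embedding string (a real transform would
    generate this text with an LLM).
    """
    hypothetical_doc = f"A passage that answers: {query_str}"
    return FakeQueryBundle(
        query_str=query_str,
        custom_embedding_strs=[hypothetical_doc],
    )


def run_component_like_pipeline(**kwargs: Any) -> Dict[str, Any]:
    """Mirrors the shape of _run_component: only query_str is returned."""
    output = fake_hyde_transform(kwargs["query_str"])
    return {"query_str": output.query_str}


# The full bundle carries the hypothetical document...
bundle = fake_hyde_transform("What is HyDE?")
# ...but the component-style output keeps only the unchanged query string.
component_output = run_component_like_pipeline(query_str="What is HyDE?")

print(bundle.custom_embedding_strs)
print(component_output)
```

In this sketch the downstream step receives exactly the string it would have received with no transform at all, which matches the behavior reported in this issue.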
Additionally, in the context of using HyDEQueryTransform, the transformed query string is used for embedding lookup and summarization. The HyDEQueryTransform generates a hypothetical document to improve the embedding quality, but the pipeline still primarily relies on the query_str for the final output. This approach helps in maintaining consistency and simplicity in the query processing pipeline [3].
@dosu do you suggest a solution to use HyDE as a query pipeline step?
@ootkin Currently, as a workaround, I wrap my retriever in a TransformRetriever. Then you can just use the transform retriever in the pipeline.
retriever = TransformRetriever(
    retriever=retriever,
    query_transform=HyDEQueryTransform(),
)
If you want to use HyDE before passing it to a query engine instead of a retriever, then you can also try TransformQueryEngine.
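As far as I understand it, wrapping helps because the wrapper runs the transform itself and hands the whole bundle to the inner retriever, so nothing gets reduced to a bare string in between. Here is a rough sketch of that pattern using hypothetical stand-ins (FakeQueryBundle, FakeRetriever, FakeTransformRetriever, fake_hyde). It is not the actual TransformRetriever implementation, just an illustration of the idea.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeQueryBundle:
    """Hypothetical stand-in for QueryBundle; not the llama_index class."""

    query_str: str
    custom_embedding_strs: Optional[List[str]] = None

    @property
    def embedding_strs(self) -> List[str]:
        # Fall back to the raw query when no custom strings are attached.
        return self.custom_embedding_strs or [self.query_str]


class FakeRetriever:
    """Stand-in retriever that reports which strings it would embed."""

    def retrieve(self, bundle: FakeQueryBundle) -> List[str]:
        return list(bundle.embedding_strs)


class FakeTransformRetriever:
    """Stand-in wrapper: transform first, then pass the full bundle on."""

    def __init__(
        self,
        retriever: FakeRetriever,
        query_transform: Callable[[str], FakeQueryBundle],
    ) -> None:
        self._retriever = retriever
        self._query_transform = query_transform

    def retrieve(self, query_str: str) -> List[str]:
        bundle = self._query_transform(query_str)
        # The bundle is forwarded intact, custom embedding strings included.
        return self._retriever.retrieve(bundle)


def fake_hyde(query_str: str) -> FakeQueryBundle:
    """Canned HyDE-style transform (a real one would call an LLM)."""
    return FakeQueryBundle(
        query_str=query_str,
        custom_embedding_strs=[f"A passage that answers: {query_str}"],
    )


wrapped = FakeTransformRetriever(FakeRetriever(), fake_hyde)
used_with_wrapper = wrapped.retrieve("What is HyDE?")
used_without = FakeRetriever().retrieve(FakeQueryBundle("What is HyDE?"))

print(used_with_wrapper)
print(used_without)
```

With the wrapper, the inner retriever sees the hypothetical document; without it, only the raw query string is available for embedding.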
Bug Description
When using a query transform as a component of a QueryPipeline, it effectively does not do anything, as the output of such a component is simply the original query_str.
Specifically, I was using HyDEQueryTransform as a part of a QueryPipeline. According to https://github.com/run-llama/llama_index/blob/e4ff32cdedd687c361ec084f0a05859b27318708/llama-index-core/llama_index/core/indices/query/query_transform/base.py#L152-L163, a QueryBundle with custom embedding strings would be its output.
Also, according to https://docs.llamaindex.ai/en/latest/module_guides/querying/pipeline/module_usage/#query-transforms, the output of a query transform in a query pipeline is indeed query_str, but this design would effectively drop the custom embedding strings attached to the output QueryBundle.
Version
0.10.38
Steps to Reproduce
Specifically using HyDEQueryTransform, just run the following script with an LLM configured:
Relevant Logs/Tracebacks
No response