nicolas-werner opened this issue 6 days ago (status: Open)
Hello @nicolas-werner! I'm here to help you with any bugs, questions, or contributions you may have. Let's work together to solve the issue you've encountered.
To make `PGVectorStore` correctly handle string metadata that resembles numbers, without raising `DataError: invalid input syntax for type double precision` when filtering, you need to ensure that the metadata is treated as text in the SQL queries. This can be done by explicitly casting the metadata fields to text in the filter conditions.

Here is a modified version of the `query` method to handle this:
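```python
from typing import Any

from llama_index.core.vector_stores.types import (
    VectorStoreQuery,
    VectorStoreQueryResult,
)
from llama_index.core.vector_stores.utils import metadata_dict_to_node

# Method of the vector store class (class body not shown).
# `self._client` and `meta_contains` come from the store's client SDK
# and are not defined in this snippet.


def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    def cast_to_text_filter(filters):
        # Convert each filter value to a Python string so it is matched as text.
        return {key: str(value) for key, value in filters.items()}

    results = self._client.search(
        embedding=query.query_embedding,
        top_k=query.similarity_top_k,
        filter=(
            meta_contains(
                cast_to_text_filter(
                    {pair.key: pair.value for pair in query.filters.legacy_filters()}
                )
            )
            if query.filters is not None
            else None
        ),
    )
    # Each search result is a (record, score) pair.
    nodes = [
        metadata_dict_to_node(record.meta, text=record.text)
        for record, _ in results
    ]
    return VectorStoreQueryResult(
        nodes=nodes,
        similarities=[score for _, score in results],
        ids=[str(record.id) for record, _ in results],
    )
```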
In this code, the `cast_to_text_filter` function ensures that all metadata filter values are cast to text, preventing the `DataError` caused by invalid input syntax for type double precision [1].
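The same "treat strings as text" idea can also be sketched at the point where a single filter clause is built. This is a hypothetical illustration, not llama_index code: `build_filter_sql_typed` is an illustrative name, and the SQL shape (`metadata_->>'key'`) is an assumption modelled on a JSONB metadata column. It takes the float-cast branch only when the filter value is already an `int` or `float`, so strings such as "2024_123" are always compared as text.

```python
from typing import Any


def build_filter_sql_typed(key: str, op: str, value: Any) -> str:
    """Hypothetical clause builder: cast the column to float only for real numbers.

    Illustration only: values are interpolated into SQL text here; a real
    implementation should use bound parameters instead.
    """
    # bool is a subclass of int in Python, so exclude it from the numeric branch.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return f"(metadata_->>'{key}')::float {op} {float(value)}"
    # Every other value, including numeric-looking strings, is compared as text.
    return f"metadata_->>'{key}' {op} '{value}'"


print(build_filter_sql_typed("file_name", "=", "2024_123"))  # metadata_->>'file_name' = '2024_123'
print(build_filter_sql_typed("year", ">=", 2024))  # (metadata_->>'year')::float >= 2024.0
```

The trade-off of dispatching on the Python type rather than on whether a string parses as a number is that callers who want a numeric comparison must pass an actual `int` or `float`.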
Couldn't this issue be fixed by catching `DataError` in the `except` clause?
Bug Description
The `PGVectorStore` incorrectly handles string metadata that resembles numbers, causing a `DataError: invalid input syntax for type double precision` when filtering. This presumably occurs in the `_build_filter_clause` method (line 458), which attempts to cast string values to floats if they can be interpreted as numbers.
Workaround
As a temporary workaround, you can add non-numeric text to the file names (e.g., "file_2024_123" instead of "2024_123"). This prevents the value from being interpreted as a number and avoids the casting issue.
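To illustrate the suspected mechanism, here is a simplified, hypothetical stand-in for the behaviour described above. It is not the actual `_build_filter_clause` source; the name `build_filter_sql` and the SQL shape are assumptions. One detail worth noting: Python's `float()` accepts underscores as digit separators (PEP 515), so a value like "2024_123" parses as `2024123.0`. That would explain why such file names take the numeric branch while "file_2024_123" does not.

```python
from typing import Any


def build_filter_sql(key: str, op: str, value: Any) -> str:
    """Hypothetical, simplified stand-in for the behaviour described in this issue.

    Not the library's actual code; values are interpolated into SQL text
    purely for illustration.
    """
    try:
        # float() succeeds for "2024_123" because Python accepts underscores
        # as digit separators (PEP 515), so this numeric branch is taken.
        # The *column* is then cast to float, so Postgres raises an error when
        # it evaluates the cast on a row whose value for `key` is not numeric.
        return f"(metadata_->>'{key}')::float {op} {float(value)}"
    except (TypeError, ValueError):
        # Values float() cannot parse, such as "file_2024_123", are compared as text.
        return f"metadata_->>'{key}' {op} '{value}'"


print(float("2024_123"))  # 2024123.0
print(build_filter_sql("file_name", "=", "2024_123"))  # (metadata_->>'file_name')::float = 2024123.0
print(build_filter_sql("file_name", "=", "file_2024_123"))  # metadata_->>'file_name' = 'file_2024_123'
```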
Version
0.11.8
Steps to Reproduce
Relevant Logs/Tracebacks