weaviate / weaviate

Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering and offering the fault tolerance and scalability of a cloud-native database.
https://weaviate.io/developers/weaviate/
BSD 3-Clause "New" or "Revised" License
10.35k stars, 713 forks

Issue with hybrid search requests in python client #3023

Open vincetrep opened 1 year ago

vincetrep commented 1 year ago

Description:

I've run both vector searches and hybrid searches on a Weaviate class. When a vector contains zero values, e.g. [0. 0. 0.], the hybrid search doesn't interpret the raw values correctly.

In a vector search, it works properly.

Issue: This is the error that comes up upon doing a hybrid search with 0. values in a vector:

{'errors': [{'locations': [{'column': 65, 'line': 1}], 'message': 'Syntax Error GraphQL request (1:65) Invalid number, expected digit but got: " ".\n\n1: {Get{SmallIngredientDoc(hybrid:{query: "dummy text", vector: [0. 0. 0. ... 0. 0. 0.], alpha: 0.5}){base_food_nametext, titletext, title__embedding _additional {distance }}}}\n ^\n', 'path': None}]}

When I apply a regex to replace the 0. values by 0.0, then the hybrid search runs properly.

built_query = re.sub(r'(0\.)(\s|\])', r'0.0\2', built_query)
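To illustrate what the workaround does, here is a minimal, self-contained sketch applied to a shortened version of the failing query. The helper name `pad_bare_zeros` and the sample query string are hypothetical and only for illustration. The pattern assumes the dot and the closing bracket are escaped, and that the replacement is a raw string so `\2` is read as a backreference. A likely reason this is enough: the GraphQL grammar requires at least one digit after the decimal point in a float literal, while commas between list items are treated as insignificant.

```python
import re


def pad_bare_zeros(query: str) -> str:
    """Rewrite a bare '0.' as '0.0' when it is followed by whitespace or ']'.

    Hypothetical helper: it mirrors the regex workaround from this report.
    """
    return re.sub(r"(0\.)(\s|\])", r"0.0\2", query)


# Shortened, illustrative version of the query that fails to parse.
sample = (
    '{Get{SampleDoc(hybrid:{query: "dummy text", '
    "vector: [0. 0. 0.], alpha: 0.5}){title__text}}}"
)

patched = pad_bare_zeros(sample)
print(patched)
```

Values such as `0.5` are left alone because the character after `0.` is a digit, not whitespace or a bracket. Other bare floats that numpy may print (for example `1.`) are not covered by this pattern.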

I've put sample code to test out the hybrid search.

In the code sample, using random values for the vector works. If you use np.zeros(5) instead, the query fails without the string manipulation and succeeds with it.

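```python
import re
from typing import Optional

import numpy as np
from docarray import BaseDoc
from docarray.documents import TextDoc
from docarray.index.backends.weaviate import WeaviateDocumentIndex
from docarray.typing import AnyEmbedding
from pydantic import Field


class SampleDoc(BaseDoc):
    title: TextDoc
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True)


dbconfig = WeaviateDocumentIndex.DBConfig(host="http://localhost:9999")  # Replace with your endpoint

store = WeaviateDocumentIndex[SampleDoc](db_config=dbconfig)

# Create a sample doc
sample_doc = SampleDoc(title='test text', embedding=np.random.rand(5))

# Try to query for that doc with a hybrid search.
# NOTE: `client` is not defined in this snippet; it is assumed to be a
# weaviate.Client connected to the same endpoint.
query = (
    client.query.get("SampleDoc", ["title__text"])
    .with_additional("distance")
    .with_hybrid("dummy text", alpha=0.5, vector=np.random.rand(5))  # np.zeros(5) triggers the error
)
built_query = query.build()
print(built_query)
built_query = re.sub(r'(0\.)(\s|\])', r'0.0\2', built_query)  # workaround: turn "0." into "0.0"
print(built_query)

client.query.raw(built_query)

result = store.execute_query(built_query)
print(result)
```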

nilskulawiak commented 11 months ago

This is not really a bug, as the Python client only allows lists and currently does not implement numpy arrays. Because no proper type checking is implemented, the Python client simply writes the vector in the request as [0 0] with numpy arrays, instead of [0, 0] with lists. In other words, there are no commas. This works with floats for some reason, but it really should not work at all.
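Following the point above, one way to sidestep the problem is to convert the vector to a plain list of Python floats before handing it to the client. The sketch below is only an illustration under that assumption: `format_vector` is a hypothetical helper, not part of the Weaviate client, and it uses only the standard library to show the comma-separated form a list produces. With numpy, calling `.tolist()` on the array before passing it as `vector=` should have the same effect.

```python
import json
from typing import Iterable


def format_vector(vector: Iterable[float]) -> str:
    """Render any iterable of numbers as a comma-separated list of floats.

    Hypothetical helper for illustration. Numpy arrays are iterable too,
    so the same conversion applies to them.
    """
    return json.dumps([float(x) for x in vector])


zeros = format_vector([0, 0, 0])
mixed = format_vector((0.25, 1.0))
print(zeros)  # [0.0, 0.0, 0.0]
print(mixed)  # [0.25, 1.0]
```

Every element is written with a digit after the decimal point and the items are separated by commas, so the resulting text does not depend on how the original container prints itself.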

stale[bot] commented 9 months ago

Thank you for your contribution to Weaviate. This issue has not received any activity in a while and has therefore been marked as stale. Stale issues will eventually be autoclosed. This does not mean that we are ruling out to work on this issue, but it most likely has not been prioritized high enough in the last months. If you believe that this issue should remain open, please leave a short reply. This lets us know that the issue is not abandoned and acts as a reminder for our team to consider prioritizing this again. Please also consider if you can make a contribution to help with the solution of this issue. If you are willing to contribute, but don't know where to start, please leave a quick message and we'll try to help you. Thank you, The Weaviate Team