vincetrep opened this issue 1 year ago.
This is not really a bug: the Python client only accepts lists and does not currently support NumPy arrays. Because there is no proper type checking, when a NumPy array is passed the client writes the vector into the request as [0 0] instead of [0, 0] as it does with lists. In other words, the commas are missing. For some reason this works with floats, but it really should not work at all.
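On the caller side, the simplest workaround is to convert the array before passing it, e.g. `vector=np.zeros(5).tolist()`. Inside a client, the missing type check could look roughly like the sketch below. `to_float_list` and `format_vector_literal` are hypothetical helpers, not part of the Weaviate client; the sketch assumes that anything exposing a `tolist()` method (as NumPy arrays do) should be treated as an array, and it renders plain Python floats, whose `repr` never ends in a bare decimal point.

```python
import math
from typing import Any, List


def to_float_list(vector: Any) -> List[float]:
    """Normalize a list, tuple or array-like vector to finite Python floats."""
    # Assumption: anything with a .tolist() method (e.g. a NumPy array) is array-like
    if hasattr(vector, "tolist"):
        vector = vector.tolist()
    if not isinstance(vector, (list, tuple)):
        raise TypeError(f"expected a list, tuple or array, got {type(vector).__name__}")
    values = [float(x) for x in vector]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("vector contains NaN or infinite values")
    return values


def format_vector_literal(vector: Any) -> str:
    """Render the vector as a comma-separated GraphQL list literal."""
    # Python's float repr never ends in a bare point, so 0.0 stays "0.0", not "0."
    return "[" + ", ".join(repr(x) for x in to_float_list(vector)) + "]"


print(format_vector_literal([0, 0.5, 1e-05]))  # [0.0, 0.5, 1e-05]
```

Rejecting NaN and infinity is a design choice of the sketch: their Python reprs (`nan`, `inf`) are not valid GraphQL numbers either.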
Description:
I've run both vector searches and hybrid searches on a Weaviate class. When a vector contains zero values, e.g. [0. 0. 0.], the hybrid search does not interpret the raw values correctly.
In a vector search, it works properly.
Issue: this is the error returned when running a hybrid search with 0. values in the vector:
{'errors': [{'locations': [{'column': 65, 'line': 1}], 'message': 'Syntax Error GraphQL request (1:65) Invalid number, expected digit but got: " ".\n\n1: {Get{SmallIngredientDoc(hybrid:{query: "dummy text", vector: [0. 0. 0. ... 0. 0. 0.], alpha: 0.5}){base_food_nametext, titletext, title__embedding _additional {distance }}}}\n ^\n', 'path': None}]}
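The error is consistent with the GraphQL grammar: commas are insignificant tokens, so a list such as `[0.37 0.95]` still parses, but a float literal needs at least one digit after the decimal point, so `0.` is rejected. The sketch below checks list items against a regex transcribed from the spec's `IntValue`/`FloatValue` productions. It is a simplified check (it ignores the spec's lookahead rule about what may follow a number) and is not Weaviate's actual parser; the function names are hypothetical.

```python
import re
from typing import List

# IntValue / FloatValue from the GraphQL spec:
# IntegerPart: -? (0 | [1-9][0-9]*), FractionalPart: . Digit+, ExponentPart: [eE] [+-]? Digit+
GRAPHQL_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_graphql_number(token: str) -> bool:
    """Return True if the token is a valid GraphQL Int or Float literal."""
    return GRAPHQL_NUMBER.fullmatch(token) is not None


def invalid_list_items(literal: str) -> List[str]:
    """Split a list literal like '[0. 0. 0.]' and return the items that fail to parse."""
    # Commas are insignificant in GraphQL, so treat them like whitespace
    items = literal.strip()[1:-1].replace(",", " ").split()
    return [item for item in items if not is_graphql_number(item)]


print(invalid_list_items("[0. 0. 0.]"))              # ['0.', '0.', '0.']
print(invalid_list_items("[0.37454012 0.9507143]"))  # []
print(invalid_list_items("[0.0, 0.0, 0.0]"))         # []
```

This also explains why random values appear to work: their printed form always has digits after the point, so only the missing commas differ, and those are ignored.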
When I apply a regex to replace the 0. values with 0.0, the hybrid search runs properly:
built_query = re.sub(r'(\d)\.(?!\d)', r'\1.0', built_query)
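A sketch of the same workaround as a reusable function, assuming the built query is a plain string as returned by `query.build()`; `fix_trailing_points` is a hypothetical name. The pattern `(\d)\.(?!\d)` appends a `0` to any decimal point that follows a digit but is not followed by one, so it also turns `1.e-05` into `1.0e-05` and leaves `0.5` untouched. Because it runs over the whole query string, it can also alter matching text inside string arguments such as the hybrid `query` text; converting the vector to a plain list before building the query avoids that risk.

```python
import re

# A decimal point preceded by a digit and not followed by one, e.g. the "0." NumPy prints
TRAILING_POINT = re.compile(r"(\d)\.(?!\d)")


def fix_trailing_points(built_query: str) -> str:
    """Rewrite literals like '0.' or '1.e-05' as '0.0' / '1.0e-05' (hypothetical helper)."""
    return TRAILING_POINT.sub(r"\1.0", built_query)


query = '{Get{SampleDoc(hybrid:{query: "dummy text", vector: [0. 0. 0.], alpha: 0.5}){title__text}}}'
print(fix_trailing_points(query))
# the vector becomes [0.0 0.0 0.0]; alpha: 0.5 is unchanged
```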
I've put together sample code to test the hybrid search.
In the code sample, passing random values as the vector works. If you pass np.zeros(5) instead, the query fails without the string manipulation and works with it.
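```python
import re
from typing import Optional

import numpy as np
import weaviate
from docarray import BaseDoc, DocList
from docarray.documents import TextDoc
from docarray.index import WeaviateDocumentIndex
from docarray.typing import AnyEmbedding
from pydantic import Field


class SampleDoc(BaseDoc):
    title: TextDoc
    embedding: Optional[AnyEmbedding] = Field(is_embedding=True)


dbconfig = WeaviateDocumentIndex.DBConfig(host="http://localhost:9999")  # Replace with your endpoint
store = WeaviateDocumentIndex[SampleDoc](db_config=dbconfig)
client = weaviate.Client("http://localhost:9999")  # Weaviate Python client (v3 API), same endpoint

# Create a sample doc and index it
sample_doc = SampleDoc(title=TextDoc(text="test text"), embedding=np.random.rand(5))
store.index(DocList[SampleDoc]([sample_doc]))

# Try to query for that doc with a hybrid search
query = (
    client.query.get("SampleDoc", ["title__text"])
    .with_additional("distance")
    .with_hybrid("dummy text", alpha=0.5, vector=np.random.rand(5))
)
built_query = query.build()
print(built_query)
# Workaround: rewrite NumPy-style literals such as "0." as valid GraphQL floats ("0.0")
built_query = re.sub(r"(\d)\.(?!\d)", r"\1.0", built_query)
print(built_query)

# Run the raw GraphQL query, directly with the client or through the document index
client.query.raw(built_query)
result = store.execute_query(built_query)
print(result)
```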