Open LazerCube opened 1 year ago
There's a PR, https://github.com/weaviate/weaviate/pull/3560, that fixes a bug in the multi2vec-clip
vectoriser and may or may not affect this issue. As part of the CI, preview images are published to Docker Hub.
If you're willing, could you test with the semitechnologies/weaviate:preview-ensure-name-is-passed-correctly-in-function-647aadc
image to see if your issue is resolved by the bugfix? Cheers!
I've just tested with the semitechnologies/weaviate:preview-ensure-name-is-passed-correctly-in-function-647aadc
image, but unfortunately the issue persists.
To narrow down the potential problems, would you be able to experiment with different weights for the imageFields and textFields in the schema definition?
Specifically, weight it 100% image and 0% text, then 0% image and 100% text, and compare the results. This will help identify whether the problem lies with the models themselves or with the heuristic that the vectorisation process applies on top of the models. Also, to be clear, are you running NearText searches?
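For reference, the two weightings could be set up along these lines. This is only a sketch: the class name ClipExample and the property names image and description are hypothetical placeholders, and the weights block should be checked against the multi2vec-clip module documentation for the Weaviate version in use.

```python
import json


def clip_class(image_weight: float, text_weight: float) -> dict:
    """Build a class definition for multi2vec-clip with explicit field weights."""
    return {
        "class": "ClipExample",  # hypothetical class name
        "vectorizer": "multi2vec-clip",
        "moduleConfig": {
            "multi2vec-clip": {
                "imageFields": ["image"],       # hypothetical property name
                "textFields": ["description"],  # hypothetical property name
                "weights": {
                    "imageFields": [image_weight],
                    "textFields": [text_weight],
                },
            }
        },
        "properties": [
            {"name": "image", "dataType": ["blob"]},
            {"name": "description", "dataType": ["text"]},
        ],
    }


# The two configurations to compare: image only, then text only.
image_only = clip_class(1.0, 0.0)
text_only = clip_class(0.0, 1.0)
print(json.dumps(image_only["moduleConfig"], indent=2))
```

Each definition would be used to recreate the class before re-importing the same test data, so that only the weights differ between runs.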
Yes, I am using NearText searches. I tested again by adjusting the field weights: first to 100% for images and then to 100% for text. Regardless of these adjustments, the search results did not change. I deleted and recreated the schema and re-imported my test data each time, yet even the certainty and distance values in the results were identical across the two configurations.
However, I also tried multi2vec-bind, and that does seem to work as expected, so I might just transition to that instead.
Thanks for experimenting with your configuration and providing the great feedback. To me, this suggests that the query provided in NearText is not being vectorised into the same vector space as the objects. As mentioned, there are heuristics involved in multi2vec-clip, so this could very well be the source of the unexpected behaviour. I will therefore leave this issue open, since I believe it is a bug.
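To illustrate why identical scores under both weightings look suspicious, here is a toy sketch. It assumes, and this is an assumption rather than a statement about the module's actual code, that the per-field vectors are combined as a weighted mean. The two-dimensional vectors are made up for illustration.

```python
import math


def combine(vectors, weights):
    """Weighted mean of equally sized vectors (assumed combination rule)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings for one object and one query.
image_vec = [1.0, 0.0]
text_vec = [0.0, 1.0]
query_vec = [0.6, 0.8]

image_only = combine([image_vec, text_vec], [1.0, 0.0])
text_only = combine([image_vec, text_vec], [0.0, 1.0])

# Under this assumption the two scores differ whenever the field vectors differ.
print(cosine(query_vec, image_only), cosine(query_vec, text_only))
```

If the weights were applied this way, switching from image only to text only should change the certainty and distance values, which is not what was reported above.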
As for using multi2vec-bind, do be aware of its potentially restrictive non-open-source license!
Summary
When using text fields in a multi2vec-clip class, I've observed a large decrease in search accuracy. Specifically, objects with exact-matching descriptions aren't always ranked at the top and are sometimes omitted entirely. For example, an image of an airplane with the description "An airplane flying through a blue sky" isn't returned when searching for "airplane". But if the description is empty, it works as expected. I seem to get the same issue using any of the text search methods, so I'm not really sure what's going on.
I might just be doing something incorrectly so I've created a repository to demo the issue in more detail here
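The kind of search described in the summary could be written roughly as follows. This is a sketch only: near_text_query is a hypothetical helper, and the class name ClipExample and the description property are placeholders, not names taken from the demo repository.

```python
import json


def near_text_query(class_name, concept, fields, limit=5):
    """Build a GraphQL Get query with a nearText filter (hypothetical helper)."""
    concepts = json.dumps([concept])  # renders as a quoted GraphQL string list
    return (
        "{ Get { %s(nearText: {concepts: %s}, limit: %d) "
        "{ %s _additional { certainty distance } } } }"
        % (class_name, concepts, limit, " ".join(fields))
    )


# Search for "airplane" and ask for the scores alongside the description.
query = near_text_query("ClipExample", "airplane", ["description"])
print(query)
```

Requesting certainty and distance under _additional makes it easy to compare scores between runs with and without a description.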