Open SevenMpp opened 2 months ago
Hi @SevenMpp, I tried to reproduce your case, but it worked correctly for me on the latest version and on v1.9.5.
In my MRE, I ran the following requests in order:
PUT collections/sha
{
  "vectors": {}
}
PUT collections/sha/index
{
  "field_name": "content",
  "field_schema": {
    "type": "text",
    "tokenizer": "word",
    "min_token_len": 2,
    "max_token_len": 15,
    "lowercase": true
  }
}
PUT collections/sha/points
{
  "points": [
    { "id": 1, "vectors": {}, "payload": {"content": "London is very rainy" } },
    { "id": 2, "vectors": {}, "payload": {"content": "Mexico City is very crowded" } },
    { "id": 3, "vectors": {}, "payload": {"content": "London is in the UK" } },
    { "id": 4, "vectors": {}, "payload": {"content": "Berlin is the capital of techno music" } }
  ]
}
POST collections/sha/points/query
{
  "filter": {"must": { "key": "content", "match": {"text": "is the"}}},
  "with_payload": true
}
Result:
{
  "result": [
    {
      "id": 3,
      "version": 0,
      "score": 0,
      "payload": {
        "content": "London is in the UK"
      },
      "vector": null,
      "order_value": null
    },
    {
      "id": 4,
      "version": 0,
      "score": 0,
      "payload": {
        "content": "Berlin is the capital of techno music"
      },
      "vector": null,
      "order_value": null
    }
  ],
  "status": "ok",
  "time": 0.003149917
}
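To make the result above easier to read, here is a minimal Python sketch of why points 3 and 4 match "is the" while points 1 and 2 do not. It assumes the text match requires every query token to be present in the field, in any order, which is consistent with the result shown. The `tokenize` and `text_match` helpers are hypothetical stand-ins written for this illustration, not Qdrant's actual tokenizer, and splitting on non-word characters is only an approximation of the "word" tokenizer.

```python
import re


def tokenize(text, min_token_len=2, max_token_len=15, lowercase=True):
    """Hypothetical approximation of a "word" tokenizer with the index settings above."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"\w+", text)
    # Drop tokens outside the configured length range.
    return {t for t in tokens if min_token_len <= len(t) <= max_token_len}


def text_match(query, content):
    """Assumed semantics: every query token must appear in the content, in any order."""
    return tokenize(query) <= tokenize(content)


points = {
    1: "London is very rainy",
    2: "Mexico City is very crowded",
    3: "London is in the UK",
    4: "Berlin is the capital of techno music",
}

matching_ids = [pid for pid, content in points.items() if text_match("is the", content)]
print(matching_ids)  # [3, 4]
```

Points 1 and 2 contain "is" but not "the", so they are excluded, which matches the response above.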
Could you please provide a reproducible example?
Thank you very much for your reply. What I expect is to combine vector recall, full-text match, and filter conditions in one query. The hope is that points matching the full-text condition are ranked at the top, but currently only the content that matches the full-text condition is returned, and all non-matching content is filtered out. Can these be combined and used directly?
PUT /collections/sha
{
  "vectors": { "size": 3, "distance": "Cosine" }
}
PUT collections/sha/index
{
  "field_name": "content",
  "field_schema": {
    "type": "text",
    "tokenizer": "word",
    "min_token_len": 2,
    "max_token_len": 15,
    "lowercase": true
  }
}
PUT collections/sha/points
{
  "points": [
    { "id": 1, "vectors": [-0.21554970741271973, 0.16919100284576416, -0.7354516983032227], "payload": { "content": "Goal" } },
    { "id": 2, "vectors": [1.0798169374465942, -0.24099303781986237, -0.005861682817339897], "payload": { "content": "Summarize the text given and extract keywords from the summary. Identify the input text's language(For example\n\nChinese, English)\t\tSummarize the text" } },
    { "id": 3, "vectors": [0.7270970940589905, -0.43327197432518005, -0.6609529256820679], "payload": { "content": "Extract keywords from the summary (Note: do not extract keywords directly from the original text). Keywords should represent the main topics or themes discussed in the text." } },
    { "id": 4, "vectors": [-0.9851164817810059, 0.5659856200218201, -0.2668682336807251], "payload": { "content": ". Ignore common stop words (e.g., the, is, and, of). Focus on nouns, noun phrases, and verbs that carry the main ideas" } }
  ]
}
POST /collections/sha/points/query
{
  "vector": [1.0798169374465942, -0.24099303781986237, -0.005861682817339897],
  "filter": { "should": { "key": "content", "match": { "text": "common stop words" } } },
  "limit": 5,
  "with_payload": true
}
Result:
{
  "result": [
    {
      "id": 4,
      "version": 0,
      "score": 0,
      "payload": {
        "content": ". Ignore common stop words (e.g., the, is, and, of). Focus on nouns, noun phrases, and verbs that carry the main ideas"
      },
      "vector": null,
      "order_value": null
    }
  ],
  "status": "ok",
  "time": 0.000490212
}
> but currently only "full text match" content is matched. All non-matching content is filtered out. Can it be directly integrated and used?

If you are looking for hybrid search, I would recommend starting here: https://qdrant.tech/documentation/concepts/hybrid-queries/
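For context, a filter is a hard condition: it restricts which points can be returned and does not change their ranking, which is why the non-matching points disappear. As a rough illustration of the ranking that was asked for (text matches first, everything else ordered by similarity), here is a client-side sketch in plain Python. It is not the Qdrant API and not the hybrid query mechanism from the linked page. The `cosine`, `contains_all_tokens`, and `rerank` helpers are hypothetical names introduced for this example, the token check is a simplified assumption, and the vectors are shortened from the ones posted above.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def contains_all_tokens(query, content):
    """Simplified assumption: a text match means every query word occurs in the content."""
    words = set(re.findall(r"\w+", content.lower()))
    return set(re.findall(r"\w+", query.lower())) <= words


def rerank(points, query_vector, query_text):
    """Sort text matches first, then by descending cosine similarity."""
    scored = [
        (contains_all_tokens(query_text, p["content"]), cosine(query_vector, p["vector"]), p["id"])
        for p in points
    ]
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [pid for _, _, pid in scored]


# Vectors rounded from the example above; contents shortened for readability.
points = [
    {"id": 1, "vector": [-0.2155, 0.1692, -0.7355], "content": "Goal"},
    {"id": 2, "vector": [1.0798, -0.2410, -0.0059], "content": "Summarize the text given"},
    {"id": 3, "vector": [0.7271, -0.4333, -0.6610], "content": "Extract keywords from the summary"},
    {"id": 4, "vector": [-0.9851, 0.5660, -0.2669], "content": "Ignore common stop words (e.g., the, is, and, of)"},
]

ranked_ids = rerank(points, [1.0798, -0.2410, -0.0059], "common stop words")
print(ranked_ids)  # [4, 2, 3, 1]
```

Point 4 comes first because it is the only text match, even though its vector is the least similar to the query. The remaining points keep their similarity order and are no longer dropped.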
When using full-text match, I expect points containing the matching string to be hit and ranked at the top, but the match condition currently does not seem to have any effect. Version: Qdrant v1.9.5
Current Behavior
Use either of the following requests to observe that points containing the string are not recalled:
POST /collections/sha/points/scroll
{
  "must": [
    { "key": "content", "match": { "text": "Focus" } }
  ],
  "limit": 2,
  "with_payload": true
}
or
POST /collections/sha/points/search
{
  "vector": [-0.21554970741271973, 0.16919100284576416, -0.7354516983032227],
  "must": [
    { "key": "content", "match": { "text": "common stop words" } }
  ],
  "limit": 4,
  "with_payload": true
}
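One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the requests in this thread that did return matches wrap the condition in a "filter" object, whereas the two requests above place "must" at the top level of the body. The sketch below builds a scroll body in the wrapped shape so the two can be compared. `build_scroll_body` is a hypothetical helper written for this illustration, not part of any Qdrant client.

```python
import json


def build_scroll_body(conditions, limit=2, with_payload=True):
    """Hypothetical helper: nest the conditions under "filter" -> "must"."""
    return {
        "filter": {"must": list(conditions)},
        "limit": limit,
        "with_payload": with_payload,
    }


condition = {"key": "content", "match": {"text": "Focus"}}
body = build_scroll_body([condition])

# Print the JSON that would be sent to POST /collections/sha/points/scroll.
print(json.dumps(body, indent=2))
```

If the wrapped request returns point 4 while the top-level "must" version does not, that would suggest the condition was being ignored rather than failing to match.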
Steps to Reproduce
3.collection : scheme