Status: Open. machete-michael opened this issue 2 years ago.
Thanks @machete-michael for the report.
More info on this issue.
Here is a request against an index with the default dynamic mapping (see the docs example) that produces the same error:
curl -XGET http://localhost:7280/api/v1/my_dynamic_index/search\?query\=cart.product_description:cherry-pi
{
"InvalidQuery": "The field '_dynamic' does not have positions indexed"
}
Without the "-" character, everything works well. Somehow, adding "-" triggers a phrase query. But the cause could be something else entirely (for example, something happening in tantivy's generate_literals_for_json_object function). We need to investigate what's happening.
The query parser identifies the string to search for correctly. The default tokenizer splits it into several tokens ([cherry, py]), which triggers the phrase query.
Probably the right fix would be to emit an intersection query here if positions are not available, instead of emitting an error.
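A minimal Python sketch of the decision described above, assuming the default tokenizer splits on runs of non-alphanumeric characters and lowercases the result (an approximation, not the actual tantivy or Quickwit code). The names default_tokenize and build_query are hypothetical; the error message mirrors the one reported in this thread.

```python
import re


def default_tokenize(text: str) -> list[str]:
    # Approximation of a default-style tokenizer (assumption): split on every
    # run of non-alphanumeric characters and lowercase the resulting tokens.
    return [tok.lower() for tok in re.findall(r"[^\W_]+", text)]


def build_query(field: str, text: str, positions_indexed: bool,
                fallback_to_intersection: bool = False) -> tuple:
    """Hypothetical query builder mirroring the behavior discussed above."""
    tokens = default_tokenize(text)
    if not tokens:
        raise ValueError("query contains no tokens")
    if len(tokens) == 1:
        # A single token becomes a plain term query: no positions needed.
        return ("term", field, tokens[0])
    if positions_indexed:
        # Several tokens and positions available: phrase query.
        return ("phrase", field, tokens)
    if fallback_to_intersection:
        # Proposed fix: require all tokens, ignoring order and adjacency.
        return ("intersection", field, tokens)
    # Current behavior: reject the query.
    raise ValueError(f"The field '{field}' does not have positions indexed")


print(build_query("_dynamic", "cherry", positions_indexed=False))
# ('term', '_dynamic', 'cherry')
print(build_query("_dynamic", "cherry-pi", positions_indexed=False,
                  fallback_to_intersection=True))
# ('intersection', '_dynamic', ['cherry', 'pi'])
```

The only difference between the working and failing requests is the token count: "cherry" yields one token, while "cherry-pi" yields two and therefore needs either positions or a fallback.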
@machete-michael sorry for the long silence. A fresh look at this issue made me think that you may be interested in a UUID-friendly tokenizer.
We have opened an issue on this: https://github.com/quickwit-oss/quickwit/issues/1143
There is also a PR that is almost mergeable: https://github.com/quickwit-oss/quickwit/pull/1598
Is this something you are interested in?
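The PR itself is not reproduced here. As a rough illustration only, the sketch below shows one hypothetical way a UUID-friendly tokenizer could behave: canonical 8-4-4-4-12 hex UUIDs are emitted as a single token, and everything else falls back to a default-style split on non-alphanumeric characters (an assumption). The function uuid_friendly_tokenize is made up for this example.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# Alphanumeric runs, as in the default-style split (assumption).
WORD_RE = re.compile(r"[^\W_]+")


def uuid_friendly_tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: UUIDs stay whole, everything else is split."""
    tokens = []
    pos = 0
    for match in UUID_RE.finditer(text):
        # Split the text preceding the UUID the usual way.
        tokens += [t.lower() for t in WORD_RE.findall(text[pos:match.start()])]
        # Emit the UUID as one token, so a term query is enough to find it.
        tokens.append(match.group(0).lower())
        pos = match.end()
    # Split whatever follows the last UUID.
    tokens += [t.lower() for t in WORD_RE.findall(text[pos:])]
    return tokens


print(uuid_friendly_tokenize("id=123e4567-e89b-12d3-a456-426614174000"))
# ['id', '123e4567-e89b-12d3-a456-426614174000']
print(uuid_friendly_tokenize("order#A-42"))
# ['order', 'a', '42']
```

The second call shows the limitation of this approach: a value that is not a UUID but contains other delimiters is still split into several tokens.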
Hi @fmassot,
Thank you for looking into this issue.
A UUID-friendly tokenizer may solve the issue for values with dashes, but not for values with the other delimiters.
In any case, I've moved on to other solutions and am not waiting for a fix.
Please feel free to close the issue.
Shouldn't we use the same tokenizer as the one set in the config for the field ("raw")?
The PhraseQuery issue would still persist for fields that are tokenized. I'm not sure about an intersection query, since it may silently return wrong results.
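To make the "wrong results" concern concrete, here is a small Python sketch comparing three matching strategies on made-up documents: an intersection of default-style tokens, a phrase match on the same tokens, and a single-term lookup with a raw-style tokenizer that keeps the whole value as one token. The tokenizers are approximations (assumptions), and all helper names are hypothetical.

```python
import re


def default_tokenize(text: str) -> list[str]:
    # Default-style split (assumption): alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


def raw_tokenize(text: str) -> list[str]:
    # Raw-style tokenizer: the whole value is kept as a single token.
    return [text]


def matches_intersection(doc: list[str], query: list[str]) -> bool:
    # Every query token appears somewhere in the document, in any order.
    return set(query) <= set(doc)


def matches_phrase(doc: list[str], query: list[str]) -> bool:
    # Query tokens appear consecutively and in the same order.
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


docs = ["cherry-pi", "pi-cherry", "cherry pie with pi"]
q = default_tokenize("cherry-pi")  # ['cherry', 'pi']

intersection_hits = [d for d in docs if matches_intersection(default_tokenize(d), q)]
phrase_hits = [d for d in docs if matches_phrase(default_tokenize(d), q)]
# With a raw tokenizer on both sides, a single exact term lookup is enough.
raw_hits = [d for d in docs if raw_tokenize("cherry-pi")[0] in raw_tokenize(d)]

print(intersection_hits)  # ['cherry-pi', 'pi-cherry', 'cherry pie with pi']
print(phrase_hits)        # ['cherry-pi']
print(raw_hits)           # ['cherry-pi']
```

The intersection silently returns two extra documents, while the phrase match and the raw-tokenizer lookup both return only the exact value.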
Hi there,
I've just spun up a fresh install of Quickwit 0.5 and configured a very simple index with no field mappings (pure dynamic mode). I'm ingesting JSON logs from Vector. In this configuration, I cannot search for anything containing the characters "-", ",", ".", or a space: I get the error "Invalid query: The field '_dynamic' does not have positions indexed" 100% of the time. It does not depend on the field I'm searching on. Most of the fields I've tried are supposed to be simple string fields.
Examples:
If I search for a term without these special characters, it works without any problem. This issue is quite problematic because almost all the searches I would like to run fail 🙁. I've tried deleting my index and restarting from scratch, with no success.
My index configuration:
version: 0.5
index_id: suricata
doc_mapping:
  mode: dynamic
indexing_settings:
  commit_timeout_secs: 10
Did I miss something in the setup or configuration?
The queries you are trying to run are so-called phrase queries (because of the quotation marks). They require token positions to be stored, which is not enabled by default. You can enable it as follows:
version: 0.5
index_id: suricata
doc_mapping:
  mode: dynamic
  dynamic_mapping:
    record: position # defaults to basic
indexing_settings:
  commit_timeout_secs: 10
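Why positions matter can be seen on a toy inverted index. As we understand the record option, basic keeps only which documents contain each term, while position also stores where each term occurs; the sketch below models just those two cases (freq is omitted) and is not Quickwit's actual on-disk format. The tokenizer is a default-style approximation (an assumption), and build_index and phrase_query are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Default-style split (assumption): alphanumeric runs, lowercased.
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


def build_index(docs: list[str], record: str) -> dict:
    """Toy inverted index: term -> {doc_id: [positions]}.

    With record="basic" only document membership is kept (empty lists);
    with record="position" the token positions are stored as well.
    """
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, tok in enumerate(tokenize(text)):
            positions = index[tok].setdefault(doc_id, [])
            if record == "position":
                positions.append(pos)
    return index


def phrase_query(index: dict, record: str, phrase: str) -> list[int]:
    terms = tokenize(phrase)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    # Documents that contain every term, regardless of position.
    candidates = set(postings[0]).intersection(*postings[1:])
    if len(terms) == 1:
        return sorted(candidates)
    if record != "position":
        # Mirrors the error reported in this thread.
        raise ValueError("The field '_dynamic' does not have positions indexed")
    hits = []
    for doc in sorted(candidates):
        # Keep start positions where term k occurs exactly k tokens later.
        starts = set(postings[0][doc])
        for offset, post in enumerate(postings[1:], start=1):
            starts &= {p - offset for p in post[doc]}
        if starts:
            hits.append(doc)
    return hits


docs = ["ET POLICY curl User-Agent", "agent of user curl"]
print(phrase_query(build_index(docs, "position"), "position", "User-Agent"))  # [0]
print(phrase_query(build_index(docs, "basic"), "basic", "curl"))  # [0, 1]
```

A single-term search such as "curl" works with either setting, but a value like "User-Agent" becomes two terms, and checking that they are adjacent requires the stored positions.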
Thank you @fulmicoton! I can confirm it works perfectly!
I'm using dynamic mapping to ingest a JSON object with a field that holds an array of JSON objects. If an array element has a field whose value contains delimiters such as -, _, or # (e.g. a UUID), querying against this field results in the error:
"SplitSearchError { error: \"Invalid query: The field '_dynamic' does not have positions indexed\"…}"
Steps to reproduce the behavior:
Expected behavior: I should get a matching document as a response.
Configuration:
quickwit --version
v0.3.1nightly