vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Special tokens support #12508

Open jeffkayne opened 4 years ago

jeffkayne commented 4 years ago

As mentioned on Gitter, I am trying to query data (with YQL) to find all documents that match the string "&Free". The exact query I'm executing is: "SELECT text_raw FROM sources post WHERE text_raw CONTAINS '&free';" This query matches every document that contains the string "free", rather than only those that contain "&free".

The specific use-case for this is that I am looking for brand names within documents.

Vespa version: 7.165.5

bratseth commented 4 years ago

Is the use case here that "&free" is a proper name that you need to match?

jeffkayne commented 4 years ago

Yes, correct

jobergum commented 4 years ago

Please see match modes https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#match. The default match mode for string fields is text, which implies tokenization, and tokenization removes punctuation characters.
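To illustrate why the "&" disappears, here is a minimal Python sketch of text-mode tokenization, assuming a simplified rule: lowercase the input and keep only runs of letters and digits. The `tokenize` function is hypothetical and is not Vespa's actual linguistics implementation; it only reproduces the effect described above, where "&free" is reduced to "free".

```python
import re

# Hypothetical, simplified stand-in for text-mode tokenization:
# lowercase the input and keep only runs of letters and digits
# ([^\W_] means "word character except underscore"), so punctuation
# such as "&" is dropped. This is NOT Vespa's real tokenizer.
def tokenize(text: str) -> list[str]:
    return re.findall(r"[^\W_]+", text.lower())

print(tokenize("Try &Free today"))  # ['try', 'free', 'today']
print(tokenize("&free"))            # ['free']
```

Because the same reduction is applied to both the document text and the query term, "&free" and "free" end up as the same token.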

bratseth commented 4 years ago

I think the fix, then, is to make "special tokens" work independently of the linguistics plugin. I'll try to find some time to do that.
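A rough Python sketch of the special-tokens idea: a configured list of tokens is tried before the ordinary punctuation-stripping rule, so "&free" survives as a single token. `SPECIAL_TOKENS` and `tokenize_with_special` are hypothetical names, and the matching rule is an assumption for illustration, not how Vespa's linguistics module actually implements special tokens.

```python
import re

# Hypothetical list of tokens that must survive tokenization intact
# (compared in lowercase).
SPECIAL_TOKENS = {"&free"}

def tokenize_with_special(text: str, special: set[str]) -> list[str]:
    # Special tokens are tried first (longest first, so overlapping
    # entries prefer the longer one); otherwise fall back to runs of
    # letters and digits, which drops any other punctuation.
    alternatives = [re.escape(t) for t in sorted(special, key=len, reverse=True)]
    pattern = "|".join(alternatives + [r"[^\W_]+"])
    return re.findall(pattern, text.lower())

print(tokenize_with_special("Try &Free today", SPECIAL_TOKENS))
# ['try', '&free', 'today']
```

This simplified pattern would also split "x&freedom" into "x", "&free" and "dom", so a real implementation would need to decide how special tokens interact with surrounding characters.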

jeffkayne commented 4 years ago

Yes, it would be great if we could use an escape, for example "\x##", to get an exact match of the string. Cheers!

bratseth commented 4 years ago

That's no problem, but it doesn't solve the issue, because the engine needs to decide at write time which tokens to index.
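The write-time point can be sketched with a toy inverted index in Python. Assuming the hypothetical `build_index` and `lookup` helpers below (not Vespa APIs), a query term that reaches the index unchanged, as an escape would allow, still finds nothing unless the same special token was applied when the documents were indexed.

```python
import re
from collections import defaultdict

def tokenize(text, special=frozenset()):
    # Special tokens first, then runs of letters/digits (punctuation dropped).
    alternatives = [re.escape(t) for t in sorted(special, key=len, reverse=True)]
    pattern = "|".join(alternatives + [r"[^\W_]+"])
    return re.findall(pattern, text.lower())

def build_index(docs, special=frozenset()):
    # Write (indexing) time: this is where the tokens are decided.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text, special):
            index[token].add(doc_id)
    return index

def lookup(index, term):
    # Query time: the term is looked up verbatim, as if escaped.
    # .get() avoids inserting empty entries into the defaultdict.
    return index.get(term.lower(), set())

docs = {1: "Try &Free today", 2: "Free shipping"}

plain = build_index(docs)
print(lookup(plain, "&free"))  # set()
print(lookup(plain, "free"))   # {1, 2}

with_special = build_index(docs, {"&free"})
print(lookup(with_special, "&free"))  # {1}
print(lookup(with_special, "free"))   # {2}
```

Only the index built with "&free" as a special token can distinguish the two documents; changing the query alone cannot recover a token that was never indexed.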

baldersheim commented 12 months ago

soon timed out