Index attributes in an inverted index

Proposal:

Currently, when you add a document like:

{
"int_attribute": 123,
"float_attribute": 1.23,
...
"ft_field": "abc"
}

only abc goes into the inverted index, and you can find it using match('abc'). You can't find this document with match('123') or match('1.23').

It would be useful if Manticore could also match documents by their attribute values.

As discussed on the dev call of Jul 26, 2024, what we can do is:

- Index all attributes as MAGIC_@attr_value, similar to how we index exact_term as MAGIC_=token_value.
- This way, the indexing pipeline can tell that a token comes from an attribute because it starts with MAGIC. For such tokens it will not apply stemming, generate infixes, create hitlists, or split the value into tokens; it will use the whole attribute value as a single term. Only dictionary entries and doclists will be added to the index.
- All indexed attributes will be stored in a separate part of the dictionary and will not mix with or participate in the regular search, similar to exact_terms. Currently, the dictionary looks like [regular_tokens, …, exact_tokens, …], but it will become [regular_tokens, …, exact_tokens, …, attributes, …].
- During a search with the special syntax, we will expand the query as we do now for expanded keywords. Today "testing 2" => "test|=testing 2|=2"; with the new feature it will be "testing 2" => "test|=testing|@testing 2|=2|@2".
- The only issue is to ensure the feature does not mix with the old syntax and the way search works now. For HTTP, it should be easy to check that the query contains only query_string without any filters, but for SphinxQL it is not clear how to do this.
- It will also work only for whole tokens, without any syntax, complex operators, or wildcards.
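The plan above can be illustrated with a toy model. This is only a sketch in Python, not Manticore code: the prefix constants, `toy_stem`, `ToyIndex`, `expand_query`, and `search` are all hypothetical names, and the real MAGIC marker is not specified in this issue. It shows attribute values being added as single unstemmed terms with doclists only, and the query expansion gaining an `@` alternative per keyword.

```python
from collections import defaultdict

# Hypothetical markers; the actual MAGIC prefix is an assumption here.
EXACT_PREFIX = "\x01="
ATTR_PREFIX = "\x01@"


def toy_stem(token: str) -> str:
    """Stand-in for a real stemmer: strips a trailing 'ing' from longer words."""
    return token[:-3] if token.endswith("ing") and len(token) > 5 else token


class ToyIndex:
    def __init__(self):
        # term -> set of doc ids (a doclist only; this sketch keeps no hitlists)
        self.postings = defaultdict(set)

    def add(self, doc_id, text_fields, attributes):
        # Full-text fields: split into tokens, add stemmed and exact forms.
        for text in text_fields.values():
            for token in text.lower().split():
                self.postings[toy_stem(token)].add(doc_id)
                self.postings[EXACT_PREFIX + token].add(doc_id)
        # Attributes: the whole value is one term, no stemming or splitting.
        for value in attributes.values():
            self.postings[ATTR_PREFIX + str(value)].add(doc_id)


def expand_query(query, with_attributes=False):
    """Return one list of alternative terms per query keyword."""
    groups = []
    for token in query.lower().split():
        alternatives = [toy_stem(token), EXACT_PREFIX + token]
        if with_attributes:
            alternatives.append(ATTR_PREFIX + token)
        groups.append(alternatives)
    return groups


def search(index, query, with_attributes=False):
    """AND across keywords, OR across each keyword's alternatives."""
    result = None
    for alternatives in expand_query(query, with_attributes):
        docs = set()
        for term in alternatives:
            docs |= index.postings.get(term, set())
        result = docs if result is None else result & docs
    return result or set()
```

With a document whose `ft_field` is `abc` and whose attributes are `123` and `1.23`, `search(index, "123")` finds nothing, while `search(index, "123", with_attributes=True)` finds the document. Keeping the attribute terms under their own prefix is what stops them from leaking into regular searches.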

The other issues we'd have to think through are:

- Conflict with in-place updates: you update an attribute, but the value in the inverted index won't be updated. This may be confusing.
- The special attribute values in the inverted index can affect full-text ranking.
- The special values shouldn't be stored in the docstore.
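The in-place update conflict can be sketched as follows. This is a hypothetical Python model, not Manticore's storage layer: `ToyTable` and its methods are invented for illustration, and it assumes, as described above, that an in-place update touches only the attribute storage and leaves the inverted index alone.

```python
class ToyTable:
    def __init__(self):
        self.attributes = {}  # doc_id -> {attribute name: value}
        self.attr_terms = {}  # "@value" term -> set of doc ids

    def insert(self, doc_id, attrs):
        self.attributes[doc_id] = dict(attrs)
        for value in attrs.values():
            self.attr_terms.setdefault(f"@{value}", set()).add(doc_id)

    def update_in_place(self, doc_id, name, value):
        # Assumption: only attribute storage changes; the index is not rebuilt.
        self.attributes[doc_id][name] = value

    def match(self, value):
        """Look the value up in the inverted index (may be stale)."""
        return self.attr_terms.get(f"@{value}", set())

    def filter(self, name, value):
        """Look the value up in attribute storage (always current)."""
        return {d for d, a in self.attributes.items() if a.get(name) == value}
```

After inserting a document with `int_attribute` 123 and updating it in place to 456, `match("123")` still returns the document and `match("456")` returns nothing, while `filter("int_attribute", 456)` returns it. That disagreement between full-text matching and attribute filtering is the confusing behavior to resolve.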

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] Changelog updated
- [x] OpenAPI YAML updated and issue created to rebuild clients

manticoresoftware / manticoresearch

Index attributes in an inverted index #2449
