v6 documentation changes

Tracking documentation changes for v6

[x] Add new score boosting docs #72
[x] Update terminology used in documentation #92
[x] Update binary serialization format with document score boost metadata and object id #72
[x] Query syntax support for bracketed field names #76
[x] Query syntax for escaping characters #85
[x] Custom stemmers #82
[x] Query processing order and query part weightings #105

Release notes:

New features

Removed dependency on System.Collections.Immutable - only the netstandard2 version of the library now pulls in any dependencies. For net6 to net8, only built-in types are used.
Score boosting!
- Score boosting as part of a query - grand^3 will boost the score of words matching "grand".
- Boosting of object fields - .WithField("Name", c => c.Name, scoreBoost: 1.5D).
- Boosting object scores based on a freshness date, e.g. the date it was last updated.
- Boosting object scores based on a magnitude value, e.g. a star rating.
Custom stemmers
Characters can now be escaped in LIFTI queries and field names in LIFTI queries can contain spaces.
Enhanced query execution logic
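As a rough illustration of the first two boosting styles above, here is a minimal sketch assuming the Lifti NuGet package (v6) is referenced. The `Hotel` record, the `ScoreBoostingExample` class and the sample data are hypothetical; only the `scoreBoost` argument and the `grand^3` query syntax come from these notes. Freshness and magnitude boosting are not shown because their builder calls are not described here.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Lifti;

// Hypothetical object type used only for this sketch.
public record Hotel(int Id, string Name, string Description);

public static class ScoreBoostingExample
{
    // Builds a small index and returns the keys of the documents matching the query.
    public static async Task<int[]> SearchAsync(string query)
    {
        var index = new FullTextIndexBuilder<int>()
            .WithObjectTokenization<Hotel>(o => o
                .WithKey(h => h.Id)
                // Matches in the Name field have their score multiplied by 1.5.
                .WithField("Name", h => h.Name, scoreBoost: 1.5D)
                .WithField("Description", h => h.Description))
            .Build();

        await index.AddRangeAsync(new[]
        {
            new Hotel(1, "Grand Hotel", "A quiet place by the sea"),
            new Hotel(2, "Sea View", "A grand old building with a pool")
        });

        // Collect the keys of every matching document.
        return index.Search(query).Select(r => r.Key).ToArray();
    }
}
```

Calling `await ScoreBoostingExample.SearchAsync("grand^3")` should return both keys, with the `^3` suffix boosting the score contributed by words matching "grand". The relative order of the two results depends on the scorer, so it is not asserted here.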

Performance increases

There was a significant amount of work done to improve performance and memory usage of building an index, index (de)serialization and searching.

All tests were run with BenchmarkDotNet:

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3007)
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores

The results below are a comparison of the previous v5 version of LIFTI against the code in the v6.0.0 branch, running on .NET 8.

Index construction

Populating an index with 200 Wikipedia entries in a single batch

v5 Mean (μs)	v5 Allocated (KB)	v6 Mean (μs)	v6 Allocated (KB)
1,134.2	567,623.8	952.6	286,617.6

Populating each of the 200 Wikipedia entries one at a time (i.e. a new snapshot created after each document)

v5 Mean (μs)	v5 Allocated (KB)	v6 Mean (μs)	v6 Allocated (KB)
4,284.4	1,370,649.9	1,212.4	613,540.2

Searching

Lots of individual optimisations including:

Merge sorting results during unions and intersections for queries containing more than one part
Optimised collection of affected results during wildcard and fuzzy match query parts
Early application of field filters when matching results
Weighting of query parts to analyse optimal execution order so that documents can be eliminated from collection in other parts of the query.

make for some nice gains for various query types.

Query	v5 Mean (μs)	v5 Allocated (KB)	v6 Mean (μs)	v6 Allocated (KB)
"also has a"	169.74	379.19	52.71	122.97
(confiscation & th*) | "and they"	1,203.69	1,557.29	105.23	185.02
*	193,333.07	103,612.99	62,298.80	13,152.30
?and ?they ?also	1,725.66	1,658.12	439.60	243.45
and | they	417.70	819.98	104.23	218.21
and ~ they	132.89	294.22	42.20	95.61
and ~10> they	132.64	297.67	43.34	97.04
and > they	214.03	455.75	106.16	169.17
and they also	283.82	565.34	56.02	109.51
co*on	445.27	798.77	180.04	263.47
con??*	2.21	2.30	1.96	1.97
confiscation	4.03	2.70	3.66	2.29
th*	2,277.00	2,914.76	569.76	412.60
Title=?great	416.08	399.17	108.86	34.50
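To make the merge-based unions and intersections and the "cheapest part first" ordering listed above more concrete, here is a minimal self-contained sketch. It is not LIFTI's implementation: `MergeSketch` and its methods are hypothetical names, plain `int` document ids stand in for real match data, and list length is used as a stand-in for the weighting a query part would report.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeSketch
{
    // Intersects two ascending id lists in a single pass (no hashing, no re-sorting).
    public static List<int> Intersect(IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        var result = new List<int>(Math.Min(left.Count, right.Count));
        int i = 0, j = 0;
        while (i < left.Count && j < right.Count)
        {
            if (left[i] == right[j]) { result.Add(left[i]); i++; j++; }
            else if (left[i] < right[j]) i++;
            else j++;
        }
        return result;
    }

    // Unions two ascending id lists in a single pass, keeping the output sorted.
    public static List<int> Union(IReadOnlyList<int> left, IReadOnlyList<int> right)
    {
        var result = new List<int>(left.Count + right.Count);
        int i = 0, j = 0;
        while (i < left.Count || j < right.Count)
        {
            if (j >= right.Count || (i < left.Count && left[i] < right[j])) result.Add(left[i++]);
            else if (i >= left.Count || right[j] < left[i]) result.Add(right[j++]);
            else { result.Add(left[i]); i++; j++; }
        }
        return result;
    }

    // Intersects many parts, starting with the smallest so that documents are
    // eliminated early and later parts have less to compare against.
    public static List<int> IntersectAll(IEnumerable<IReadOnlyList<int>> parts)
    {
        var ordered = parts.OrderBy(p => p.Count).ToList();
        if (ordered.Count == 0) return new List<int>();

        var current = ordered[0].ToList();
        for (var k = 1; k < ordered.Count && current.Count > 0; k++)
        {
            current = Intersect(current, ordered[k]);
        }
        return current;
    }
}
```

For example, `MergeSketch.Intersect(new[] { 1, 3, 5, 7 }, new[] { 3, 4, 5, 8 })` yields 3 and 5. Both merges rely on the inputs already being sorted, which is what makes a single pass sufficient.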

Deprecated:

ItemMetadata.Item/DocumentMetadata.Item -> use Key property
IFullTextIndex.Items -> use Metadata property
FullTextIndexBuilder.WithDuplicateItemBehavior -> use WithDuplicateKeyBehavior method
IndexOptions.DuplicateItemBehavior -> use DuplicateKeyBehavior property
ScoredToken.ItemId -> use DocumentId property
QueryTokenMatch.ItemId -> use DocumentId property
ItemMetadata.Count -> IndexMetadata.DocumentCount
ItemMetadata.GetMetadata -> IndexMetadata.GetDocumentMetadata
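A couple of these renames in context, as a minimal sketch assuming the Lifti NuGet package (v6) is referenced. The `MigrationSketch` class is hypothetical, and the commented "v5" lines are reconstructed from the mappings above rather than copied from real code.

```csharp
using System.Threading.Tasks;
using Lifti;

public static class MigrationSketch
{
    public static async Task<int> CountDocumentsAsync()
    {
        var index = new FullTextIndexBuilder<int>()
            // v5: .WithDuplicateItemBehavior(DuplicateItemBehavior.ReplaceItem)
            .WithDuplicateKeyBehavior(DuplicateKeyBehavior.Replace)
            .Build();

        await index.AddAsync(1, "first version");
        // Same key again: with Replace, this overwrites the earlier document.
        await index.AddAsync(1, "second version");

        // v5: index.Items.Count
        return index.Metadata.DocumentCount;
    }
}
```

Under these assumptions the method should return 1, because the second `AddAsync` call replaces the document stored under key 1 instead of adding another.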

Technically breaking

IdPool and IIdPool are now internal - These weren't really exposed before anyway
Removed interface IItemMetadata - just using DocumentMetadata going forwards
QueryContext no longer has ApplyTo method
IIndexNavigator: added Snapshot property
IIndexNavigator: added overloads for GetExactMatches and GetExactAndChildMatches that allow for the current QueryContext to be passed in so unnecessary results are not collected.
IIndexNavigator: new additional methods AddExactMatches and AddExactAndChildMatches that allow you to efficiently collect matches using a DocumentMatchCollector before converting it to an IntermediateQueryResult.
IQueryPart now has double CalculateWeighting(Func<IIndexNavigator> navigatorCreator) method to help the query processing logic evaluate the most efficient order of execution.
TItem generic type parameter name has been renamed to TObject.
All query part types are now sealed
New method IIndexNavigator.ExactMatchCount()
IntermediateQueryResult constructors are no longer public
Index serialization interfaces have been reworked. This shouldn't affect anyone because it was technically impossible to write your own serializers based upon them due to a lack of publicly accessible methods for rehydrating an index.
IIndexNavigatorBookmark now implements IDisposable - you don't technically have to dispose it, but doing so will return it to a pool and allow it to be reused.
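The "dispose to return to a pool" behaviour described for IIndexNavigatorBookmark follows a common pattern, sketched below in a self-contained form. This is not LIFTI's code: `BookmarkPool` and `PooledBookmark` are hypothetical stand-ins, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool that hands out reusable bookmark objects.
public sealed class BookmarkPool
{
    private readonly Stack<PooledBookmark> available = new();

    // Number of bookmark instances actually allocated so far.
    public int Created { get; private set; }

    public PooledBookmark Rent(int position)
    {
        PooledBookmark bookmark;
        if (this.available.Count > 0)
        {
            bookmark = this.available.Pop(); // reuse a previously disposed instance
        }
        else
        {
            bookmark = new PooledBookmark(this);
            this.Created++;
        }

        bookmark.Activate(position);
        return bookmark;
    }

    internal void Return(PooledBookmark bookmark) => this.available.Push(bookmark);
}

public sealed class PooledBookmark : IDisposable
{
    private readonly BookmarkPool owner;
    private bool rented;

    internal PooledBookmark(BookmarkPool owner) => this.owner = owner;

    public int Position { get; private set; }

    internal void Activate(int position)
    {
        this.Position = position;
        this.rented = true;
    }

    // Disposing is optional: an undisposed bookmark is simply garbage collected,
    // while a disposed one goes back to the pool for reuse.
    public void Dispose()
    {
        if (!this.rented) return; // guard against returning the same instance twice
        this.rented = false;
        this.owner.Return(this);
    }
}
```

With this shape, wrapping each bookmark in a `using` block means repeated rent and dispose cycles keep reusing one instance, whereas skipping `Dispose` stays correct and just allocates more.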

Querying changes

ScoredFieldMatch is now quite different and no longer publicly constructable. The only place you would have encountered this is in a custom scorer, and that's no longer necessary.

Several types that are only likely to have been used internally are gone:

FieldMatch
QueryTokenMatch
CompositeTokenMatchLocation
SingleTokenMatchLocation
ITokenLocationMatch
TokenLocationMatch

Breaking

DuplicateItemBehavior enum -> renamed to DuplicateKeyBehavior
DuplicateItemBehavior.ReplaceItem -> use DuplicateKeyBehavior.Replace instead
IQueryContext -> just use the concrete QueryContext; this affects IQueryPart.Evaluate, as it now takes QueryContext
IIndexNodeFactory.CreateNode now takes concrete types ChildNodeMap and DocumentTokenMatchMap instead of ImmutableDictionary and ImmutableList respectively.
A maximum of 31 different object types can now be configured against a single FullTextIndexBuilder (i.e. 31 distinct calls to WithObjectTokenization) - if anyone is actually indexing more than 31 object types, I'd be very interested to understand your scenario!

The rest of these will only affect you if you are explicitly referencing the type names in your code:

ItemPhrases -> renamed to DocumentPhrases
ItemMetadata -> renamed to DocumentMetadata
IItemStore -> renamed to IIndexMetadata
