mikegoatly / lifti

A lightweight full text indexer for .NET
MIT License

Score boosting #72

Closed by mikegoatly 11 months ago

mikegoatly commented 1 year ago

This is a brain dump of some thoughts around how LIFTI could make its scoring system more flexible. This issue will be updated as the thinking evolves.

Extend the object tokenization builder to provide:

.WithObjectTokenization(o => o
    // Boost any scores from this field by x3
    .WithField("Name", c => c.Name, scoreBoost: 3)
    .WithScoreBoosting(b => b
        // Boost results on a scale between oldest and newest. The delegate must return a DateTime,
        // e.g. if using DateTimeOffset, dto.UtcDateTime can be used.
        .Freshness(item => item.LastModified, multiplier: 3)
        // Boost results on a scale based on a numeric value. The delegate must return a double.
        .Magnitude(item => item.Rating, multiplier: 3)))

Open question: need to think about how score boosting should work for dynamic fields.

We could also add dynamic score boosting to the LIFTI query syntax (similar to Lucene):

term^3

If the term matches, its score is boosted by that amount.
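As a rough illustration of the idea, a `term^3` token could be split into its term and boost like this. This is only a sketch and not LIFTI's query parser: the `BoostedTermSketch` class and its methods are hypothetical, and treating the boost as a plain multiplier on the match score is an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch; LIFTI's real query parser is not shown in this issue.
public static class BoostedTermSketch
{
    // Splits "term^3" into ("term", 3).
    // A token without a valid "^n" suffix is returned unchanged with a boost of 1.
    public static (string Term, double Boost) Parse(string token)
    {
        var caret = token.LastIndexOf('^');
        if (caret > 0
            && double.TryParse(
                token.Substring(caret + 1),
                NumberStyles.Float,
                CultureInfo.InvariantCulture,
                out var boost))
        {
            return (token.Substring(0, caret), boost);
        }

        return (token, 1D);
    }

    // Assumption: the boost is applied as a multiplier to the matched term's score.
    public static double Apply(double matchScore, double boost) => matchScore * boost;
}
```

Under these assumptions, `BoostedTermSketch.Parse("term^3")` yields the term `term` with a boost of 3, and an unboosted `term` keeps a boost of 1.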

bielu commented 1 year ago

Hi @mikegoatly, I was just about to create a ticket about the need for boosting, but found this one, and I think it would be great to have field boosting! It should definitely support boosting dynamic fields, and I think using a Lucene-like syntax would be the best option, as it makes it easier to switch between the two :)

mikegoatly commented 1 year ago

To support freshness and magnitude boosting, we'll need to track the maximum and minimum values for each indexed item type, as well as the value for each item. These values will also need to be stored in the serialized data and rehydrated into new indexes.

The additional score boosts should then be calculable at scoring time.
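A minimal sketch of how such a boost might be derived from the tracked minimum and maximum is below. It is not LIFTI's implementation: the `ScoreBoostSketch` class, its method signatures, and the choice of linear interpolation between 1 and the multiplier are all assumptions made for illustration.

```csharp
using System;

// Hypothetical sketch; not LIFTI's actual implementation.
public static class ScoreBoostSketch
{
    // Scales a value linearly between the tracked minimum and maximum:
    // the minimum maps to a boost of 1 (no change), the maximum to `multiplier`.
    public static double Magnitude(double value, double min, double max, double multiplier)
    {
        if (max <= min)
        {
            // Only one distinct value has been tracked, so there is no scale to place it on.
            return 1D;
        }

        var position = (Math.Clamp(value, min, max) - min) / (max - min);
        return 1D + (position * (multiplier - 1D));
    }

    // Freshness is the same calculation applied to DateTime ticks.
    public static double Freshness(DateTime value, DateTime oldest, DateTime newest, double multiplier)
        => Magnitude(value.Ticks, oldest.Ticks, newest.Ticks, multiplier);
}
```

With a multiplier of 3 in this sketch, an item halfway between the oldest and newest values would receive a boost of 2. Only the per-type minimum and maximum plus the per-item value are needed, which matches the data the comment above says must be tracked and serialized.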

mikegoatly commented 1 year ago

The first bit of work for this has been merged into the v6.0.0 branch. This will allow for fields to have a score boost applied to them, like this:

var index = new FullTextIndexBuilder<string>()
    .WithObjectTokenization<TestObject3>(
        o => o.WithKey(i => i.Id)
            .WithField("Text", i => i.Text, scoreBoost: 10D)
            .WithField("MultiText", i => i.MultiText))
    .Build();

Next up, application of score boosting for freshness and dynamic boosting at query time.

Edit: Just for my own future reference when collating changes for the documentation, this change also supports score boosting for dynamic fields, e.g.:

var index = new FullTextIndexBuilder<string>()
    .WithObjectTokenization<TestObject3>(
        o => o.WithKey(i => i.Id)
            .WithDynamicFields("Tags", c => c.TagDictionary, "Tag_", scoreBoost: 3))
    .Build();

Note that all fields discovered by that dynamic field provider will receive the same score boost.

mikegoatly commented 11 months ago

Added score boosting to the query syntax in the v6.0.0 branch, allowing for queries such as text^3. This can be applied to exact, fuzzy and wildcard matches.

mikegoatly commented 11 months ago

Feature complete - ready to be released with v6