Closed: stealthin closed this issue 12 months ago.
If I test the following code:
using Lifti;
var index = new FullTextIndexBuilder<string>()
    .WithObjectTokenization<MyModel>(
        itemOptions => itemOptions
            .WithKey(b => b.Title)
            .WithField("Title", b => b.Title, tokenOptions => tokenOptions.CaseInsensitive().AccentInsensitive().SplitOnPunctuation(false).IgnoreCharacters('=')))
    .Build();
await index.AddRangeAsync(new List<MyModel>
{
    new MyModel("1", "TEST=test3,TEST=othertest")
});
var tokenizer = index.GetTokenizerForField("Title");
var search = tokenizer.Normalize("TEST=test3");
var suggestions = GetSuggestions(search);
var results = index.Search(search);
Console.WriteLine(results.Count());
// Lists the indexed tokens reachable from the given prefix
IEnumerable<string> GetSuggestions(string input)
{
    using var navigator = index.CreateNavigator();
    navigator.Process(input.AsSpan());
    return navigator.EnumerateIndexedTokens().ToList();
}
public sealed record MyModel(string Key, string Title);
I get TESTTEST3,TESTOTHERTEST from the suggestions, which is not what I'd like. It should be TEST=test3,TEST=othertest instead.
Hi!
The = in a LIFTI query is used to restrict a search to a specific field, and at the moment there's no syntax to escape it. If you don't need the full LIFTI query syntax, then switching to the simple query parser should work for you:
using Lifti;
var index = new FullTextIndexBuilder<string>()
    .WithSimpleQueryParser() // added: use the simple parser instead of the full LIFTI query syntax
    .WithObjectTokenization<MyModel>(
        itemOptions => itemOptions
            .WithKey(b => b.Title)
            .WithField("Title", b => b.Title, tokenOptions => tokenOptions.CaseInsensitive().AccentInsensitive()))
    .Build();
await index.AddRangeAsync(new List<MyModel>
{
    new MyModel("1", "MyText=test3")
});
var results = index.Search("MyText=test3");
Console.WriteLine(results.Count());
public sealed record MyModel(string Key, string Title);
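To see why the full query syntax gets in the way, here is a deliberately simplified sketch. ParseTerm is a hypothetical toy, not LIFTI's parser; it only assumes, as described above, that = separates a field name from the search term, and it shows that such a parser cannot tell a field restriction apart from a word that merely contains =.

```csharp
using System;

// Hypothetical toy parser, NOT LIFTI's: it treats the first '=' as a field separator,
// so "MyText=test3" is read as field "MyText" + term "test3", never as one word.
static (string? Field, string Term) ParseTerm(string token)
{
    var separator = token.IndexOf('=');
    return separator < 0
        ? (null, token)
        : (token[..separator], token[(separator + 1)..]);
}

var (field, term) = ParseTerm("MyText=test3");
Console.WriteLine($"field: {field ?? "(none)"}, term: {term}");
// prints: field: MyText, term: test3
```

Any parser with this rule hits the same ambiguity unless it offers an escape syntax, which is why the simple parser (or a hand-built query) is the way around it here.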
I see what you were trying to do with the .IgnoreCharacters('=') part of the second example, but IgnoreCharacters actually strips a set of characters from the input as if they were never there (which is why your suggestions came back as TESTTEST3,TESTOTHERTEST), so that's definitely not what you want :)
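As a rough illustration of that behaviour, the sketch below reproduces the output reported in the question using plain string operations. It is not LIFTI's code: SimulateNormalize is a hypothetical helper, and it assumes that ignored characters are simply dropped and that CaseInsensitive() normalizes to upper case, which is what the TESTTEST3 output suggests.

```csharp
using System;
using System.Linq;

// Hypothetical helper, NOT LIFTI's implementation: drops every ignored character,
// then upper-cases the rest, approximating IgnoreCharacters('=') + CaseInsensitive().
static string SimulateNormalize(string input, char[] ignored) =>
    new string(input.Where(c => !ignored.Contains(c)).ToArray()).ToUpperInvariant();

var ignored = new[] { '=' };

// The indexed text loses its '=' characters as if they had never been typed.
Console.WriteLine(SimulateNormalize("TEST=test3,TEST=othertest", ignored));
// prints: TESTTEST3,TESTOTHERTEST

// The search text goes through the same normalization, so it loses its '=' too.
Console.WriteLine(SimulateNormalize("TEST=test3", ignored));
// prints: TESTTEST3
```

Under these assumptions the index can no longer distinguish TEST=test3 from TESTtest3, because both normalize to the same token.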
Crystal clear! Thank you for your quick answer!
No problem, glad I could help. I just had another thought: you could also have worked around this by constructing the query manually:
using Lifti;
using Lifti.Querying;
using Lifti.Querying.QueryParts;
var index = new FullTextIndexBuilder<string>()
    .WithObjectTokenization<MyModel>(
        itemOptions => itemOptions
            .WithKey(b => b.Title)
            .WithField("Title", b => b.Title, tokenOptions => tokenOptions.CaseInsensitive().AccentInsensitive()))
    .Build();
await index.AddRangeAsync(new List<MyModel>
{
    new MyModel("1", "MyText=test3")
});
// Added: normalize the text with the field's tokenizer, then build the query by hand
var normalizedSearchText = index.GetTokenizerForField("Title").Normalize("MyText=test3");
var query = new Query(new ExactWordQueryPart(normalizedSearchText));
var results = index.Search(query);
Console.WriteLine(results.Count());
public sealed record MyModel(string Key, string Title);
That way you bypass the query parser completely and make it explicit that you want the = to be part of the word.
That makes sense! This is what I ended up doing 😄
Hello,
First of all, thank you for this amazing library.
I tried to index and search for words, but I don't get any results when I execute the following code:
Would it be feasible to escape the = character when it is part of the text itself? (I don't want the search text to be changed.) Many thanks!