Closed mattbarkerdev closed 7 years ago
Thanks!!! This looks good, I'll review this evening and merge it in. Just out of curiosity (I'm still trying to fully understand the search algorithm), am I right in saying that the longer the text being searched the more likely it is that the result will be near the top? For example, if a search term is found 3 characters in to a 10 character term, that will appear behind a result that is found as the 20th character in a 100 character term...?
Morning - yes, for now the weighting based on the index of the search string to the length of the word does have the (probably undesired) side effect of over weighting words that are longer. For example
alberto will have a hit result of 5 however Berto will also have a result of 5 also francobertoli would have a result of 10
Haven't quite thought my way around how to prevent it increasing when the word is long. Given I'm currently only name matching with it, it hasn't come up too frequently (yet!) I suppose I could artificially deflate the hit count based on distance away from the start as opposed to increasing it due to the proximity to it but then i suspect I might have to handle negative hit results which is liable to get messy!
I'll have a think and get back to you!
Have you thought about using Levenshtein distance? SearchExtensions does support this (blog post) but only as an in memory action. You could return all matching records then order by the smallest distance (i.e. changes required).
I have put a fair amount of effort in to making the Levenshtein search perform well so it is able to calculate the distance of over a million records in under a second.
It might not be right for your problem, but just a thought
that isn't a bad idea - I did see the implementation you put in. I'll try running some tests here and see what it comes up with! thanks for the idea!
Morning! I haven't forgotten about this yet I promise! Just wondered if the first merge has made it into the nuget build yet? thanks!
Hi @mattbarker1983
The current version on nuget does not include your PR. I was hoping to take the opportunity to refactor some bits here and there, but like you, there is not enough time. I'll pick it back up and see if I can get it out.
Cheers
I was also wondering on the outcome of the validity of the search method. Did you manage to look into using Levenschtein distance at all?
P.S. I have updated the `.ToRanked() method to take an enum. Unfortunately this means this PR will need tweaking, sorry
Yeah, time is a little pressed at the moment I must admit!
I did have a quick scan through the Levenschtein distance stuff. It looks like it could also work and yield the type of outcome that I'd expect. I was probably tired though as it looked like the implementation in the source compared one string to another rather than multiple to multiple. The usage currently is searching over three (maybe more in future) properties of an object against any number of incoming search phrases. Unless I'm missing something from last night, I couldn't see a way of combining them in the same way.
return regUsers .Search(s => s.FirstName, s => s.LastName, s => s.Title) .ContainingAll(SearchQueryItems) .ToLeftWeightedRanked() .OrderByDescending(r => r.Hits) .Select(r => r.Item);
Ah, ok. The multiple thing might not work straight off the bat. Ok, I'll create a new release from the master branch. Did you want me to wait for you to fix up this PR? (I broke it when I changed ToRanked()
to take an overload)
No worries! I should have some time this week to look at the PR. I'll grab the changes you made and fix it up. No hurry :) I'm sure we both have enough to be getting on with normally heh. I'll shout you when I've fixed it.
Cheers!
Hi @mattbarker1983
Something you mentioned a couple of posts ago was the inability to perform a Levenshtein search on more than one property and more than one search term. Well, it got me thinking and I've now added this as a feature to SearchExtensions. It can be used like this:
var result = context.Users.LevenshteinDistanceOf(x => x.FirstName, x => LastName)
.ComparedTo("Jim", "Fred");
It's not published to nuget just yet (I'm waiting to hit 10k downloads) but I wondered if this would be something that could be useful to you?
Hi @ninjanye
Thanks for this! Yeah I believe it will come in handy! Sorry I've gone a little dark of late, I'm getting married next week so I've got quite a bit on my plate at the moment! Looks like it would work for our purposes though!
Congratulations! Hope the day goes well. With 10k downloads fast approaching, I've completed the Levenschtein work above. If possible, I'd like to temporarily remove the left weighted search stuff and release the Levenschtein update. This will allow me more time to become clear about the Left weighted search use case.
I really hate to have to ask as I don't get a lot of pull requests, and you've put a lot of effort into your work, which I really appreciate.
Once the latest release goes out, I'll reinstate the left weighted with and do some research into his often it might be used.
Cheers John
Not a problem at all - it isn't quite ready and may in fact be redundant now if you're new work additions have the desired impact! on't worry about the time spent on it, I don't think its quite complete yet so I wouldn't worry!
hello again! Spotted an issue with the weighting expression the other day. When searching with multiple params, it was adding incorrect weighting if the searchphrase wasn't contained in the text. Added some new equality checks to the expression builders and its happy again. You only really need the EnumerableExpressionHelper.cs changes but i thought i'd send the tests too