Open vredesbyyrd opened 5 years ago
Hi @vredesbyyrd, thanks a lot for your feedback!
For your first issue: I looked at how angrysearch implements the indexing, and it doesn't really seem to have fuzzy finding (at least as it is defined in this project). Instead, a space in an angrysearch query seems to split the query into phrases, which can appear in the search result in any order.
The fuzzy searching, as defined in this project (which is also not actual fuzzy searching afaik), takes the query as a single phrase and looks for strings that contain all of the characters of the phrase in the same order, but not necessarily contiguously. So in your case, since you searched for "fka\ twigs", there has to be a space somewhere between the occurrences of "fka" and "twigs". If you don't want that, you can search for "fkatwigs" instead.
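To make that concrete, here is a minimal sketch of in-order (subsequence) matching in Python. It is only an illustration of the behaviour described above, not the project's actual code; the function name `is_subsequence_match` and the example filename are made up for this example.

```python
def is_subsequence_match(query: str, candidate: str) -> bool:
    """Return True if every character of `query` occurs in `candidate`,
    in the same order, but not necessarily next to each other."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator up to the first occurrence
    # of `ch`, so later characters can only match further to the right.
    return all(ch in remaining for ch in query.lower())


# The space in the query must itself be found after "fka", so this fails:
print(is_subsequence_match("fka twigs", "fka-twigs_01.jpg"))  # False
# Without the space, the characters are all found in order:
print(is_subsequence_match("fkatwigs", "fka-twigs_01.jpg"))   # True
```

This is why the escaped space matters: it is treated as one more character that has to appear in the candidate, not as a separator.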
Maybe it would be a smart thing to split the query into phrases as angrysearch does it, and then process those phrases differently. I'm also thinking about how we can improve the sort order, i.e. if the filename contains the whole query, it should be prioritized.
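A rough sketch of what that could look like, in Python for brevity. This is an assumption about one possible design, not how angrysearch or this project implements it; `matches_all_phrases`, `rank` and the sample paths are hypothetical.

```python
import os


def matches_all_phrases(query: str, path: str) -> bool:
    """Split the query on whitespace and require every phrase to occur
    somewhere in the path, in any order."""
    haystack = path.lower()
    return all(phrase in haystack for phrase in query.lower().split())


def rank(query: str, paths: list[str]) -> list[str]:
    """Keep matching paths; list those whose filename contains the whole
    query first, then fall back to alphabetical order."""
    whole = query.lower()
    hits = [p for p in paths if matches_all_phrases(query, p)]
    # False sorts before True, so whole-query filename matches come first.
    return sorted(hits, key=lambda p: (whole not in os.path.basename(p).lower(), p))


paths = [
    "/music/fka_twigs/cover.png",
    "/photos/twigs-fka.jpg",
    "/photos/fka twigs live.jpg",
    "/docs/notes.txt",
]
for p in rank("fka twigs", paths):
    print(p)
```

With this ordering, `/photos/fka twigs live.jpg` is listed first because its filename contains the complete query, while the other two hits only contain the individual phrases.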
Regarding the performance concerns, I think we're good here. I think if we end up creating a useful full path search, I'll merge it into master and keep a build option around to disable the changes.
> For your first issue: I looked at how angrysearch implements the indexing, and it doesn't really seem to have fuzzy finding (at least as it is defined in this project). Instead, a space in an angrysearch query seems to split the query into phrases, which can appear in the search result in any order.
> The fuzzy searching, as defined in this project (which is also not actual fuzzy searching afaik), takes the query as a single phrase and looks for strings that contain all of the characters of the phrase in the same order, but not necessarily contiguously. So in your case, since you searched for "fka\ twigs", there has to be a space somewhere between the occurrences of "fka" and "twigs". If you don't want that, you can search for "fkatwigs" instead.
Thank you for detailing that! The results I was getting (or not getting) make sense now; in hindsight I should have recognized why. Using my earlier example, the query "fkatwigs" does indeed find what one would expect. I am admittedly pretty naive when it comes to the inner workings of search tools, like breadth-first vs. depth-first algorithms, or how different matching methods really work and differ.
> Maybe it would be a smart thing to split the query into phrases as angrysearch does it, and then process those phrases differently. I'm also thinking about how we can improve the sort order, i.e. if the filename contains the whole query, it should be prioritized.
From a user's point of view, I agree that splitting the query into phrases in a similar fashion to angrysearch would be a nice addition; it's just a bit more user friendly imo. On prioritizing complete matches, agreed.
> Regarding the performance concerns, I think we're good here. I think if we end up creating a useful full path search, I'll merge it into master and keep a build option around to disable the changes.
Right on 👍
Hi. After a fair amount of testing I have some observations regarding full path fuzzy finding and fuzzy search in general. The main search tool I was using before this was angrysearch, which also had the stated goal of being an Everything for Linux. Angrysearch came very close to that goal, minus the always up-to-date database, which is obviously a big downside. But its filtering logic is as good as Everything's, IMO. So any comparisons I make will be with angrysearch/Everything. To me, a good fuzzy finder means always being able to filter down to what you need from lazy queries. I'll show what I mean by lazy queries below.
First, this example does not necessarily concern full path finding, only filename fuzzy searching. Let's say I am looking for all photographs of the artist 'fka twigs', but I do not know where they are located or how the basename is formatted, so I query just her name: `gosearch -c -r -fp fka\ twigs`. I only see a couple of `cover.png` album covers in the results, which I know are not what I am looking for. Making the same query with angrysearch with comparable fuzzy wholepath settings finds the sought-after photographs (highlighted below) in addition to all strings in the database containing `fka twigs`.
In my mind that's the benefit of whole path fuzzy finding. It allows the user to make broad 'lazy' searches and 99% of the time successfully filter down to what they are looking for. E.g., if I was looking for all albums by artist `fka_twigs`, the same query in full_path mode would find the results I am looking for.
Regarding finding the filename `fka-twigs_01` from the query `fka\ twigs`: is this already possible with gosearch by using regex filters via the config? Or is it a bug in the fuzzy finding? I tried a few different regex patterns but could not get the results I wanted.
EDIT: I guess the important question here is what the best way is to avoid troublesome meta-characters. I think angrysearch removes them from the database altogether, which seems a bit overkill perhaps? E.g., you would not be able to query `C++`. Maybe a flag to replace the most common troublesome characters [ hyphen, underscore, period ] with spaces would be a good middle road. Just some thoughts. I'll take a look at how Everything handles it.
Performance concerns:
I did not do any real benchmarks between the master branch and _fuzzypath branch, but on my 2014 i5 laptop there is no perceivable performance difference. This is just some anecdotal evidence, but the gosearchServer process uses nearly identical memory on both branches. Not sure how good of a measure that is though. Initial index time was also essentially the same. IMHO, having the option of full-path query in the main search tool I use is very important and would have to come at a large performance cost to consider not implementing it.
Cheers.
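To illustrate the "middle road" from the EDIT above, here is a small Python sketch that replaces only hyphen, underscore and period with spaces before matching, on both the query and the filename. No such flag exists in the tool as far as this thread shows; `normalize`, `matches` and the sample filenames are hypothetical.

```python
# Map only the three separators named in the EDIT to a space;
# every other character (including "+") is left untouched.
SEPARATORS = str.maketrans({"-": " ", "_": " ", ".": " "})


def normalize(text: str) -> str:
    """Lowercase, turn separators into spaces, collapse repeated spaces."""
    return " ".join(text.lower().translate(SEPARATORS).split())


def matches(query: str, name: str) -> bool:
    """Plain substring match on the normalized forms."""
    return normalize(query) in normalize(name)


print(normalize("fka-twigs_01.jpg"))             # fka twigs 01 jpg
print(matches("fka twigs", "fka-twigs_01.jpg"))  # True
print(matches("c++", "learn_C++.pdf"))           # True
```

Because only three characters are rewritten rather than all meta-characters being stripped, a query like `C++` still works, which is the downside of the remove-everything approach mentioned above.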