alternative algorithm - Githubissues

bewinsnw commented 1 year ago

Hey, read about your project on lobste.rs (I don't have an account there or I'd have commented in the thread). A while ago I made a drawing tool which made software architecture diagrams from text, as you type - for use as a whiteboard in zooms - and it attempted to guess icons for things you drew, picking from eg the aws (https://aws.amazon.com/architecture/icons/) and kubernetes sets. I wanted to avoid an explicit picker which breaks the flow of the diagramming discussion, so it just does a fuzzy find and picks the first result.

This produces results similar to your fuzzy find, you may be interested in the trick (since I see your code works entirely differently; mine was a quick hack in js). What my code did was take the search target you request and rank the results by the shortest occurrence of that sequence of characters. To do this, I just transform the target into a regexp (being lazy, and not writing a bunch of match code). For example, for word, it generates (w[^w]*?o[^o]*?r[^r]*?d); the w[^w]*? avoids captures like wwwword where there is a shorter subsequence containing the target within the match (this falls over a bit for strings with multiple matches; you can easily fix this by grabbing all matches and choosing the shortest). In my case, the input would only be a single token containing alphanumerics, underscores, and dashes; I split on the non-alphanumerics and match each subtoken separately; the score is the total size of all the matches in each string. (I had various other ways of breaking ties, specific to my use-case)

This isn't particularly efficient, and I'm well aware that abusing regexes like this is a hack, but the number of strings I had to match against wasn't very large, and the input alphabet was limited so I knew the generated regexes would be valid. It does, however, rank your 'makefile' example at the top, so I thought it might interest you.

bewinsnw commented 1 year ago

(closing, it's not really an issue, the note's here if you're interested, etc)

natecraddock commented 1 year ago

Thanks for sharing this! Sounds like a very useful way to assign icons in that context.

rank the results by the shortest occurrence of that sequence of characters

zf does this too, in addition to several other heuristics. Currently Zig doesn't have regexes in the standard library, and I want to keep deps to a minimum which is why I haven't used any regexes in zf.

natecraddock / zf

alternative algorithm #34