sillsdev / languageforge-lexbox

Language Forge / Language Depot unification project

Use fuzzy string matching for similar project name detection #1008

Open rmunn opened 1 month ago

rmunn commented 1 month ago

Describe the feature: Use fuzzy string matching for project name matching (see #979).

Who is this feature for? When users create a new project, #979 uses simple string.Contains logic to look for projects with similar names. Postgres, however, supports fuzzy string matching by Levenshtein distance through its fuzzystrmatch extension; you just have to enable the extension in your ModelBuilder configuration. We might want to improve the project-name matching by using it to find similar projects.

There's even a bounded variant (levenshtein_less_equal in Postgres) that keeps the search efficient: you pass a maximum distance, and as soon as Postgres determines that the actual distance would exceed it, it stops calculating and returns some value greater than the maximum. So you can write FuzzyStringMatchLevenshteinLessEqual(needle, haystack, 5) <= 5 and get a list of similar strings without wasting too much time on dissimilar ones.
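To illustrate why the bounded variant is cheap on dissimilar strings, here is a minimal Python sketch of the early-exit idea. It is not Postgres's implementation. The function name mirrors the Postgres one only for readability, returning exactly max_d + 1 when the bound is exceeded is an assumption of this sketch, and the project names are made up.

```python
def levenshtein_less_equal(source: str, target: str, max_d: int) -> int:
    """Return the exact edit distance if it is <= max_d, otherwise max_d + 1.

    Illustrative sketch only; the max_d + 1 sentinel is an assumption.
    """
    # The length difference is a lower bound on the distance, so very
    # different lengths can be rejected without filling in any table rows.
    if abs(len(source) - len(target)) > max_d:
        return max_d + 1

    # previous[j] = distance between the first i-1 chars of source
    # and the first j chars of target (classic row-by-row DP).
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete from source
                current[j - 1] + 1,      # insert into source
                previous[j - 1] + cost,  # substitute (or match)
            ))
        # Row minima never decrease from one row to the next, so once the
        # whole row exceeds max_d the final distance must exceed it as well.
        if min(current) > max_d:
            return max_d + 1
        previous = current

    return previous[-1] if previous[-1] <= max_d else max_d + 1


# Hypothetical project names, filtered the same way as the "<= 5" check.
existing = ["Thai Dictionary", "Thia Dictionary", "Swahili Lexicon"]
similar = [name for name in existing
           if levenshtein_less_equal("Thai Dictionery", name, 5) <= 5]
```

With these made-up names, the two "Thai"/"Thia" entries pass the filter and "Swahili Lexicon" does not.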

Pages affected: Project creation page

rmunn commented 1 month ago

Implementation would be simple: just add EF.Functions.FuzzyStringMatchLevenshteinLessEqual(p.Name, input.ProjectName, 5) <= 5 to the ProjectsByNameAndOrg query.