julianpoy / RecipeSage

A Collaborative Recipe Keeper, Meal Planner, and Shopping List Organizer in PWA form.
https://recipesage.com
612 stars 61 forks

Allow for Recipe title autocompletion #985

Closed escjf closed 1 year ago

escjf commented 1 year ago

When trying to find a recipe based on words in its title other than the first one, the search is not successful (iOS and Firefox desktop). Furthermore, the minimum number of characters needed to start the search seems to vary (maybe it's based on full words, or only on a percentage of a word?). Maybe implementing a full-text search on the frontend, once all recipes have been fetched, could be less costly and give smoother usability. EDIT: This would be a great improvement in both the My Recipes search and the Meal Plan Recipes search.

julianpoy commented 1 year ago

Would you mind refreshing your search index via the account settings page? I'm curious if that's the issue here.

escjf commented 1 year ago

Thanks for the quick reply! I did that, and I do feel there has been a change. However, the search results are still somewhat unpredictable: searching for "Tom", "Toma" doesn't yield anything containing those terms [ Tom yields an import I did of this ]; it isn't until I search for "Tomato" that I see what I'm looking for. I'm not sure if you've tweaked some standard full-text search library on the backend side, and I have no idea about front end, but I thought doing it on the front end would be more reliable and less costly for server processing power.

julianpoy commented 1 year ago

Absolutely - hope the following sheds some light onto the behavior you've observed:

searching for "Tom", "Toma" doesn't yield anything containing those terms

The search is full-word with fuzziness of 1-2 characters. It isn't designed to autocomplete words at the moment. #197 was opened a while ago to add this functionality, but when I approached it, it turned out to be non-trivial. I could always take another poke here.
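
To make the "Tom" / "Toma" / "Tomato" behavior concrete, here is a minimal sketch of whole-word fuzzy matching. It is not RecipeSage's code. The edit budget in `allowedEdits` is an assumption modelled on Elasticsearch's documented `AUTO` fuzziness (0 edits for 1-2 character terms, 1 edit for 3-5 characters, 2 edits beyond that), and plain Levenshtein distance stands in for the Damerau-Levenshtein variant Elasticsearch uses by default. All function names are hypothetical.

```typescript
// Classic dynamic-programming Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a character from `a`
        curr[j - 1] + 1, // insert a character into `a`
        prev[j - 1] + cost, // substitute (or keep a matching character)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// ASSUMPTION: edit budget by query length, modelled on "AUTO" fuzziness.
function allowedEdits(term: string): number {
  if (term.length <= 2) return 0;
  if (term.length <= 5) return 1;
  return 2;
}

// True when the query is within its edit budget of a WHOLE indexed word.
// Prefixes get no special treatment, which is why partial words miss.
function fuzzyWordMatch(query: string, indexedWord: string): boolean {
  const q = query.toLowerCase();
  const w = indexedWord.toLowerCase();
  return editDistance(q, w) <= allowedEdits(q);
}

for (const q of ["Tom", "Toma", "Tomato", "Tomatoe"]) {
  console.log(q, editDistance(q.toLowerCase(), "tomato"), fuzzyWordMatch(q, "tomato"));
}
```

Under these assumptions "Tom" is three edits and "Toma" two edits away from "tomato", both over budget, while the misspelling "Tomatoe" is one edit away and still matches. Typo tolerance and prefix completion are separate features.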

I'm not sure if you've tweaked some standard full-text search library on the backend side

RecipeSage uses ElasticSearch to power the search functionality. No tweaking is really possible outside of the queries/commands used to execute the search.
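
As an illustration of what changing "the queries" could mean, the sketch below builds two Elasticsearch query bodies as plain objects: a fuzzy whole-word `match` and a `match_phrase_prefix`, which treats the last term of the input as a prefix. These are not RecipeSage's actual queries. The `title` field name and both function names are assumptions made for the example.

```typescript
// Simplified shape of the JSON body sent to Elasticsearch's _search endpoint.
type SearchBody = { query: Record<string, unknown> };

// Whole-word matching with typo tolerance: "Tomatoe" can find "tomato",
// but a partial word such as "Toma" is still compared against whole terms.
function fuzzyTitleQuery(term: string): SearchBody {
  // ASSUMPTION: the recipe title is indexed in a field called `title`.
  return { query: { match: { title: { query: term, fuzziness: "AUTO" } } } };
}

// Prefix-style matching: the final term in the input is treated as a prefix,
// so "Toma" can match documents whose title contains "tomato".
function prefixTitleQuery(term: string): SearchBody {
  return { query: { match_phrase_prefix: { title: { query: term } } } };
}

console.log(JSON.stringify(fuzzyTitleQuery("Tomatoe"), null, 2));
console.log(JSON.stringify(prefixTitleQuery("Toma"), null, 2));
```

Combining the two (for example in a `bool` query with `should` clauses) is one possible way to get both behaviors, though ranking and performance would need to be checked against real data.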

the front end would be more reliable and less costly for server processing power

I understand that it might seem that way, but it's far less reliable and requires indexing all of your recipes every time you load up the My Recipes page. Client-side libraries are heavy, require a decent chunk of CPU power (won't run well on low-cost Android devices), and require loading all of the recipes onto the user's client device. Additionally, recipes are paginated since a very large percentage of users have more than 20k recipes in their collection.

leeoniya commented 1 year ago

take a look at https://github.com/leeoniya/uFuzzy#a-biased-appraisal-of-similar-work

julianpoy commented 1 year ago

Hi @leeoniya, thanks for the link.

Completely fair assessment that there are some libraries that are quite lightweight! I stand corrected.

I still see a number of issues, but perhaps you have some suggestions here:

  1. All of the above libraries require downloading the user's entire recipe collection to the device. I don't see a way around this, since any search library will need access to the full dataset rather than paginated API results.
  2. Compounding on point number 1, there are efforts to support searching across all of your friends' shared recipe collections (see here: #982).
  3. The search needs to support per-field weighting. I don't see a way to do this with uFuzzy, though some others may work.
  4. RecipeSage used to use Lunr.js circa 2019, before switching to ElasticSearch. Users frequently complained about quality of search result ranking, trouble finding the results they wanted, etc. Since going with ElasticSearch this has been significantly better, but search is a finicky thing where it's somewhat hard to satisfy everyone.

If you have ideas for the above I'd really be happy to entertain a switch away from ElasticSearch if it truly delivers better results. I will admit that I do believe ElasticSearch (or any Lucene based engine) will tend to come out ahead in terms of accuracy and featureset in the long run, though.

julianpoy commented 1 year ago

Changed the title of this issue to more closely reflect the issue here. Full-text search is already in place; autocompletion is what's being requested here.

leeoniya commented 1 year ago

  1. All of the above libraries require downloading the user's entire recipe collection to the device. I don't see a way around this, since any search library will need access to the full dataset rather than paginated API results.
  2. Compounding on point number 1, there are efforts to support searching across all of your friends' shared recipe collections (see here: https://github.com/julianpoy/RecipeSage/pull/982).

yeah, i made the comment cause the issue description specifically said title search. if you need to search through a ton of complex/huge documents then certainly the calculus is different for what makes sense / is sane.

The search needs to support per-field weighting. I don't see a way to do this with uFuzzy, though some others may work.

you can kinda get this with uFuzzy by searching each field as a separate haystack then ordering the results by field priority. you'll probably need to handle the intersection of all matches as well.
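
A rough sketch of that idea, for illustration only: each field is searched as its own haystack, results are concatenated in field-priority order, and a recipe that matches in several fields is kept once at its highest-priority position. The `Recipe` shape, the field list, and the function names are hypothetical. The matcher is a plain case-insensitive substring check standing in for a fuzzy library, so the example stays self-contained; a matcher backed by uFuzzy could be wrapped to fit the same `Matcher` type.

```typescript
// A matcher returns the indexes of the haystack entries that match the needle.
type Matcher = (haystack: string[], needle: string) => number[];

// Stand-in matcher (case-insensitive substring), NOT a fuzzy search.
const substringMatcher: Matcher = (haystack, needle) => {
  const n = needle.toLowerCase();
  const hits: number[] = [];
  haystack.forEach((text, i) => {
    if (text.toLowerCase().includes(n)) hits.push(i);
  });
  return hits;
};

// Hypothetical recipe shape, reduced to two searchable fields.
interface Recipe {
  title: string;
  ingredients: string;
}

// Fields listed from highest to lowest priority.
const FIELD_PRIORITY: (keyof Recipe)[] = ["title", "ingredients"];

// Search each field separately, then merge by field priority.
// Returns recipe indexes, best field first, with no duplicates.
function searchByFieldPriority(
  recipes: Recipe[],
  needle: string,
  match: Matcher = substringMatcher,
): number[] {
  const seen = new Set<number>();
  const ordered: number[] = [];
  for (const field of FIELD_PRIORITY) {
    const haystack = recipes.map((r) => r[field]);
    for (const idx of match(haystack, needle)) {
      // A recipe already found via a higher-priority field keeps that rank.
      if (!seen.has(idx)) {
        seen.add(idx);
        ordered.push(idx);
      }
    }
  }
  return ordered;
}

const sampleRecipes: Recipe[] = [
  { title: "Garden salad", ingredients: "lettuce, tomato, cucumber" },
  { title: "Tomato soup", ingredients: "tomato, onion, stock" },
  { title: "Pancakes", ingredients: "flour, milk, egg" },
];
console.log(searchByFieldPriority(sampleRecipes, "tomato"));
```

With the sample data, the title hit ("Tomato soup") ranks ahead of the ingredient-only hit ("Garden salad"). This gives coarse field ordering only, not the per-field score weighting a Lucene-based engine provides.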

I will admit that I do believe ElasticSearch (or any Lucene based engine) will tend to come out ahead in terms of accuracy and featureset in the long run, though.

agreed. MiniSearch and FlexSearch are probably better options for frontend-only searches. they do tend to be init/mem intensive depending on the chosen tokenization strategy. but a real db fulltext search like sqlite's FTS5 would be the only solution that will scale well. https://typesense.org/ and https://www.meilisearch.com/ are also good.

nevertheless, i'm always curious how uFuzzy would perform on such datasets. do you have a 20k+ recipe dump that you can share publicly?

escjf commented 1 year ago

Thanks for the active discussion. I understand now the limitations and agree with the issue title change. Having this in mind, RecipeSage is really great 💪🏻

julianpoy commented 1 year ago

yeah, i made the comment cause the issue description specifically said title search. if you need to search through a ton of complex/huge documents then certainly the calculus is different for what makes sense / is sane.

No worries - I super appreciate the links and discussion.

agreed. MiniSearch and FlexSearch are probably better options for frontend-only searches. they do tend to be init/mem intensive depending on the chosen tokenization strategy. but a real db fulltext search like sqlite's FTS5 would be the only solution that will scale well. https://typesense.org/ and https://www.meilisearch.com/ are also good.

Looking into TypeSense's comparison, it honestly sounds fantastic as an alternative up until the RAM requirements. Just due to cost, I can't provide enough RAM for the entire dataset so disk+RAM is a requirement.

The other thing I have been playing around with is PGSync which makes it much easier to keep ElasticSearch up-to-date with RecipeSage's database, rather than having the application dual write. Looking through, I don't see an equivalent for TypeSense or Meilisearch, but perhaps you have thoughts there. I put up a PR for an initial implementation of PGSync here: #990.

nevertheless, i'm always curious how uFuzzy would perform on such datasets. do you have a 20k+ recipe dump that you can share publicly?

I don't, but one could build one by scraping with https://github.com/julianpoy/recipeclipper.

Thanks for the active discussion. I understand now the limitations and agree with the issue title change. Having this in mind, RecipeSage is really great 💪🏻

Thanks for opening the issue! Has got me thinking about how to improve the search overall.

julianpoy commented 1 year ago

@leeoniya Went ahead and tried out MeiliSearch here: https://github.com/julianpoy/RecipeSage/pull/995

After loading RecipeSage's dataset into MeiliSearch, the data.ms file that MeiliSearch uses to store its dataset is 75.1GB.

Given that ElasticSearch used nowhere near this amount (somewhere around 1/10th), I must say that MeiliSearch doesn't seem ready for prime-time. It's also consuming a very large amount of RAM. It is working, however.

I may hold off on merging MeiliSearch support at this time and refocus on improving the ElasticSearch implementation given these issues.