raycast / extensions

Everything you need to extend Raycast.
https://developers.raycast.com
MIT License

[Browser Bookmarks] Fuzzy search could be improved #10584

Open lovrobuday opened 5 months ago

lovrobuday commented 5 months ago

Extension

https://www.raycast.com/raycast/browser-bookmarks

Description

Love this thing, but the more bookmarks I add, the worse the search gets. I named one of my bookmarks "work live", but when I search for "work live", that bookmark is 5th in the list.

Who will benefit from this feature?

search based on bookmarks name rather than url, better search

Anything else?

No response

raycastbot commented 5 months ago

Thank you for opening this issue!

🔔 @sasivarnan @ahpatel @danulqua @chupi33 @tleo19 @jum8ys you might want to have a look.

💡 Author and Contributors commands

The author and contributors of `raycast/browser-bookmarks` can trigger bot actions by commenting:

- `@raycastbot close this issue` Closes the issue.
- `@raycastbot rename this issue to "Awesome new title"` Renames the issue.
- `@raycastbot reopen this issue` Reopens the issue.

thomaslombart commented 5 months ago

It's tricky to get a good fuzzy search experience in the browser's bookmarks. Which results do you get before the one you want with which query?

lovrobuday commented 5 months ago

So the query is: covea live

The results:

- jira - TimeSheet
- cover staging
- cover live
- ...

The first result does have a word that starts with "co" and a word that starts with "li" one after the other in the URL, which might be the issue. Also, "cover staging" has "cover" in its URL, whereas "cover live" doesn't. Still, it would be nice to have it fuzzy-find over the bookmark name rather than the URL.

thomaslombart commented 5 months ago

Do you open "Jira - TimeSheet" a lot? When searching, the extension will always show first the bookmarks you open more frequently than the others. Otherwise, we do have fuzzy search over the bookmark name with a greater weight than the URL (as you can see here). Improving the fuzzy search functionality is challenging as modifying certain settings may affect other users.

dariodjuric commented 5 months ago

> Do you open "Jira - TimeSheet" a lot? When searching, the extension will always show first the bookmarks you open more frequently than the others.

Yes, for me, it appears that this sometimes "pollutes" the result set. For example, I have the following issue:

[Screenshot: CleanShot 2024-02-24 at 16 25 20@2x]

When typing "analy" to search for Google Analytics, it always brings up Facebook first, even though this bookmark has no word "analy" anywhere. But it gets into the initial search result because its folder name is remotely similar to the search term, it seems. Facebook has a very high bookmarkFrecency, so it always appears at the top.
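
To make the effect concrete, here is a minimal sketch of what a frecency-only sort does to a set of fuzzy hits. The `ScoredBookmark` shape, the `sortByFrecencyOnly` helper, and all the numbers are made up for illustration; this is not the extension's actual code. It only assumes the Fuse.js convention that a lower score means a better match.

```typescript
// Hypothetical shape of a fuzzy-search hit (not the extension's real type).
interface ScoredBookmark {
  title: string;
  folder: string;
  score: number; // 0 = perfect match, 1 = no match (Fuse.js convention)
  frecency: number; // higher = opened more often and more recently
}

// Sort purely by frecency, ignoring how well each item matched the query.
function sortByFrecencyOnly(results: ScoredBookmark[]): ScoredBookmark[] {
  return results.slice().sort((a, b) => b.frecency - a.frecency);
}

// Made-up hits for the query "analy".
const hits: ScoredBookmark[] = [
  { title: "Google Analytics", folder: "Work", score: 0.05, frecency: 3 },
  { title: "Facebook", folder: "Daily", score: 0.38, frecency: 250 },
];

const ordered = sortByFrecencyOnly(hits);
console.log(ordered.map((b) => b.title)); // ["Facebook", "Google Analytics"]
```

The weak match wins only because it passed the fuzzy threshold at all; once it is in the result set, its score no longer matters.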

dariodjuric commented 5 months ago

Perhaps a better solution would be to not consider all results before sorting by frecency. In my case, I have a total of 71 results for that search term, so if one bookmark that is a really weak match but has a really high frecency gets into the results, it will appear at the top, which I don't expect.

Something like this, perhaps?

  const fuse = useMemo(() => {
    return new Fuse(folderBookmarks, {
      keys: [
        { name: "title", weight: 3 },
        { name: "domain", weight: 1 },
        { name: "folder", weight: 0.5 },
      ],
      threshold: 0.4,
      includeScore: true,
      ignoreLocation: true,
      shouldSort: true, // <-- sort the results
    });
  }, [folderBookmarks]);

  const filteredBookmarks = useMemo(() => {
    if (query === "") {
      return folderBookmarks;
    }

    const searchResults = fuse.search(query);

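    // Limit the results before doing any further sorting (e.g. by frecency).
    // Assumed completion: the original snippet is cut off right after slice().
    return searchResults.slice(0, 10).map((result) => result.item);
  }, [folderBookmarks, fuse, query]);
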
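
The same idea can be sketched without React or Fuse.js, as a plain function that keeps only the best matches and then orders those by frecency. The `ScoredBookmark` type, the `topMatchesByFrecency` name, and the sample data are assumptions for illustration only; the score follows the Fuse.js convention (lower is better).

```typescript
// Hypothetical result shape; lower score = better match.
interface ScoredBookmark {
  title: string;
  score: number;
  frecency: number;
}

// Keep only the `limit` best matches, then order that short list by frecency.
function topMatchesByFrecency(results: ScoredBookmark[], limit: number): ScoredBookmark[] {
  return results
    .slice()
    .sort((a, b) => a.score - b.score) // best match first
    .slice(0, limit) // weak matches are dropped here
    .sort((a, b) => b.frecency - a.frecency); // most-used first
}

// Made-up data for illustration.
const results: ScoredBookmark[] = [
  { title: "Google Analytics", score: 0.05, frecency: 3 },
  { title: "Analytics Docs", score: 0.1, frecency: 12 },
  { title: "Facebook", score: 0.38, frecency: 250 },
];

const topTwo = topMatchesByFrecency(results, 2);
console.log(topTwo.map((b) => b.title)); // ["Analytics Docs", "Google Analytics"]
```

With a cutoff of 2, the high-frecency but weak match never reaches the frecency sort. The right cutoff (10 in the snippet above) is a judgment call.
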
thomaslombart commented 5 months ago

I get where you're coming from but I got opposite feedback in the past about that. Some people expect results with a high frecency to be sorted before others, even if the first result is less relevant to the query than the rest. So making changes here will definitely impact others.

It seems to me that the issue here is that the query matches the folder name even though it shouldn't. Maybe we can match exact words for folders instead of a fuzzy search?
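
A minimal sketch of what exact-word matching for folder names could look like, as opposed to a fuzzy comparison. The `words` and `folderMatchesExactly` helpers are hypothetical names, and the ASCII-only word splitting is a simplifying assumption.

```typescript
// Split text into lowercase words on any run of non-alphanumeric characters.
// Assumption: ASCII-only splitting is good enough for this illustration.
function words(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// True only if every query word equals some whole word in the folder name.
function folderMatchesExactly(folder: string, query: string): boolean {
  const folderWords = words(folder);
  const queryWords = words(query);
  return queryWords.length > 0 && queryWords.every((w) => folderWords.indexOf(w) !== -1);
}

console.log(folderMatchesExactly("Daily", "analy")); // false
console.log(folderMatchesExactly("Work / Analytics", "analytics")); // true
console.log(folderMatchesExactly("Work / Analytics", "analy")); // false
```

The trade-off is that partial queries such as "analy" stop matching folders entirely, so folder hits would only appear once a full word has been typed.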

dariodjuric commented 5 months ago

> I get where you're coming from but I got opposite feedback in the past about that. Some people expect results with a high frecency to be sorted before others, even if the first result is less relevant to the query than the rest. So making changes here will definitely impact others.

I see. Do you think it makes sense to make this configurable in the extension's UI so that everyone can tweak as needed?

E.g. have the following option:

Or maybe even these for extra customization:

(I can make a PR if it helps.)

> It seems to me that the issue here is that the query matches the folder name even though it shouldn't. Maybe we can match exact words for folders instead of a fuzzy search?

Hm, I'm not sure if that will fix the OP's issue, but for me that would fix this particular case because fewer bookmarks would appear as relevant.

thomaslombart commented 4 months ago

The tricky bit is that it often depends on whatever you're searching for. Searching by domain may make sense for some searches but not for others. And configuring the settings takes more time than the search itself.

I'd love to find an algorithm that's good enough for all use cases because Fuse.js (what we use for fuzzy search) doesn't really work out here. I'll discuss that with the team to see how we can improve it.

dariodjuric commented 4 months ago

Sounds good! Just to clarify, I was referring to settings at the extension level (like where the "Show Bookmark Domain" setting is). As different people have different expectations of how the search should behave, they would be able to configure the extension based on those expectations.

Thanks for looking into this! Hoping we'll find a solution that fits everyone.
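
As a rough sketch of what such an extension-level setting might drive, the function below switches between the two orderings discussed in this thread. The `SearchOrder` values, the `orderResults` name, and the data are all hypothetical. In a Raycast extension the value would presumably come from the extension preferences (for example via `getPreferenceValues` from `@raycast/api`), but here it is passed in as a plain argument so the example stays self-contained.

```typescript
// Hypothetical preference values; not an existing setting of the extension.
type SearchOrder = "frecency" | "relevance";

// Hypothetical result shape; lower score = better match (Fuse.js convention).
interface ScoredBookmark {
  title: string;
  score: number;
  frecency: number;
}

function orderResults(results: ScoredBookmark[], order: SearchOrder): ScoredBookmark[] {
  const copy = results.slice();
  if (order === "relevance") {
    // Best match first; frecency only breaks ties.
    return copy.sort((a, b) => a.score - b.score || b.frecency - a.frecency);
  }
  // Most-used first; match quality only breaks ties.
  return copy.sort((a, b) => b.frecency - a.frecency || a.score - b.score);
}

// Made-up data for illustration.
const sample: ScoredBookmark[] = [
  { title: "Google Analytics", score: 0.05, frecency: 3 },
  { title: "Facebook", score: 0.38, frecency: 250 },
];

console.log(orderResults(sample, "frecency").map((b) => b.title)); // ["Facebook", "Google Analytics"]
console.log(orderResults(sample, "relevance").map((b) => b.title)); // ["Google Analytics", "Facebook"]
```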

dariodjuric commented 4 months ago

Here's another example from today where a high-frecency result appears before the result I was searching for:

[Screenshot: CleanShot 2024-03-11 at 08 32 18@2x]

raycastbot commented 2 months ago

This issue has been automatically marked as stale because it did not have any recent activity.

It will be closed if no further activity occurs in the next 10 days to keep our backlog clean 😊