microsoft / vsmarketplace

Customer feedback and issue tracker repository for Visual Studio Marketplace
MIT License

Marketplace search improvements #154

Closed isidorn closed 1 year ago

isidorn commented 3 years ago

Hi, VS Code PM here 👋

I understand that there are already search issues opened in this repository; however, I wanted to have one super issue that links all of the items. Improving search is the number one thing we would like the Marketplace team to work on 💯

If needed, we can provide many more examples where search can be improved, but here's a list of issues to start. These examples are mostly from the Extensions view in VS Code, which uses the Marketplace search API.

jeff-hykin commented 3 years ago

As one of the last people who commented on one of the main downstream issues 2 years ago: I'm really glad to see this being opened! 👍

christianshay commented 2 years ago

There are special characters (for example "/") that are not needed and not documented that are messing up the tags.

This tag link is bringing up any extension that includes "PL" or "SQL" instead of those tagged "PL/SQL": https://marketplace.visualstudio.com/search?term=tag%3APL%2FSQL&target=VSCode&category=All%20categories&sortBy=Relevance
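A quick way to check what that link actually asks for is to decode its query string. The sketch below also shows how a tokenizer that splits on every non-alphanumeric character would turn the tag into two separate words. `naive_tokens` is a hypothetical helper written only to illustrate the suspected behaviour; it is not the Marketplace's actual tokenizer.

```python
import re
from urllib.parse import urlparse, parse_qs

url = ("https://marketplace.visualstudio.com/search?term=tag%3APL%2FSQL"
       "&target=VSCode&category=All%20categories&sortBy=Relevance")

# parse_qs percent-decodes the values of the query string.
term = parse_qs(urlparse(url).query)["term"][0]

def naive_tokens(value: str) -> list[str]:
    """Hypothetical tokenizer: split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", value) if t]

tag_value = term.split(":", 1)[1]   # the part after "tag:"
tokens = naive_tokens(tag_value)    # the slash is lost, leaving two words

print(term)
print(tokens)
```

If the tag value were matched as a single literal string instead, only extensions tagged "PL/SQL" would qualify.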

isidorn commented 2 years ago

One more example:

image

bpasero commented 2 years ago

Just to clarify on the above example: even if I search for the exact name of the extension, the extension I am looking for is not at the top:

image

prashantvc commented 2 years ago

We at VS Marketplace have started work to improve search relevancy. As part of this effort, we need your help to curate the search results so that we can define a baseline KPI.

How can you help?

  1. Visit https://ms-extensions.azurewebsites.net/
  2. Search for an extension (any of your favourite extensions)
  3. Observe the search results, do they make sense?
  4. Rearrange the results using drag and drop: does any item need to be at the top, or ranked lower?
  5. Fill in the feedback with the rationale behind rearranging the results
  6. Hit submit and repeat for different search terms

jeff-hykin commented 2 years ago

There's lots of examples from previous posts that are still unfixed but sure I'll oblige and redo the search/screenshots

Example 1

Prolog Syntax

Screen Shot 2022-05-16 at 3 29 28 PM

Want to see & Justification

(The "Prolog" extension being at the top is fine)

  1. My C++ syntax extension shouldn't be the 2nd option
  2. A theme called "syntax" shouldn't be the 3rd option
  3. My "Better Prolog Syntax" extension should be above them considering it has:
    • a title with 100% of the search query
    • keywords that match 100% of the search query
    • been updated more recently
    • and isn't being dragged down by missing or low ratings (5/5 stars)

Me wanting the "Prolog Language" extension higher needs more justification though:

Justification/Expected Feature: Uncommon Word Importance

If someone searches for "Code Entschuldigung", showing the most-popular extension with "Code" in the name is like showing the most-popular extension with the word "The" in the name, because "Code" is absurdly common. In contrast, the other term, "Entschuldigung", is extremely uncommon. If there is an extension with an extremely uncommon word that matches, then it's almost certain that's what the user is looking for. This applies to the "Prolog Syntax" search: "Prolog" is uncommon, "Syntax" is very common.

The equation for this is simple; it's just Bayes' rule.

I mean come on this is search-101 methodology
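A minimal sketch of the "uncommon word importance" idea, using smoothed inverse document frequency (IDF) as a stand-in for the Bayesian weighting described above. The extension names are made up for illustration, and the scoring function is an assumption, not the Marketplace's ranking.

```python
import math

# Hypothetical extension names, invented for this example.
names = [
    "Prolog",
    "Better Prolog Syntax",
    "Better C++ Syntax",
    "Syntax Theme",
    "Syntax Highlighter",
]

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def idf(term: str, docs: list[list[str]]) -> float:
    """Smoothed inverse document frequency: rarer terms get a larger weight."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + containing)) + 1.0

def score(query: str, name: str, docs: list[list[str]]) -> float:
    """Sum the IDF weight of every query term that appears in the name."""
    name_tokens = set(tokenize(name))
    return sum(idf(t, docs) for t in tokenize(query) if t in name_tokens)

docs = [tokenize(n) for n in names]
ranked = sorted(names, key=lambda n: score("Prolog Syntax", n, docs), reverse=True)
print(ranked)
```

With these sample names, "prolog" appears in two names and "syntax" in four, so a match on "prolog" counts for more than a match on "syntax", and names that only share the common word fall to the bottom.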

Example 2

Prolog Language

Screen Shot 2022-05-16 at 3 45 22 PM

Want to see

The one with arrow above Angular

Justification/Expected Feature: Negative Relevance

Example 3

code eol

Screen Shot 2022-05-16 at 3 33 41 PM

(Keywords of the "Code-eol (Line Endings)" extension)

Screen Shot 2022-05-16 at 3 34 50 PM

Want to see

I'm going to assume the reader is intelligent and gets the idea.

As a side note I would expect word2vec similarity metrics, but I think VS Code Marketplace needs to start with the basics before I can request that.

prashantvc commented 2 years ago

@jeff-hykin Thanks a lot for taking the time to write up a detailed answer. We are following a data-driven, iterative process to improve search relevancy. Yes, we are starting with the basics; you can expect better tokenization, and word2vec will eventually make it into the Marketplace.

lramos15 commented 2 years ago

Here's a gif of searching for Azure Repos. I need to type the full name before it is even in view.

Recording 2022-06-03 at 12 09 58

isidorn commented 2 years ago

@lramos15 thanks for sharing this. @SaiKanth007 another great example of how scoring for multiple words is not good now.

anangaur commented 2 years ago

IMO, this could be an issue of treating multiple forms of a word as different keywords. Maybe using a stemmer would help here? For example, send, sending, and sent should all be stemmed to the common keyword send. If we apply this during tokenization of both extension keywords and search query keywords, the above problem would be solved.
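A toy sketch of that idea, assuming the same stemming is applied on both sides. The suffix list and the irregular-form table are made up for illustration; a real system would more likely use an established algorithm such as Porter stemming or a lemmatizer.

```python
# Toy stemmer for illustration only.
IRREGULAR = {"sent": "send"}      # irregular forms need a lookup table
SUFFIXES = ("ing", "ed", "s")     # checked in this order

def stem(word: str) -> str:
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text: str) -> set[str]:
    """Apply the same stemming to extension keywords and to the query."""
    return {stem(w) for w in text.split()}

keywords = normalize("sending mail")   # as indexed from an extension
query = normalize("send mails")        # as typed by the user
print(keywords == query)
```

Because both sides pass through `normalize`, different surface forms of the same word end up as the same keyword before matching.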

anangaur commented 2 years ago

@isidorn , this is a great list. @prashantvc , can we categorize the list into categories?

| Category | Issue | Comments |
|---|---|---|
| Bug | Extensions filter for publisher should be exact vscode#151192 | |
| Bug | Default to AND rather than OR in marketplace search vscode#51207 | |
| Feature | Advanced extension search (for example excluding certain words) #20 | |
| Feature | Advanced extension search (for example excluding certain category) #20 | |
| Relevance | | |
| Tokenization | | |

. . .

isidorn commented 2 years ago

Another example: search for Latex. The 2nd result should be first (much better quality, rating, install count, freshness) Screenshot 2022-06-12 at 13 58 16

prashantvc commented 2 years ago

> Another example: search for Latex. The 2nd result should be first (much better quality, rating, install count, freshness)

Shouldn't the exact match be on the top? 2nd result has better metrics, but it is not too far away.

isidorn commented 2 years ago

@prashantvc a user typing "latex" wants to work with LaTeX, so we should offer the best extension for this. For the same reason, when you type "vscode extension latex" into Google you get LaTeX Workshop, and not the other one.

Same example for "rust"

jeff-hykin commented 2 years ago

> The 2nd result should be first (much better quality, rating)

I would strongly disagree. I don't think anybody is having a hard time finding popular extensions like LaTeX Workshop; rather the opposite: it's very hard to find excellent-but-unpopular extensions. If I wanted to know the popular Latex tools, I'd go look for a Medium article called "top ten Latex tools for VS Code"; if I wanted to find an extension with exactly the name "Latex", I'd search for Latex in the marketplace.

Both of the top two have 5 stars and decent quality, but the first actually matches what the user searched for. I'd rather it return what I searched instead of what's popular and kinda related to the search.

isidorn commented 2 years ago

Here’s an example that a vscode team member ran into. Search for "vscode selfhost test". The extension “VS Code Selfhost Test Provider” appears around position 50 I would guess (need to scroll down a bit), and before it there are 10 or more “test” extensions that I guess people used to test that they can publish :shrug:. Tweaking the query to "vscode selfhost test provider" still doesn’t show it in a top spot

image

isidorn commented 2 years ago

@jeff-hykin I think you are right in general, but not in this case. Just because some early extension was lucky to get the name that fits 100% like "latex" that does not mean that extension is a better result. When users search for "latex" they want to work on latex and they want to know what is the extension for this task. Same example for "rust". Just my 2 cents. It is not like search ranking has to behave this way.

So my point is: an exact match should be rewarded more for longer queries, so its weight should depend on the search query length. For short queries of 5 or 6 characters, an exact match might not matter so much. Ideally, the weight given to the "exactness" of the match would increase linearly with query length.
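A minimal sketch of that proposal, assuming a popularity score in the range 0 to 1 and an exact-match bonus that grows linearly with query length. The `per_char` and `cap` values are invented tuning numbers, not Marketplace settings.

```python
def exact_match_weight(query: str, per_char: float = 0.1, cap: float = 2.0) -> float:
    """Weight of an exact-name match, growing linearly with query length."""
    return min(cap, per_char * len(query.strip()))

def score(query: str, name: str, popularity: float) -> float:
    """Combine a popularity score in [0, 1] with the exact-match bonus."""
    exact = query.strip().lower() == name.strip().lower()
    return popularity + (exact_match_weight(query) if exact else 0.0)

# Short query: the popular extension can still outrank the exact match.
short_exact = score("latex", "LaTeX", popularity=0.3)
short_popular = score("latex", "LaTeX Workshop", popularity=0.9)

# Long query: the exact match dominates even with low popularity.
long_exact = score("python c++ debugger", "Python C++ Debugger", popularity=0.1)
long_popular = score("python c++ debugger", "Python", popularity=1.0)

print(short_exact < short_popular, long_exact > long_popular)
```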

Tyriar commented 2 years ago

I am adding "Image Editor" despite really not wanting to, because search wasn't smart enough to figure out that Luna Paint is the only full-blown image editor (see https://github.com/microsoft/vsmarketplace/issues/71); it contains "edit images" in the package.json description and more keywords in the readme content.

image

Similar bad examples: editor config and edit csv are high on both, even though they don't seem to contain the word image?

image image

jeff-hykin commented 2 years ago

> For short queries like 5, 6 characters exact match might not matter so much. So it should ideally be a linear increase per query length of how the "exactness" of the match is weighted.

I think changing the weight makes sense, but I think using the number of characters would be problematic, though easy to fix. Going back to Bayes, the word "Latex" would be super common across extensions, and I would agree with you that the exact match should be deprioritized with that in mind.

But if I search for "Mario" (very rare but short) and "visual studio code" (very common but long) I would expect Mario to prioritize an exact match while "visual studio code" could prioritize popularity. ("Image editor" is another good example of long but generic/common)
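One way to express this is to derive the exact-match weight from how rare the query terms are rather than from the number of characters. The sketch below averages a smoothed IDF over the query terms; the document frequencies and the catalogue size are invented numbers used only to show the shape of the idea.

```python
import math

def exactness_weight(query: str, doc_freq: dict[str, int], total: int) -> float:
    """Average smoothed IDF of the query terms: rare queries favour exact matches."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    idfs = [math.log((1 + total) / (1 + doc_freq.get(t, 0))) for t in terms]
    return sum(idfs) / len(terms)

# Invented document frequencies out of 50,000 hypothetical extensions.
TOTAL = 50_000
doc_freq = {"visual": 9_000, "studio": 8_000, "code": 20_000, "mario": 3}

rare_short = exactness_weight("Mario", doc_freq, TOTAL)
common_long = exactness_weight("visual studio code", doc_freq, TOTAL)
print(rare_short > common_long)
```

Under these assumed frequencies, the short but rare query gets a much larger exact-match weight than the long but generic one.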

vadimcn commented 2 years ago

This whole thing looks like a classical document retrieval problem, for which TF-IDF (basically what @jeff-hykin had described above) is the go-to approach. This should cover all basics, and you can add rank boosting based on popularity on top of that.
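As a rough sketch of "popularity boosting on top", a text-relevance score (from TF-IDF or any other scorer) can be multiplied by a gentle logarithmic function of install count. The `alpha` value and the example numbers are assumptions chosen for illustration.

```python
import math

def boosted(relevance: float, installs: int, alpha: float = 0.1) -> float:
    """Multiply text relevance by a gentle, logarithmic popularity boost."""
    return relevance * (1.0 + alpha * math.log10(1 + installs))

# A highly relevant but small extension versus a weakly relevant, huge one.
niche = boosted(relevance=2.9, installs=500)
giant = boosted(relevance=1.2, installs=5_000_000)
print(niche > giant)
```

Because the boost is multiplicative, an extension with zero text relevance stays at zero no matter how popular it is.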

I would also recommend talking to someone in Bing, they've been solving similar problems for a couple of decades by now.

isidorn commented 2 years ago

Good points. Thanks to both!

RandomFractals commented 2 years ago

> We at VS Marketplace started the work to improve the search relevancy. As part of this effort, we need your help to curate the search results to define baseline KPI
>
> How can you help?
>
>   1. Visit https://extensions.ms/
>   2. Search for an extension (any of your favourite extensions)
>   3. Observe the search results, do they make sense?
>   4. Rearrange results, you can drag and drop, any item need to be on the top? Ranked lower
>   5. Fill in the feedback, rational behind the rearranging the results
>   6. Hit submit and repeat for different search terms

@prashantvc and @isidorn: how is this new extensions reranking page being used in improving extension search results by the marketplace team?

I tried it and moved the Pivotal Concourse CI Pipeline Editor down the recommended extensions list when searching for image editor, because it has nothing to do with image editing and provides no such capabilities.

However, as an extension developer, I have some questions and concerns about this under the radar tool:

  1. Do you use it as part of some ML pipeline to improve search results in marketplace?
  2. Shouldn't you have some user authentication and validation when users of that page rearrange extensions to improve your search results?
  3. Shouldn't you add keywords, categories, and a link to the extension hover card to display the full extension info summary and a link to the extension in the marketplace?
  4. What guards do you have in place for that page to prevent someone from scripting and upvoting their own extensions? I've seen that happen with Svg Preview before, as an example.

codedChaos commented 2 years ago

Just commenting to remind myself to research and elaborate later but has anyone mentioned, or created an issue around the inefficacy of using multiple sort filters in a search?

In my years of VS Code extension searches, I typically have tried to improve results by using two sorting metrics. My biggest example is usually something like this:

thing-i-want @sort:rating @sort:ratingscount 

Note: I found the ratingscount attribute in the source code. It isn't listed in the GUI search options, but I use it occasionally; I'm not sure that it actually works, so YMMV.

or

thing-i-want @sort:rating @sort:downloads

I do this because simply sorting by rating returns plenty of 5-star extensions with only one to five actual reviews, which means diddly-squat to me. I want ratings over a 4 with at least a couple dozen reviewers/raters.

I suspect the sort function is not stable when using multiple sort filters, but I'll actually need to do application testing to prove this.
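For reference, this is roughly the behaviour I would expect from combining the two criteria: drop anything without enough ratings, then sort by rating with the rating count as a tie-breaker. The field names and thresholds below are hypothetical and do not reflect how the Marketplace handles `@sort`.

```python
from dataclasses import dataclass

@dataclass
class Extension:
    name: str
    rating: float       # average stars, 0-5
    rating_count: int   # number of ratings

def rank(extensions, min_rating=4.0, min_count=24):
    """Keep well-reviewed extensions, then sort by rating and rating count."""
    kept = [e for e in extensions
            if e.rating > min_rating and e.rating_count >= min_count]
    # A tuple key sorts by rating first and breaks ties with rating_count.
    return sorted(kept, key=lambda e: (e.rating, e.rating_count), reverse=True)

sample = [
    Extension("one-review-wonder", 5.0, 1),
    Extension("solid-b", 4.6, 45),
    Extension("solid-a", 4.6, 300),
    Extension("popular-meh", 3.2, 900),
]
result = [e.name for e in rank(sample)]
print(result)
```

Using a single composite key avoids relying on the order in which two separate sorts are applied.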

Has anyone else noticed this or is it just me?

isidorn commented 2 years ago

@codedChaos this is great feedback. fyi @SaiKanth007 @kj0171

tamuratak commented 2 years ago

Hi, I am one of LaTeX Workshop's maintainers.

When searching for the word tex, the extension's rank is ridiculously low, even though tex is set as a keyword in package.json. Users tend to use tex interchangeably with latex, although they are different.

The same can be said of ms-python.python. When searching for django, the extension's rank is low, although django is set as a keyword in package.json.

Screen Shot 2022-08-17 at 10 43 21-fullpage

isidorn commented 2 years ago

@tamuratak great feedback! fyi @SaiKanth007 @kj0171

isidorn commented 2 years ago

Another bad search and result

The first result should be the "Live Share" extension which is not even a result here.

Screenshot 2022-09-01 at 21 42 01

benibenj commented 2 years ago

Here is a comparison between the old and new search results. I entered the exact name of an extension. In the old version it shows up in second place, which is fine (though I think it should be first on an exact match). In the new version it's placed a lot further down!

Old results:

New results:

prashantvc commented 2 years ago

@benibenj thanks a lot for the report! We will investigate it. But generally, how do you like the new search service?

benibenj commented 2 years ago

It seems that smaller extensions (fewer downloads) are harder to find if they have multiple words in their name (Python C++ Debugger, for example).

hediet commented 2 years ago

GitLens should be first when searching for Git Lens:

image

Version Lens should be below, as it does not even mention git.

kj0171 commented 2 years ago

We have made some bug fixes and improvements to the Marketplace search service. The changes are live in VS Code Insiders. Do try them out and share feedback :)

https://user-images.githubusercontent.com/25655940/196911597-ec51bfce-00a0-4771-b8f2-d2fdd040300c.mp4

hediet commented 2 years ago

Nice work on the update!

However, I find this problematic:

image

The GitLens extension does not show up here.

isidorn commented 1 year ago

@hediet thanks for the feedback, we are looking into this exact case. The issue is that we are doing prefix matching instead of fuzzy matching.
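To illustrate the difference, here is a small sketch that contrasts a plain prefix check with a fuzzy similarity score from Python's standard `difflib`. The query "Git Lens" is taken from the earlier example in this thread and is only assumed to be representative; the normalization step is likewise an assumption, not a description of the search service.

```python
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and drop everything that is not a letter or digit."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def prefix_match(query: str, name: str) -> bool:
    return name.lower().startswith(query.lower())

def fuzzy_score(query: str, name: str) -> float:
    """Similarity in [0, 1] between the normalized query and name."""
    return SequenceMatcher(None, normalize(query), normalize(name)).ratio()

query = "Git Lens"
print(prefix_match(query, "GitLens"))       # the space breaks the prefix check
print(fuzzy_score(query, "GitLens"))
print(fuzzy_score(query, "Version Lens"))
```

With fuzzy scoring, "GitLens" is a perfect match for "Git Lens" after normalization, while "Version Lens" scores lower.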

lramos15 commented 1 year ago

Maybe an edge case, but when searching by publisher, e.g. Matt Bierner, the top result doesn't make sense, and the third result is a bit of a weaker match, I'd say.

image

isidorn commented 1 year ago

@lramos15 good catch, thanks for reporting this. @SaiKanth007 @kj0171 let's double check this one once we do the improvements we agreed on.

pcjmfranken commented 1 year ago

Simple filtering options for statistical properties such as last updated date, download count, verification status, etc. Preferably multiple of such filters could be applied simultaneously.

This would, for example, allow me to display only extensions updated within the last 30 days, with a download count of at least 2500, and only by verified publishers.

This data is already available to the marketplace search results page (took a quick peek at the devtools network panel), so why not use it?
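A small sketch of what I mean by applying several such filters at once. The field names are hypothetical stand-ins for whatever the search response actually contains; the thresholds match the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Extension:
    name: str
    last_updated: datetime
    downloads: int
    verified_publisher: bool

def apply_filters(extensions, now, max_age_days=30, min_downloads=2500,
                  verified_only=True):
    """Keep extensions that pass every enabled filter."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in extensions
            if e.last_updated >= cutoff
            and e.downloads >= min_downloads
            and (e.verified_publisher or not verified_only)]

now = datetime(2023, 1, 15, tzinfo=timezone.utc)
sample = [
    Extension("fresh-popular-verified", datetime(2023, 1, 10, tzinfo=timezone.utc), 5000, True),
    Extension("stale", datetime(2022, 6, 1, tzinfo=timezone.utc), 90000, True),
    Extension("tiny", datetime(2023, 1, 12, tzinfo=timezone.utc), 40, True),
    Extension("unverified", datetime(2023, 1, 14, tzinfo=timezone.utc), 8000, False),
]
names = [e.name for e in apply_filters(sample, now)]
print(names)
```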

kj0171 commented 1 year ago

Thanks for the feedback. We are aware of this. Will try to address this in near future.

kj0171 commented 1 year ago

We have rolled out search enhancements (Details here: https://github.com/microsoft/vsmarketplace/issues/154#issuecomment-1285224069).

Please share feedback and reopen if necessary.

isidorn commented 1 year ago

Thanks @kj0171

To clarify, most of the improvements can be seen in VS Code Insiders, and we plan to bring them to VS Code Stable soon. Try them out in VS Code Insiders and let us know what you think.

alefragnani commented 1 year ago

Just noticed something on the Insiders release that could lead to an impersonation issue discussed here https://vscode-dev-community.slack.com/archives/C74CB59NE/p1673358662096609, based on a post at https://blog.aquasec.com/can-you-trust-your-vscode-extensions

The marketplace search does not respect exact match if you use the publisher field.

I searched for an extension and clicked on the publisher name to see other extensions from that publisher (myself), but the new search also returns extensions from authors with similar names.

image

Based on this comment it seems only well-known publishers are handled, but I would argue that this change should be revisited.

Also, if you mistype the search, using publisher:"microsoft instead (yes, just missing the final double quote), the result is not good at all. This error, on the other hand, happens on both the Stable and Insiders releases.

image

Thank you
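For illustration, here is one way a query parser could tolerate the missing closing quote and then compare the publisher exactly rather than by similarity. The regular expression and both helper functions are hypothetical; they do not describe how the Marketplace parses queries.

```python
import re

# Matches publisher:"quoted value", publisher:"unterminated value, or publisher:bare
PUBLISHER = re.compile(r'publisher:\s*(?:"([^"]*)"?|(\S+))', re.IGNORECASE)

def parse_publisher(query: str):
    """Return (publisher, remaining_text); tolerate a missing closing quote."""
    m = PUBLISHER.search(query)
    if not m:
        return None, query.strip()
    value = m.group(1) if m.group(1) is not None else m.group(2)
    rest = (query[:m.start()] + query[m.end():]).strip()
    return value.strip(), rest

def same_publisher(wanted: str, actual: str) -> bool:
    """Exact, case-insensitive comparison; similar names do not match."""
    return wanted.casefold() == actual.casefold()

print(parse_publisher('publisher:"microsoft'))
print(parse_publisher('publisher:"Matt Bierner" markdown'))
print(same_publisher("microsoft", "Microsoft"), same_publisher("microsoft", "microsofft"))
```

Note that with an unterminated quote this sketch treats everything up to the end of the query as the publisher name, which is one possible reading of the user's intent.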

kj0171 commented 1 year ago

Thank you for pointing out the issue. For a few reasons, we aren't able to take this fix forward. We are aware of this issue and are working towards this. It will soon be fixed. We will keep you updated.

prashantvc commented 1 year ago

We deployed the search improvements. Note: This is an ongoing effort, and more improvements will follow. We would appreciate any feedback you have for the team.

You will start seeing more relevant results, and support for the following features:

Light Theme - Previous Search
Dark Theme - New and Improved Search

Inclusion and Exclusion (+, -) [#20]

image

Multi-word search

image

AND/OR: You can now use AND/OR operations with multi-word searches

Screenshot 2023-01-11 at 15 39 04

NOT: Exclude unwanted search terms

Screenshot 2023-01-11 at 15 39 39
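As a rough illustration of what inclusion and exclusion mean for a result list, the sketch below filters on `+word` and `-word` tokens. The semantics shown (a `+` term must appear, a `-` term must not, plain terms do not filter) are an assumption made for this example and may differ from what the search service actually does.

```python
def matches(query: str, text: str) -> bool:
    """Assumed semantics: +word must appear, -word must not, other words do not filter."""
    words = set(text.lower().split())
    for token in query.lower().split():
        if token.startswith("+") and token[1:] not in words:
            return False
        if token.startswith("-") and token[1:] in words:
            return False
    return True

# Hypothetical extension names, invented for this example.
names = ["Python Debugger", "Python Dark Theme", "Rust Analyzer"]
kept = [n for n in names if matches("+python -theme", n)]
print(kept)
```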

prashantvc commented 1 year ago

> The marketplace search does not respect exact match if you use the publisher field.

@alefragnani This is quite a unique problem; thanks a lot for reporting it. There are two opinions within the team and among VS Code users. We are still figuring out the intent behind the query: did the user mean to apply a filter, or were they trying to search for extensions by publisher name? It may take a while to settle this.

The missing double quote can be handled better, it's on our list to fix it.

prashantvc commented 1 year ago

Hey All,

We have deployed (7th Feb) a number of changes to VS Code Insiders that improve relevancy, especially for multi-word searches, and the overall search experience. The improvements will make their way into VS Code Stable in a couple of weeks.

Please give it a try and let us know what you think. We will continue to work on improving search in VS Code as well as in the Marketplace; your continued support and feedback will help us make it better for the community.

Thank you all for participating in the discussion. Please feel free to leave comments or contact me directly; we can chat about ideas and possible improvements (Booking Link)

I am closing this issue; let's continue the discussion in the open issues listed in the description.

isidorn commented 1 year ago

I created this follow-up issue to make sure @alefragnani's publisher bug is still captured: https://github.com/microsoft/vsmarketplace/issues/580

@SaiKanth007 mentioned to me that this should be fixed by the end of March.

gerroon commented 1 day ago

This is not working for me. Is this implemented?

@enabled NOT @category:"themes"