Closed: isidorn closed this issue 1 year ago.
As one of the last people who commented on one of the main downstream issues 2 years ago: I'm really glad to see this being opened! 👍
Some special characters (for example "/") are neither needed nor documented, and they are messing up the tags.
This tag link brings up any extension that includes "PL" or "SQL" instead of only those tagged "PL/SQL": https://marketplace.visualstudio.com/search?term=tag%3APL%2FSQL&target=VSCode&category=All%20categories&sortBy=Relevance
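A minimal Python sketch of how this can happen, assuming (the thread does not confirm it) that the search tokenizer splits a tag query on non-alphanumeric characters and then accepts any resulting token. The extension names and tags below are hypothetical:

```python
import re


def naive_tokens(text: str) -> list[str]:
    # Split on every non-alphanumeric character: "PL/SQL" -> ["pl", "sql"].
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def naive_tag_match(query_tag: str, tags: list[str]) -> bool:
    # OR semantics: any query token found in any tag's tokens is a hit.
    tag_tokens = {t for tag in tags for t in naive_tokens(tag)}
    return any(t in tag_tokens for t in naive_tokens(query_tag))


def exact_tag_match(query_tag: str, tags: list[str]) -> bool:
    # Treat the whole tag as one case-insensitive token.
    return query_tag.casefold() in {tag.casefold() for tag in tags}


extensions = {
    "plsql-ext": ["PL/SQL", "Oracle"],
    "postgres-ext": ["SQL", "PostgreSQL"],
    "perl-ext": ["Perl", "PL"],
}
print([n for n, t in extensions.items() if naive_tag_match("PL/SQL", t)])
print([n for n, t in extensions.items() if exact_tag_match("PL/SQL", t)])
```

Treating the whole tag as a single token, as `exact_tag_match` does, keeps "PL/SQL" from degrading into "PL" OR "SQL".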
One more example:
Just to clarify the example above: even if I search for the exact, 100% matching name of the extension, the extension I'm searching for is not at the top:
We at VS Marketplace have started work to improve search relevancy. As part of this effort, we need your help curating the search results to define a baseline KPI.
There are lots of examples from previous posts that are still unfixed, but sure, I'll oblige and redo the searches/screenshots.
Prolog Syntax
(The "Prolog" extension being at the top is fine)
Wanting the "Prolog Language" extension ranked higher needs more justification, though:
If someone searches for "Code Entschuldigung", showing the most popular extension with "Code" in the name is like showing the most popular extension with the word "The" in the name, because "Code" is absurdly common. In contrast, the other term, "Entschuldigung", is extremely uncommon. If there is an extension that matches an extremely uncommon word, then it's almost certainly what the user is looking for. The same applies to the "Prolog Syntax" search: "Prolog" is uncommon, while "Syntax" is very common.
The equation for this is simple; it's just Bayes' rule:
I mean, come on, this is search-101 methodology.
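The equation itself did not survive in this copy of the thread. One way to write the argument, assuming the extension the user wants always contains the term they typed (so P(term | target) ≈ 1), and estimating P(term) as df(term)/N, where N is the total number of extensions and df(term) is how many of them contain the term:

```latex
P(\text{target} \mid \text{term})
  = \frac{P(\text{term} \mid \text{target})\,P(\text{target})}{P(\text{term})}
  \approx \frac{1 \cdot P(\text{target})}{\mathrm{df}(\text{term}) / N}
  = \frac{N}{\mathrm{df}(\text{term})}\,P(\text{target})
```

A rare word like "Entschuldigung" has a small df, so a match multiplies the prior by a large factor; "Code" appears in nearly every extension, so N/df is close to 1 and the match says almost nothing. Taking the logarithm of N/df gives the inverse document frequency (IDF) used in TF-IDF.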
Prolog Language
The one with arrow above Angular
code eol
(Keywords of the "Code-eol (Line Endings)" extension)
I'm going to assume the reader is intelligent and gets the idea.
As a side note, I would expect word2vec similarity metrics, but I think the VS Code Marketplace needs to start with the basics before I can request that.
@jeff-hykin Thanks a lot for taking the time to write up a detailed answer. We are following a data-driven, iterative process to improve search relevancy. Yes, we are starting with the basics; you can expect better tokenisation, and word2vec will eventually make it into the Marketplace.
Here's a GIF of searching for Azure Repos. I need to type the full name before it even comes into view.
@lramos15 thanks for sharing this. @SaiKanth007 another great example of how scoring for multiple words is not good now.
IMO, this could be an issue of treating multiple forms of a word as different keywords. Maybe using a stemmer would help here?
For example, "send", "sending", and "sent" should all be stemmed to the common keyword "send". If we apply this when tokenizing both the extension keywords and the search query keywords, the above problem would be solved.
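A rough Python sketch of the idea. It is a deliberately crude suffix stripper, not the Porter stemmer; note that suffix-stripping stemmers generally leave irregular forms like "sent" untouched, so mapping "sent" to "send" really needs lemmatization or a lookup table (the tiny `IRREGULAR` table here is a hypothetical stand-in):

```python
# Hypothetical lookup table for irregular forms that suffix stripping can't fix.
IRREGULAR = {"sent": "send", "ran": "run", "wrote": "write"}
# Checked in order; deliberately crude.
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words like "is" survive.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalize(text: str) -> set[str]:
    # Apply the same stemming to extension keywords and to search queries.
    return {stem(token) for token in text.split()}


print(normalize("sending sent"))  # {'send'}
```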
@isidorn, this is a great list. @prashantvc, can we group the list into categories?
Category | Issue | Comments |
---|---|---|
Bug | Extensions filter for publisher should be exact vscode#151192 | |
Bug | Default to AND rather than OR in marketplace search vscode#51207 | |
Feature | Advanced extension search (for example excluding certain words) #20 | |
Feature | Advanced extension search (for example excluding certain categories) #20 | |
Relevance | | |
Tokenization | | |
. . .
Another example: search for Latex. The 2nd result should be first (much better quality, rating, install count, freshness)
Shouldn't the exact match be at the top? The 2nd result has better metrics, but not by a wide margin.
@prashantvc by typing "latex" the user wants to use LaTeX, so we should offer the best extension for it. It's the same reason that when you type "vscode extension latex" into Google you get LaTeX Workshop, and not the other one.
Same example for "rust"
The 2nd result should be first (much better quality, rating)
I would strongly disagree. I don't think anybody is having a hard time finding popular extensions like LaTeX Workshop; rather the opposite, it's very hard to find excellent-but-unpopular extensions. If I wanted to know the popular LaTeX tools, I'd go look for a Medium article called "top ten Latex tools for VS Code"; if I wanted to find an extension named exactly "Latex", I'd search for Latex in the marketplace.
Both of the top two have 5 stars and decent quality, but the first actually matches what the user searched for. I'd rather it return what I searched instead of what's popular and kinda related to the search.
Here's an example that a VS Code team member ran into. Search for "vscode selfhost test". The extension "VS Code Selfhost Test Provider" appears around position 50, I would guess (you need to scroll down a bit), and before it there are 10 or more "test" extensions that I guess people used to test that they can publish :shrug:. Tweaking the query to "vscode selfhost test provider" still doesn't put it in a top spot.
@jeff-hykin I think you are right in general, but not in this case. Just because an early extension was lucky enough to get a name that fits 100%, like "latex", does not mean that extension is the better result. When users search for "latex", they want to work on LaTeX and they want to know which extension to use for that task. Same example for "rust". Just my 2 cents; I'm not saying search ranking has to behave this way.
So my point is: exact matches should be rewarded more for longer queries, so the reward should depend on the query length. For short queries of 5 or 6 characters, an exact match might not matter so much. Ideally, the weight given to the "exactness" of the match would increase linearly with query length.
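A minimal sketch of this proposal, assuming popularity is already normalized to [0, 1] and picking a hypothetical cut-off of 20 characters at which exactness gets full weight. The blend of exact match and popularity is an illustration, not how the Marketplace scores results:

```python
def exactness_weight(query: str, full_weight_at: int = 20) -> float:
    # Linear ramp from 0 to 1 over the first `full_weight_at` characters
    # (a hypothetical cut-off), then flat at 1.0.
    return min(len(query) / full_weight_at, 1.0)


def score(query: str, name: str, popularity: float) -> float:
    # `popularity` is assumed to be pre-normalized to [0, 1].
    w = exactness_weight(query)
    exact = 1.0 if query.strip().lower() == name.strip().lower() else 0.0
    return w * exact + (1 - w) * popularity


print(score("latex", "LaTeX", 0.3))
print(score("latex", "LaTeX Workshop", 0.9))
```

With these toy numbers the 5-character query "latex" gives exactness only a quarter of the weight, so a much more popular non-exact match can still win, while a query of 20 or more characters is decided purely by exactness.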
I added "Image Editor" despite really not wanting to, because search wasn't smart enough to figure out that Luna Paint is the only full-blown image editor (see https://github.com/microsoft/vsmarketplace/issues/71), even though it contains "edit images" in the package.json description and more keywords in the README content.
Similarly bad examples: editor config and edit csv rank high on both, even though they don't seem to contain the word "image".
For short queries like 5, 6 characters exact match might not matter so much. So it should ideally be a linear increase per query length of how the "exactness" of the match is weighted.
I think changing the weight makes sense, but I think using the number of characters would be problematic, and it's easy to fix. Going back to Bayes, the word "Latex" would be super common across extensions, and with that in mind I would agree with you that the exact match should be deprioritized.
But if I search for "Mario" (very rare but short) and "visual studio code" (very common but long) I would expect Mario to prioritize an exact match while "visual studio code" could prioritize popularity. ("Image editor" is another good example of long but generic/common)
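One way to express this in code is to replace query length with how rare the query terms are. The sketch below uses smoothed inverse document frequency (a standard rarity measure, swapped in here for the Bayes argument) over a hypothetical toy corpus of tokenized extension names; a weight near 1 means "favor exact matches", near 0 means "fall back to popularity":

```python
import math


def smoothed_idf(term: str, corpus: list[set[str]]) -> float:
    # 0 when every document contains the term, log(1 + N) when none does.
    df = sum(1 for doc in corpus if term in doc)
    return math.log((1 + len(corpus)) / (1 + df))


def rarity_weight(query: str, corpus: list[set[str]]) -> float:
    """Average rarity of the query terms, scaled to [0, 1]."""
    terms = query.lower().split()
    if not terms or not corpus:
        return 0.0
    max_idf = math.log(1 + len(corpus))
    return sum(smoothed_idf(t, corpus) for t in terms) / (len(terms) * max_idf)


# Hypothetical corpus of tokenized extension names.
corpus = [
    {"visual", "studio", "code", "tools"},
    {"code", "runner"},
    {"visual", "studio", "code", "icons"},
    {"mario", "theme"},
    {"studio", "code", "snippets"},
]
print(round(rarity_weight("mario", corpus), 2))
print(round(rarity_weight("visual studio code", corpus), 2))
```

Short but rare ("mario") ends up with a high exact-match weight, while long but common ("visual studio code") ends up low, which is the behavior described above.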
This whole thing looks like a classical document retrieval problem, for which TF-IDF (basically what @jeff-hykin had described above) is the go-to approach. This should cover all basics, and you can add rank boosting based on popularity on top of that.
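A compact TF-IDF sketch with a multiplicative popularity boost, so popularity only reorders results that are already relevant and cannot rescue an irrelevant one. The extension names, descriptions, install counts and the `alpha` parameter are all hypothetical:

```python
import math
from collections import Counter


def tfidf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document against the query with plain TF-IDF."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                # Length-normalized term frequency times inverse document frequency.
                score += (tf[term] / len(tokens)) * math.log(n_docs / df[term])
        scores[name] = score
    return scores


def rank(query, docs, installs, alpha=0.1):
    """Multiply relevance by a mild log-popularity boost; zero scores are dropped."""
    scores = tfidf_scores(query, docs)
    boosted = {
        name: s * (1 + alpha * math.log10(1 + installs[name]))
        for name, s in scores.items()
        if s > 0
    }
    return sorted(boosted, key=boosted.get, reverse=True)


docs = {
    "prolog": "prolog language support",
    "prolog-syntax": "prolog syntax highlighting",
    "python": "python language support syntax highlighting",
    "cpp": "c++ syntax highlighting",
}
installs = {"prolog": 50_000, "prolog-syntax": 5_000, "python": 100_000_000, "cpp": 50_000_000}
print(rank("prolog syntax", docs, installs))
```

In this toy data the specific "prolog-syntax" extension outranks far more popular extensions that merely share the common word "syntax".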
I would also recommend talking to someone on the Bing team; they've been solving similar problems for a couple of decades now.
Good points. Thanks to both!
We at VS Marketplace have started work to improve search relevancy. As part of this effort, we need your help curating the search results to define a baseline KPI.
How can you help?
- Visit https://extensions.ms/
- Search for an extension (any of your favourite extensions)
- Observe the search results, do they make sense?
- Rearrange the results by dragging and dropping: should any item be at the top? Ranked lower?
- Fill in the feedback with the rationale behind rearranging the results
- Hit submit and repeat for different search terms
@prashantvc and @isidorn: how is the Marketplace team using this new extension reranking page to improve extension search results?
I tried it and moved the Pivotal Concourse CI Pipeline Editor down the recommended extensions list when searching for image editor, because it has nothing to do with image editing and provides no such capabilities.
However, as an extension developer, I have some questions and concerns about this under-the-radar tool:
Just commenting to remind myself to research and elaborate later, but has anyone mentioned, or created an issue about, the ineffectiveness of using multiple sort filters in a search?
In my years of VS Code extension searches, I have typically tried to improve results by using two sorting metrics. My most common example is something like this:
thing-i-want @sort:rating @sort:ratingscount
Note: I found the ratingscount attribute in the source code. It isn't listed in the GUI search options, but I use it occasionally; I'm not sure it actually works, but... YMMV.
or
thing-i-want @sort:rating @sort:downloads
I do this because sorting by rating alone returns plenty of 5-star extensions with only one to five actual reviews, which means diddly-squat to me. I want ratings above 4 with at least a couple dozen reviewers/raters.
I suspect the sort function is not stable when using multiple sort filters, but I'll actually need to do application testing to prove this.
Has anyone else noticed this or is it just me?
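For reference, here is what a well-defined two-key ranking looks like in Python, using the thresholds from this comment (rating above 4, at least a couple dozen raters). Python's `sorted` is guaranteed to be stable, so sorting by the secondary key and then by the primary key gives the same order as a single tuple key. How the Marketplace actually combines several `@sort` filters is not stated in this thread; the `Ext` type and data are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Ext:
    name: str
    rating: float
    ratings_count: int


def top_rated(exts, min_rating=4.0, min_count=24):
    """Drop thinly rated extensions, then sort by rating, breaking ties by count."""
    trusted = [e for e in exts if e.rating > min_rating and e.ratings_count >= min_count]
    return sorted(trusted, key=lambda e: (e.rating, e.ratings_count), reverse=True)


def top_rated_two_pass(exts, min_rating=4.0, min_count=24):
    """Same order via two stable sorts: secondary key first, primary key last."""
    trusted = [e for e in exts if e.rating > min_rating and e.ratings_count >= min_count]
    by_count = sorted(trusted, key=lambda e: e.ratings_count, reverse=True)
    return sorted(by_count, key=lambda e: e.rating, reverse=True)


exts = [
    Ext("a", 5.0, 2),    # 5 stars but only 2 raters: filtered out
    Ext("b", 4.8, 120),
    Ext("c", 5.0, 30),
    Ext("d", 4.2, 500),
    Ext("e", 3.9, 40),   # rating too low: filtered out
    Ext("f", 4.8, 60),
]
print([e.name for e in top_rated(exts)])
```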
@codedChaos this is great feedback. fyi @SaiKanth007 @kj0171
Hi, I am one of LaTeX Workshop's maintainers.
When searching for the word "tex", the extension's rank is ridiculously low, even though "tex" is set as a keyword in package.json. Users tend to use "tex" interchangeably with "latex", although they are different.
The same goes for ms-python.python: when searching for "django", the extension's rank is low, although "django" is set as a keyword in package.json.
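A sketch of field-weighted matching, assuming whole-token matches and hypothetical weights. `displayName`, `keywords` and `description` are real fields of an extension's package.json, but how the Marketplace weighs them is not described in this thread, and the manifests below are made up:

```python
import re

# Hypothetical weights: a hit in the display name counts most, then an exact
# keyword from package.json, then the description.
FIELD_WEIGHTS = {"displayName": 3.0, "keywords": 2.0, "description": 1.0}


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def field_score(query: str, manifest: dict) -> float:
    score = 0.0
    keywords = {k.lower() for k in manifest.get("keywords", [])}
    for term in query.lower().split():
        if term in words(manifest.get("displayName", "")):
            score += FIELD_WEIGHTS["displayName"]
        if term in keywords:
            score += FIELD_WEIGHTS["keywords"]
        if term in words(manifest.get("description", "")):
            score += FIELD_WEIGHTS["description"]
    return score


workshop = {"displayName": "LaTeX Workshop", "keywords": ["tex", "latex"], "description": "LaTeX support"}
other = {"displayName": "Text Tools", "keywords": ["text"], "description": "Tools for text files"}
print(field_score("tex", workshop), field_score("tex", other))
```

Matching whole tokens rather than prefixes also keeps "tex" from matching "text".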
@tamuratak great feedback! fyi @SaiKanth007 @kj0171
Another bad search result:
The first result should be the "Live Share" extension which is not even a result here.
Here is a comparison between the old and new search results. I entered the exact name of an extension. In the old version it shows up in second place, which is fine (although I think an exact match should be first). In the new version it's placed a lot further down!
Old results:
New results:
@benibenj thanks a lot for the report! We will investigate it. But generally, how do you like the new search service?
It seems that smaller extensions (less downloads) are harder to find if they have multiple words in their name (Python C++ Debugger for example).
GitLens should be first when searching for "Git Lens":
Version Lens should be below it, as it does not even mention Git.
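One cheap fix for this case, sketched under the assumption that names are compared after lowercasing and stripping spaces and punctuation; the scoring levels are hypothetical:

```python
import re


def compact(text: str) -> str:
    # Lowercase and drop everything that isn't an ASCII letter or digit.
    return re.sub(r"[^0-9a-z]+", "", text.lower())


def name_match_score(query: str, display_name: str) -> int:
    """2 = compacted names are equal, 1 = query is contained in the name, 0 = neither."""
    q, n = compact(query), compact(display_name)
    if q and q == n:
        return 2
    if q and q in n:
        return 1
    return 0


print(name_match_score("Git Lens", "GitLens"))
print(name_match_score("Git Lens", "Version Lens"))
```

A token-level match on "lens" alone would score both extensions the same; comparing the compacted forms separates them.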
We have made some bug fixes and improvements to the Marketplace search service. The changes are live in VS Code Insiders. Do try them out and share feedback :)
Nice work on the update!
However, I find this problematic:
The GitLens extension does not show up here.
@hediet thanks for the feedback, we are looking into this exact case. The issue is that we are doing prefix matching instead of fuzzy matching.
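To illustrate the difference, here is a sketch comparing plain prefix matching with a fuzzy prefix match based on Levenshtein edit distance. The one-edit tolerance is a hypothetical choice, and "gtlens" is just an example typo, not the query from the screenshot:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def prefix_match(query: str, name: str) -> bool:
    return name.lower().startswith(query.lower())


def fuzzy_prefix_match(query: str, name: str, max_edits: int = 1) -> bool:
    # Compare the query with name prefixes of nearby lengths so that a missing
    # or extra character still counts as a match.
    q, n = query.lower(), name.lower()
    lengths = range(max(0, len(q) - max_edits), len(q) + max_edits + 1)
    return any(levenshtein(q, n[:k]) <= max_edits for k in lengths)


print(prefix_match("gtlens", "GitLens"), fuzzy_prefix_match("gtlens", "GitLens"))
```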
Maybe an edge case, but when searching by publisher, i.e. "Matt Bierner", the top result doesn't make sense, and the third result is a somewhat weaker match, I'd say.
@lramos15 good catch, thanks for reporting this. @SaiKanth007 @kj0171 let's double check this one once we do the improvements we agreed on.
Simple filtering options for statistical properties such as last updated date, download count, verification status, etc. Preferably multiple of such filters could be applied simultaneously.
This would, for example, allow me to display only extensions updated within the last 30 days, with a download count of at least 2500, and only by verified publishers.
This data is already available to the marketplace search results page (took a quick peek at the devtools network panel), so why not use it?
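A sketch of combinable statistical filters in Python, using the thresholds from this comment. The `ExtensionStats` fields are hypothetical and not the Marketplace API's schema; each filter is a predicate, and an extension has to pass all of them:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ExtensionStats:
    # Hypothetical fields; not the Marketplace API's actual schema.
    name: str
    last_updated: datetime
    install_count: int
    publisher_verified: bool


def apply_filters(exts, now, max_age_days=30, min_installs=2500, verified_only=True):
    """Keep only extensions that pass every filter (filters combine with AND)."""
    cutoff = now - timedelta(days=max_age_days)
    filters = [
        lambda e: e.last_updated >= cutoff,
        lambda e: e.install_count >= min_installs,
        lambda e: e.publisher_verified or not verified_only,
    ]
    return [e for e in exts if all(f(e) for f in filters)]


now = datetime(2023, 3, 1, tzinfo=timezone.utc)
exts = [
    ExtensionStats("a", datetime(2023, 2, 20, tzinfo=timezone.utc), 10_000, True),
    ExtensionStats("b", datetime(2022, 12, 1, tzinfo=timezone.utc), 90_000, True),
    ExtensionStats("c", datetime(2023, 2, 25, tzinfo=timezone.utc), 100, True),
    ExtensionStats("d", datetime(2023, 2, 28, tzinfo=timezone.utc), 5_000, False),
]
print([e.name for e in apply_filters(exts, now)])
```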
Thanks for the feedback. We are aware of this and will try to address it in the near future.
We have rolled out search enhancements (Details here: https://github.com/microsoft/vsmarketplace/issues/154#issuecomment-1285224069).
Please share feedback and reopen if necessary.
Thanks @kj0171
To clarify, most of the improvements can be seen in VS Code Insiders, and we plan to bring them to VS Code Stable soon. Try them out in VS Code Insiders and let us know what you think.
Just noticed something on the Insiders release that could lead to an impersonation issue, discussed here https://vscode-dev-community.slack.com/archives/C74CB59NE/p1673358662096609, based on a post in https://blog.aquasec.com/can-you-trust-your-vscode-extensions
The marketplace search does not respect exact match if you use the publisher field.
I searched for an extension and clicked the publisher name to see other extensions from that publisher (myself), but the new search also returns extensions from authors with similar names.
Based on this comment it seems only well-known publishers are handled, but I would argue that this change should be revisited.
Also, if you mistype the search, using publisher:"microsoft instead (yes, just missing the closing double quote), the results are not good at all. This error, on the other hand, happens on both the Stable and Insiders releases.
Thank you
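A sketch of both fixes, under stated assumptions: the regular expression is hypothetical (not VS Code's actual query parser), it tolerates a missing closing quote by taking the rest of the query as the publisher name, and the filter compares publisher names exactly (case-insensitively), which is only one possible reading of what the user wants:

```python
import re

# Hypothetical pattern: publisher:"Name", publisher:"Name (unclosed quote), or publisher:name
PUBLISHER_RE = re.compile(r'publisher:(?:"([^"]*)"?|(\S+))')


def parse_publisher(query: str):
    """Return (publisher or None, remaining free-text query)."""
    m = PUBLISHER_RE.search(query)
    if m is None:
        return None, query.strip()
    publisher = m.group(1) if m.group(1) is not None else m.group(2)
    rest = (query[: m.start()] + query[m.end():]).strip()
    return publisher.strip(), rest


def filter_by_publisher(exts: list[dict], publisher: str) -> list[dict]:
    # Exact (case-insensitive) match only; similar-looking names are excluded.
    return [e for e in exts if e["publisher"].casefold() == publisher.casefold()]


# Hypothetical extensions.
exts = [
    {"name": "x", "publisher": "acme"},
    {"name": "y", "publisher": "acme-tools"},
    {"name": "z", "publisher": "Acme"},
]
publisher, rest = parse_publisher('publisher:"acme')
print(publisher, [e["name"] for e in filter_by_publisher(exts, publisher)])
```

The greedy fallback is ambiguous: publisher:"microsoft python would treat "microsoft python" as the publisher, so a real parser might prefer to warn about the unclosed quote.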
Thank you for pointing out the issue. For a few reasons, we aren't able to take this fix forward. We are aware of this issue and are working towards this. It will soon be fixed. We will keep you updated.
We deployed the search improvements. Note: this is an ongoing effort, and more improvements will follow. We would appreciate any feedback you have for the team.
You will start seeing more relevant results, and support for the following features:
(Screenshots: previous search, light theme; new and improved search, dark theme)
- Inclusion and exclusion (+, -) [#20]
- Multi-word search
  - AND/OR: you can now use AND/OR operations in multi-word searches
  - NOT: exclude unwanted search terms
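A minimal sketch of how such operators could be evaluated, assuming +term means "must contain", -term means "must not contain", and plain terms are OR-ed when no term is required. These semantics are an assumption for illustration, not the Marketplace's documented behavior; switching plain terms to AND would be a one-line change:

```python
def matches(query: str, text: str) -> bool:
    """+term must appear, -term must not appear, plain terms are OR-ed."""
    tokens = set(text.lower().split())
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            optional.append(term)
    if any(t not in tokens for t in required):
        return False
    if any(t in tokens for t in excluded):
        return False
    # Plain terms only decide the match when nothing is required;
    # otherwise they would just influence ranking.
    if required or not optional:
        return True
    return any(t in tokens for t in optional)


print(matches("+python -django", "python language support"))
print(matches("+python -django", "python django templates"))
```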
The marketplace search does not respect exact match if you use the publisher field.
@alefragnani This is quite a unique problem; thanks a lot for reporting it. There are two opinions within the team and among VS Code users. We are still figuring out the intent behind the query: did the user mean to apply a filter, or were they trying to search for extensions by publisher name? It may take a while to reach a conclusion.
The missing double quote can be handled better, it's on our list to fix it.
Hey All,
On 7th Feb we deployed a number of changes to VS Code Insiders that improve relevancy, especially for multi-word searches, and the overall search experience. The improvements will make their way into VS Code Stable in a couple of weeks.
Please give it a try and let us know what you think. We will continue to work on improving search in VS Code as well as in the Marketplace; your continued support and feedback will help us make it better for the community.
Thank you all for participating in the discussion. Please feel free to leave comments or contact me directly; we can chat about ideas and possible improvements (Booking Link).
I am closing this issue; let's continue the discussion in the open issues listed in the description.
I created this follow up issue to make sure @alefragnani publisher bug is still captured https://github.com/microsoft/vsmarketplace/issues/580
@SaiKanth007 mentioned to me that this should be fixed end of March.
This is not working for me. Is this implemented?
@enabled NOT @category:"themes"
Hi, VS Code PM here 👋
I understand that there are already search issues open in this repository; however, I wanted to have one super issue with all the items linked. Improving search is the number one thing we would want the Marketplace team to work on 💯
If needed, we can provide many more examples where search can be improved, but here's a list of issues to start with. These examples are mostly from the Extensions view in VS Code, which uses the Marketplace search API.