npm / www

community space for the npm website
https://npm.community

Package search results ordering is misleading / confusing #254

Status: Open. Opened by davidjb 7 years ago.

davidjb commented 7 years ago

What does npm use to rank packages by "popularity"? I ask because a search for the example term 'ldap' (e.g. https://www.npmjs.com/search?q=ldap&page=1&ranking=popularity) shows a number of packages ahead of what I'd consider the most 'popular' ones (e.g. those with the most downloads, GitHub stars, forks, etc.). Overall, the results seem haphazard:

[Screenshot, 2017-10-16: npm search results for "ldap". I've annotated the download stats to the right of each entry in blue.]

My guess is that npm ranks the exact package-name matches (LDAP and ldap) above all other results. I originally thought the remaining packages were ranked by downloads, descending, but that doesn't quite hold either. Either way, this is confusing: the results aren't useful, and one is left guessing what 'popularity' means (especially since one has to click into each package to find its download stats).
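To make that guess concrete, here is a minimal Python sketch of the ordering I suspect, assuming exact (case-insensitive) name matches are placed first and everything else is sorted by downloads, descending. This is a hypothesis, not npm's actual algorithm (and, as noted, it doesn't fully match what I saw). Apart from ldap/LDAP, the package names are hypothetical, and all download counts are made up.

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    downloads: int  # hypothetical download count, not real npm data


def guessed_ranking(query: str, packages: list[Package]) -> list[Package]:
    """Hypothesised ordering: exact (case-insensitive) name matches first,
    then everything else by downloads, descending. A guess at npm's
    behaviour, not its real ranking code."""
    q = query.lower()
    # Sort key: exact matches (False sorts before True) first, then most downloads.
    return sorted(packages, key=lambda p: (p.name.lower() != q, -p.downloads))


results = guessed_ranking("ldap", [
    Package("ldap-client-x", 500_000),  # hypothetical popular package
    Package("LDAP", 40),
    Package("ldap", 120),
    Package("ldap-helper-y", 9_000),    # hypothetical package
])
print([p.name for p in results])  # ['ldap', 'LDAP', 'ldap-client-x', 'ldap-helper-y']
```

Even with made-up numbers the effect is visible: a package with 40 downloads sits above one with 500,000 simply because its name matches the query.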

Solutions could be to improve the algorithm, state the metric(s) being used, and/or display those metrics (e.g. download counts) on the search results page. If exact package names are matched first, then highlighting or explaining that would help users make sense of the results.

jmarca commented 6 years ago

Similar issue when searching for JWT implementations. Searching for "jwt", the top result is package/jwt, most recently published 6 years ago, with download stats of 11/162/648. In contrast, the second result, package/jsonwebtoken, was last published 4 weeks ago and has download stats of 38K/657K/2.8M. I realize that searching for "jwt" will surface the "jwt" project because someone is squatting on the name, but I would expect sorting by anything other than name match to push that dead project (6 years!!) far down the results.

davidjb commented 6 years ago

Changing the title of this issue because it's not just "popularity" ordering that's affected. Popularity is still definitely an issue as per my comments above, but in general, the exact name match will always come first no matter what Search Criteria one uses (Best Overall, Quality, Popularity, Maintenance).

In the case of ldap or jwt, the same packages always come first no matter which criterion is used. This leads to misleading results: one would think that jwt is more popular or better maintained than jsonwebtoken, and one would be wrong. The only way to discover this is by manually comparing each package's page.

So, in general, the fix seems to be simply to stop favouring the exact string match in the existing search criteria. From my point of view, I requested a sort on Popularity, so that's what I expect to see, not whichever package happened to claim the name first. I'd suggest a new Package Name criterion that provides the current functionality, should anyone want to search purely on name.

Displaying package metrics on the search results page would also make the result set clearer, because at present it's guesswork and requires manually comparing individual package pages.
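A minimal Python sketch of what I'm proposing, assuming each package has normalised popularity, quality and maintenance scores (the field names and values are hypothetical, not npm's real data): sort strictly by the criterion the user picked, keep the exact-match-first behaviour behind a separate "name" criterion, and return the sorting value so it can be shown next to each result.

```python
from dataclasses import dataclass

CRITERIA = ("popularity", "quality", "maintenance")


@dataclass
class Package:
    name: str
    popularity: float   # hypothetical normalised scores in [0, 1]
    quality: float
    maintenance: float


def rank(query: str, packages: list[Package], criterion: str) -> list[tuple[str, float]]:
    """Sort purely on the requested criterion; the separate "name" criterion
    keeps exact matches on top. Returns (name, value) pairs so the metric
    used for sorting can be displayed alongside each result."""
    q = query.lower()
    if criterion == "name":
        # Current behaviour, made explicit: exact name match first, then popularity.
        ordered = sorted(packages, key=lambda p: (p.name.lower() != q, -p.popularity))
        return [(p.name, p.popularity) for p in ordered]
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion}")
    ordered = sorted(packages, key=lambda p: getattr(p, criterion), reverse=True)
    return [(p.name, getattr(p, criterion)) for p in ordered]


pkgs = [
    Package("jwt", popularity=0.05, quality=0.4, maintenance=0.1),
    Package("jsonwebtoken", popularity=0.9, quality=0.8, maintenance=0.7),
]
print(rank("jwt", pkgs, "popularity"))  # jsonwebtoken first
print(rank("jwt", pkgs, "name"))        # jwt first (exact match)
```

With this split, a Popularity sort means popularity, and anyone who wants the name-first view can still ask for it explicitly.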

jmarca commented 6 years ago

Perhaps an easy fix would be to show the score to the right. For example, sorting by popularity would show the number of downloads, GitHub stars, or whatever mystery value is used as the sort criterion. Then, when there is an exact name match, it would still show first, but the secondary sort value would also be shown. So in my case, sorting on maintenance would show...? The number of releases?? That way I wouldn't have to click through each and every likely candidate to see whether the project is dead and abandoned.

[Screenshot, 2017-11-08]

aearly commented 6 years ago

We have a new search page in the works that addresses most of the concerns. We do give exact matches for the search term a large boost in the rankings, but we plan to sprinkle in other metrics in the results that will make it really obvious when the #2 result is likely what you want.

davidjb commented 6 years ago

@aearly Thanks for the info. The crux of the issue is that it's unclear what the results mean, and, as @jmarca has highlighted, you currently need to dig into every result to draw a conclusion about a given package.

As an example, https://npms.io/search?q=jwt is better, as it shows when a package was last updated, whether it's considered stable, etc. The scoring is a bit opaque until you've read through the About documentation, but overall it's much improved. I still feel that explaining or highlighting an exact string match is needed; otherwise a casual observer may not realise why that package is on top (especially when names are closely related).

gaearon commented 6 years ago

Not sure if the search page is already the new one (I assume it is), but it’s quite confusing that react-dom is at the end even though it’s very popular and an exact match for my query.

[Screenshot, 2018-04-05]