haskell / hackage-server

Hackage-Server: A Haskell Package Repository
http://hackage.haskell.org

Scoring of packages using pagerank, rather than user ratings #818

Open chessai opened 5 years ago

chessai commented 5 years ago

The 3-star rating of packages has historically not been very useful. Even the most well-known packages receive only a small number of ratings from users. For example:

Package      Number of votes
base         24
text         13
bytestring   11
vector       4

Implementing a rating system based on pagerank could help in the following ways:

  1. Ratings would not be an (effectively) redundant piece of information on a package's page
  2. Users could see a numerical number roughly reflective of the community's trust of the package
  3. PageRank is a better fit for rating over time, since users are very unlikely to retract or change a rating (or even make one at all) after changing their opinion, perhaps because the library has improved.
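To make the proposal concrete, here is a minimal PageRank sketch over a package dependency graph, written in Python for brevity. It assumes a hypothetical `deps` mapping from each package name to the names of the packages it depends on (for example, derived from each package's latest `build-depends`); the function name, the parameters and the toy graph are all illustrative, not hackage-server code or real Hackage data. Each "A depends on B" edge passes part of A's score to B, so packages that many others rely on, directly or indirectly, accumulate high scores.

```python
def pagerank(deps, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a dependency graph (hypothetical helper).

    `deps` maps a package name to the packages it depends on. Each
    "A depends on B" edge passes part of A's score to B.
    """
    # Collect every package, including ones that only appear as dependencies.
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    # Deduplicate dependency lists so a repeated entry is not double-counted.
    graph = {p: sorted(set(deps.get(p, ()))) for p in nodes}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Packages with no dependencies ("dangling" nodes) spread their
        # score uniformly, so the scores keep summing to 1.
        dangling = sum(rank[p] for p in nodes if not graph[p])
        base_score = (1.0 - damping) / n + damping * dangling / n
        new = {p: base_score for p in nodes}
        for p in nodes:
            if graph[p]:
                share = damping * rank[p] / len(graph[p])
                for t in graph[p]:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy graph, not real Hackage metadata.
toy_deps = {
    "base": [],
    "bytestring": ["base"],
    "text": ["base", "bytestring"],
    "vector": ["base"],
    "aeson": ["base", "bytestring", "text", "vector"],
}
scores = pagerank(toy_deps)
for pkg, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{pkg:12} {score:.3f}")
```

The damping factor of 0.85 is the conventional default, not a value tuned for Hackage; how edges are weighted (for example, whether test-suite or optional dependencies count) would be a real design decision.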

@taktoa and I discussed this, and we both think this is a better approach than the rule of succession, Bayesian averaging, or anything else that relies on explicit user voting. @taktoa, please comment if you have anything to add.

taktoa commented 5 years ago

If it's too slow to recompute the PageRanks every time someone uploads a package, you could implement Fast Incremental and Personalized PageRank instead.
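The algorithm named here (by Bahmani, Chowdhury and Goel) is, as far as we understand it, based on stored Monte Carlo random walks that are updated only where a new edge affects them. A much simpler technique, swapped in here and not the method referred to above, is to warm-start power iteration from the previous scores whenever a package is uploaded. For a damping factor below 1 the PageRank fixed point is unique, so the warm start reaches the same answer, and when the graph changes only slightly it typically needs fewer iterations. The `deps` mapping, the `init` parameter and the example graph are hypothetical.

```python
def pagerank(deps, damping=0.85, tol=1e-10, max_iter=1000, init=None):
    """Power-iteration PageRank; returns (scores, iterations used).

    `init` optionally supplies starting scores (e.g. the previous run's
    output); packages missing from it start at the uniform value 1/n.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    graph = {p: sorted(set(deps.get(p, ()))) for p in nodes}
    n = len(nodes)
    start = {p: (init or {}).get(p, 1.0 / n) for p in nodes}
    total = sum(start.values())
    rank = {p: s / total for p, s in start.items()}  # renormalise to sum 1
    for iteration in range(1, max_iter + 1):
        # Dangling packages (no dependencies) spread their score uniformly.
        dangling = sum(rank[p] for p in nodes if not graph[p])
        base_score = (1.0 - damping) / n + damping * dangling / n
        new = {p: base_score for p in nodes}
        for p in nodes:
            for t in graph[p]:
                new[t] += damping * rank[p] / len(graph[p])
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            return rank, iteration
    return rank, max_iter


# Hypothetical graph before and after someone uploads a new package "lens".
before = {"base": [], "bytestring": ["base"], "text": ["base", "bytestring"]}
after = dict(before, lens=["base", "text"])

old_scores, _ = pagerank(before)
cold, cold_iters = pagerank(after)
warm, warm_iters = pagerank(after, init=old_scores)
print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```

This still touches every package on each iteration, so it only reduces the number of passes; no claim is made here about the saving at full Hackage scale.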

hvr commented 5 years ago

"Users could see a numerical number roughly reflective of the community's trust of the package"

While I agree that the 3-star rating isn't a sufficient metric (and, curiously, there have already been cases of politically motivated downvoting on Hackage, but that's just something you have to live with), I think that claiming PageRank to be a metric of trust is a very misleading premise. While you didn't specify exactly how you'd apply PageRank to the Cabal metadata, PageRank is merely a metric of popularity, certainly not of "trust". It only holds for dependencies that maintainers choose voluntarily, not for those forced upon them, either by a lack of alternatives or by other choices that vendor-lock them into that dependency. Then there are also cargo-culting effects, where people are simply unaware of the alternatives, and a PageRank metric might even reinforce this vicious cycle by making people less confident about taking less-travelled roads. And FWIW, I can think of a couple of packages that I certainly wouldn't classify as trustworthy, yet they appear in a majority of install plans across Hackage.

That being said, I welcome adding a PageRank-like metric as an additional number to look at or sort by, but I don't consider it a replacement for the manual user rating metric.

gbaz commented 5 years ago

PageRank in this case is being described as a sort of fancy-weighted way of summing over transitive reverse-dependency counts, right? If so, the first step would be for somebody to jump in and help finish the long-delayed reverse-dependency code that now exists at https://github.com/haskell/hackage-server/pull/723

This code is feature-complete, but it still appears to consume excessive memory at full hackagedb scale. Once that is in place, it would be straightforward in code (though perhaps mathematically interesting) to augment the revdep information with incremental PageRank data.
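For comparison, the simpler quantity mentioned above, the number of transitive reverse dependencies, can be sketched by inverting the dependency graph and running a breadth-first search from each package. This is a naive in-memory illustration over a hypothetical `deps` mapping (package name to list of direct dependencies), not the code from PR #723; one search per package costs O(V·E) time overall and makes no attempt at the space efficiency that the PR is concerned with.

```python
from collections import defaultdict, deque


def transitive_revdep_counts(deps):
    """Count, for each package, the distinct packages that depend on it
    directly or transitively (hypothetical helper)."""
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    # rev[b] is the set of packages that list b as a direct dependency.
    rev = defaultdict(set)
    for pkg, targets in deps.items():
        for t in targets:
            rev[t].add(pkg)
    counts = {}
    for pkg in nodes:
        seen = set()
        queue = deque(rev[pkg])
        while queue:
            q = queue.popleft()
            if q not in seen:
                seen.add(q)
                queue.extend(rev[q] - seen)
        # A dependency cycle must not count a package as its own revdep.
        seen.discard(pkg)
        counts[pkg] = len(seen)
    return counts


# Hypothetical toy graph, not real Hackage metadata.
toy_deps = {
    "base": [],
    "bytestring": ["base"],
    "text": ["base", "bytestring"],
    "vector": ["base"],
    "aeson": ["base", "bytestring", "text", "vector"],
}
print(transitive_revdep_counts(toy_deps))
```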

That said, given the structure of dependency graphs in Haskell, I'd be curious whether PageRank actually adds value over revdep counts themselves. Such a question is best answered empirically, though, by actually implementing things and seeing what happens :-)
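One way to run that experiment would be to compute both scores for every package and measure how far the two orderings agree. The sketch below uses Kendall's tau-a: concordant pairs minus discordant pairs, divided by the total number of pairs, with tied pairs counting as neither. A value close to 1 would suggest PageRank mostly reproduces the revdep-count ordering. The function name and the example scores are hypothetical; the numbers are made up for illustration and are not measurements.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a between two score maps over the same packages."""
    if set(xs) != set(ys):
        raise ValueError("score maps must cover the same packages")
    keys = sorted(xs)
    if len(keys) < 2:
        raise ValueError("need at least two packages to compare")
    concordant = discordant = 0
    for a, b in combinations(keys, 2):
        # Positive product: both scores order the pair the same way.
        s = (xs[a] - xs[b]) * (ys[a] - ys[b])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(keys) * (len(keys) - 1) // 2
    return (concordant - discordant) / pairs


# Made-up scores for four hypothetical packages.
revdeps = {"pkg-a": 4, "pkg-b": 2, "pkg-c": 1, "pkg-d": 0}
pagerank_like = {"pkg-a": 0.40, "pkg-b": 0.20, "pkg-c": 0.25, "pkg-d": 0.15}
print(kendall_tau_a(revdeps, pagerank_like))
```

At Hackage scale many packages would have few or no reverse dependencies, so ties would be common; a tie-corrected statistic such as tau-b (the default variant of `scipy.stats.kendalltau`) or Spearman's rho may be a better fit there.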

gbaz commented 2 years ago

cf: https://github.com/haskell/hackage-server/issues/986