haskell / hackage-server

Hackage-Server: A Haskell Package Repository
http://hackage.haskell.org

RFC: Adapt wording & tooling to discourage uploading experimental "DO NOT USE" packages into the main index #461

Open hvr opened 8 years ago

hvr commented 8 years ago

Just today I noticed https://hackage.haskell.org/package/wsdl-0.1.0.0 which comes with a big "DO NOT USE, UNSTABLE AND INCOMPLETE." disclaimer in its description.

IMO, such packages don't belong in the main 00-index.tar, as they're clearly not meant for public consumption yet. And such uploads add to the self-fulfilling prophecy (cf. broken windows theory) that Hackage has no quality standards and anything goes.

I'm not sure what the motivation/assumption for uploading a package to Hackage is, but I've noticed in the past that uploaders often didn't know about the Hackage candidate feature, and just wanted to try out the workflow.

In any case, as soon as a package becomes part of the 00-index.tar, it becomes a package that causes overhead for several entities (including us Hackage Trustees ;-) ). It gets picked up by search engines, is considered by Hackage's own package search, gets picked up by matrix.h.h.o eventually, etc.

Also, experiments that end up in a dead end effectively use up precious names from the package namespace, which seems troublesome to me (sure, the package names could be reclaimed in theory, but it's very confusing if a package changes its scope/purpose completely depending on the version; so this should rather be the exception). A name like wsdl is certainly one of the premium names which deserve to be handled with more responsibility, as such a principal name suggests that the package is the blessed "go-to package" for a given task.

So a package added to 00-index.tar should ideally satisfy a few baseline requirements, IMO.

More specifically, a package uploaded as non-candidate ought to come with a bit more responsibility to improve the overall quality of Hackage packages (and keep the Trustee workload manageable). So, for non-candidate uploads I suggest something along the lines of:

A more drastic way would be to require approval when new package names are being created (i.e. you'd still be able to upload candidates for new package names w/o approval, but publishing a new package name to the main index for the first time would require such an approval). We'd need to make sure that the approval process takes at most 24h or so, by having a large enough group of people able to approve a new package name.

/cc @bergmark @dcoutts @gbaz


Related, there's also the issue of trivial packages using up short package names, but failing the equivalent of the Fairbairn-threshold for packages:

Other premium names taken (although maybe with a less clear verdict whether they fall below the threshold):

A different class of questionable packages are "personal" packages which appear to have an audience of one, the author himself:

(TODO: add more examples)

dcoutts commented 8 years ago

Note that the preference mechanism can be used for this kind of beta release.

ezyang commented 8 years ago

Fine by me, as long as there are clear instructions on how to upload experimental packages.

hvr commented 8 years ago

Here's another experimental package with a short dictionary word, jump, clearly marked as a placeholder ("synopsis: Nothing to see here, move along") polluting the Hackage index.

This shows that even experienced users tend to misunderstand Hackage as being their personal testing ground and upload dummy versions of packages that don't pass the basic threshold of even being intended for use by others.

phadej commented 8 years ago

I agree with everything including

A more drastic way would be to require approval when new package names are being created

but I have no good ideas for how to make the approval process fair and responsive.

phadej commented 8 years ago

Btw, how do PyPI, CPAN, PEAR, npm and other repositories handle this?

ezyang commented 8 years ago

jump is a bad example. The initial upload is clearly a name squat, since if you look at the GitHub repository they clearly intend to release an alternative base under this name. You can see in the repository that they are developing the package in good faith and I think it's OK for them to do this.

gbaz commented 8 years ago

What should our policy be on name squatting in general? For example, how do we distinguish a "good faith" squat from one that isn't?

ezyang commented 8 years ago

BTW, other package managers do very poorly with this. http://incolumitas.com/2016/06/08/typosquatting-package-managers/ https://phpsec.xyz/composer-typosquatting-vulnerability-877d263509ec#.xuw039sz6

gbaz commented 8 years ago

CPAN doesn't have strict policies, it appears. But it has some nice author guidelines we may want to rip off:

http://www.cpan.org/modules/04pause.html

gbaz commented 8 years ago

npm has some stricter actual policies: https://www.npmjs.com/policies/disputes

(see also: https://www.npmjs.com/policies/conduct)

hvr commented 8 years ago

@ezyang it doesn't matter whether it's done in good faith or not. It doesn't change the fact that such dummy releases don't help anybody, and therefore shouldn't needlessly bloat the published Hackage index tarball. If it's about reserving a name, there's a different mechanism to do that. If you publish a package to the public package index, it's supposed to be useful to people other than the package author, which jump-0.0.0.0 clearly is not.


UPDATE: jump was officially deprecated a few months after the "name squat"; its README on GitHub now states

This project has been deprecated in favor of two new projects: ...

  • haskell-lang (live website) is the new destination for Haskell documentation
  • Foundation is a more active and innovative standard library rethinking in Haskell

Both are active and welcoming community projects, please get involved!

So at this point jump is just a dead corpse which never had any useful release, and yet everyone has to download, store, and process it during index traversals, as it is forever enshrined in the package index.

gbaz commented 6 years ago

I think we resolved the immediate issue here with the statement on the upload page that "your package should strive to provide value for the community by being intended to be useful to others." Other aspects of this are being discussed with the uncurated ecosystem proposal. I suggest closing this particular ticket as superseded?

phadej commented 6 years ago

@gbaz let's close this only after the proposal is accepted.

FWIW, the proposal doesn't address the name squatting issue.

I think we still want to have some restrictions in the uncurated index. At the least, there should be some procedure (lighter than the current package takeover one) to reclaim a package name if a new maintainer wants their new package to be curated and the old one never had any curated versions.

Yet, I don't think we should complicate your proposal with such detail at the moment, so I'd leave this issue open for now.


Even now, while we have

"your package should strive to provide value for the community by being intended to be useful to others."

we have no "what would happen if you don't" clause.