purescript / psc-package

A package manager for PureScript based on package sets
https://psc-package.readthedocs.io

Package registry #24

Open hdgarrood opened 7 years ago

hdgarrood commented 7 years ago

Following on from https://github.com/purescript/purescript/issues/2526. I am thinking about the architecture of psc-package, and in particular, thinking about how it differs from Stackage in that Stackage is an extra layer in front of Hackage, and Hackage is a centralized package registry which provides:

I also think that having a centralised registry which is separate from curated package sets provides an important option for publishing packages for authors who might struggle to find time to keep their packages up to date; if the only option is submitting to a package set, I think we risk discouraging people from publishing their packages at all.

Another related issue that has just occurred to me: I think it's quite far from ideal that if someone were publishing their packages only through psc-package and also uploading them to Pursuit, the information about dependencies and bounds which would be passed to purs publish via --manifest psc-package.json on Pursuit would essentially be meaningless. Since the package author would not actually be using it in the course of developing their package, I expect in most cases it would quickly go out of date.

It is probably obvious by now that I would quite like to have a centralised package registry of some kind. However, I appreciate that this would amount to quite a lot of work. So I'm really opening this issue to ask: do you agree that it is worth addressing these issues by creating a centralised registry and modifying psc-package to use it, and if not, is that because of how much work it would be or because of something else?

paf31 commented 7 years ago

I think of package-sets as sort of like a registry, albeit quite difficult to use. So let me try to summarize what I think of as the differences, and we can see if we can agree:

So I'm in favor of tracking this data. Some of it seems to belong in the package set, and some outside.

Unfortunately, I don't have time to work on it right now.

Let me ask though: given we already decided that psc-package was not going to be the blessed package management solution, are you trying to solve a particular problem with psc-package here, or just a general problem of tracking data for general purpose use?

hdgarrood commented 7 years ago

I guess I'm mainly trying to understand the ideas behind psc-package better, so that I can help work out how Pulp should use it, and I'm also trying to imagine what the purescript ecosystem would look like without Bower and trying to work out what I might be able to do to help there.

In addition to that, though, I do consider the things I've written to be problems for psc-package which I would maybe like to fix or investigate through separate projects.

Re git: if we can find some space on some server, wouldn't it be simpler to just have a tarball per package per version? Mirroring git repos won't be fun if the history has been rewritten since the last time you pulled from the upstream repo and a fast-forward merge is impossible. We also can't just delete and reclone the whole repo, as tags could have been mutated or removed.

To clarify my position re ease of use: I agree that it would be nice to make it easier to publish packages, but that's not really what I mean here. My worry is that by only having package sets there will be a social effect that leads people to avoid creating and publishing packages at all. (Yes, of course there is Bower, but I am starting to think that Bower's lack of a proper solver means that ideally it wants replacing.)

Pauan commented 7 years ago

@hdgarrood I get what you're saying, but as the user of libraries, I really like that package sets give a kind of guarantee of maintenance.

There have been plenty of packages on Pursuit which I wanted to use but couldn't, because the author hadn't updated them in months. They were still listed on Pursuit, though, which led me to believe that they would work (even though they didn't).

I would rather have a small set of high-quality maintained libraries than a large set of low-quality unmaintained libraries (which is what you get with other package managers like npm).

This does run the risk that a library which I am using will be dropped in a future package set, but if anything I consider that a good thing, because it means that people will bug the author to update the library (or fork the library).

Providing workarounds for unmaintained libraries just encourages more unmaintained libraries. Having a strict "must maintain" policy encourages maintenance. Yes it puts more pressure on library authors, but since there are far more library users than library authors, I think that's okay.

In my opinion, the library user experience is more important than the library author experience, because libraries are useless by themselves (they are only useful when used by another library or application). So there is a natural asymmetry which is biased toward library users.

So I think the only thing that is necessary is the ability for somebody to "take over" an unmaintained package. That way, if the author doesn't maintain the package, somebody else can.

hdgarrood commented 7 years ago

As someone who is both a library user and a library author, I very much disagree. If there was an expectation of maintenance, as there is with package sets, most of my libraries would never have been published. Even though they do often lag behind, I know that people find them useful.

I'm very strongly against any policy that puts unnecessary pressure on library authors or encourages library authors to bug library users. That's not at all the tone I want the PureScript library ecosystem to have.

We know from experience that the Stackage model, i.e. a package registry with no maintenance expectations plus package sets with maintenance expectations for authors who want to commit to them, works well and scales. I also don't buy the argument that it will be hard to find packages that work well. It's easy: just don't search the full registry; instead, restrict your search to a package set. I know a tool that can search within a package set doesn't exist now, but it easily could.
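As a rough illustration of what "restrict your search to a package set" could mean, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `registry` mapping, the `package_set` collection and the function name are hypothetical and do not correspond to any existing Pursuit or psc-package API.

```python
def search_in_set(query, registry, package_set):
    """Return names of packages that match `query` and belong to `package_set`.

    `registry` maps package name -> description (hypothetical data shape).
    `package_set` is the collection of names in one curated set.
    """
    needle = query.lower()
    return sorted(
        name
        for name, description in registry.items()
        if name in package_set
        and (needle in name.lower() or needle in description.lower())
    )


# Hypothetical data: two list libraries, only one of which is in the set.
registry = {
    "lists": "Strict and lazy linked lists",
    "old-lists": "An unmaintained list library",
    "prelude": "The PureScript Prelude",
}
package_set = {"lists", "prelude"}

print(search_in_set("list", registry, package_set))
```

The full registry stays searchable for anyone who wants it; the set only acts as a filter on top.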

Pauan commented 7 years ago

@hdgarrood Just to be clear, when I say "bug the library author" I mean "file a bug report about updating the library", or "make a pull request updating the library", or "send a polite e-mail explaining the situation", that sort of thing. I don't mean waking them up at 2 AM to pester them.

hdgarrood commented 7 years ago

Ok good, thanks for clarifying. My position is unchanged, though.

Pauan commented 7 years ago

What about using the npm registry for packages which don't want to commit to the package set? With peerDependencies you can ensure that there is only a single version of each package.

And we will need to use npm anyways, because there are some PureScript packages which use JavaScript packages. So that avoids needing to create a separate registry.

hdgarrood commented 7 years ago

That does seem like a good option, but I would rather not get into this discussion right now - for now, I am really just hoping to ascertain how psc-package should interact with a centralised registry, if at all, rather than details like which registry we use, whether we make our own, etc.

paf31 commented 7 years ago

I don't have time for a full reply now, but I wanted to just clear up a couple of things:

I think we can implement a centralized package repository, but it fundamentally would be different from psc-package.

hdgarrood commented 7 years ago

Ok great, thanks. Incidentally, I can't remember if I've said this elsewhere, but having seen how Stackage operates, I expect that having version bounds would help with curation once package sets start to become a bit larger.

When/if you have time I'd be interested to hear about your views on Git as the base for psc-package too, in particular with respect to the issue of package availability, as I'm not sure I fully understand where you're coming from there.

mostalive commented 7 years ago

(possibly dumb question) @paf31 imagine all the PureScript packages you need to do something are in package-sets, what would you use Bower for? (since JS dependencies probably come from npm).

@hdgarrood I think Git can be great for availability, because it is inherently distributed. I can imagine e.g. specifying multiple locations for a package in package.json. It also looks like it is very easy to host a 'private repository' - clone the packages repository, add your private packages where merge conflicts are unlikely (e.g. at the bottom). I think in the long run it could grow the ecosystem - if I can easily split my application into a bunch of libraries in git repositories, and ensure they all build together, making the more generic ones available to everyone else is a matter of moving the git repository to github/bitbucket/gitlab and sending a pull request on 'package-sets' once it is far enough along to be interesting to someone else.

Maintaining community infrastructure is an expensive and not always thankful experience, from what I see in various communities. Less infrastructure = more time available to develop libraries, the compiler and other more interesting parts of the ecosystem.

mostalive commented 7 years ago

FYI (sorry if this is the wrong place, could not find a more appropriate one at the moment) to make it a bit easier to add existing packages to the package set, I wrote a small shell script

https://gist.github.com/mostalive/54dbbf388f6ca58795d6ae37fef22890

that generates most of a packages.json snippet for you. (It currently produces one comma too many, and the snippet needs to be formatted manually when adding it to packages.json.)

I'd be happy to translate this to Haskell as a subcommand of psc-package, but I'm not sure it belongs there, and if so, under what name.

The other thing I found useful is a one-liner to extract dependencies from 'bower.json' for use in 'psc-package.json':

$ jq '.dependencies | keys' bower.json | sed 's/purescript-//g'
[
  "lists",
  "mmorph",
  "monoid",
  "prelude",
  "tailrec",
  "transformers",
  "tuples"
]
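
For anyone without jq at hand, roughly the same extraction can be sketched in Python. This is only an illustration: the function name and the example version ranges are made up, and unlike the sed version it strips the `purescript-` prefix only at the start of each name.

```python
import json


def bower_deps_for_psc_package(bower_json_text):
    """Return sorted dependency names with a leading 'purescript-' removed."""
    bower = json.loads(bower_json_text)
    names = bower.get("dependencies", {}).keys()
    prefix = "purescript-"
    # Drop the prefix only when the name starts with it.
    return sorted(n[len(prefix):] if n.startswith(prefix) else n for n in names)


# Hypothetical bower.json contents, for illustration only.
example = '{"dependencies": {"purescript-prelude": "^2.0.0", "purescript-lists": "^3.0.0"}}'
print(json.dumps(bower_deps_for_psc_package(example), indent=2))
```
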

hdgarrood commented 7 years ago

> @hdgarrood I think Git can be great for availability, because it is inherently distributed.

I know that Git is inherently distributed, but that doesn't actually address the availability issues I have described earlier in this thread and in purescript/purescript#2526. I am still not aware of any good way of handling a case where a package author rewrites the Git history between releases if we continue to use Git as the base.

mostalive commented 7 years ago

I see (I did read that thread, there's a lot to take in). Oversimplifying things, I was thinking of a trade-off. Which has the greater chance:

  1. someone rewrites their git history, on purpose or by accident.
  2. a centralized piece of community infrastructure is down (this I've seen happen in more than just the Haskell community).

The chance of 1. increases with the size of the package set. At the same time, so does the chance of 2. Why would anyone do 1? I can think of left-pad (https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/) as an example, but that is independent of the technical solution: if someone wants to remove their contribution for whatever reason, they should be able to (someone else can take it over / fork it / ...), I think.

What I then understand from your question at 2526 is: how can we limit the amount of work that has to be done by others when a contributed package gets removed or is broken? Or how can we prevent it?

In the case of the broken tag: How about storing the tag and the commit hash for that tag and seeing if they match? It is probably possible to change a commit hash, but you'd have to make an effort to do it... (temporary resolution would be to fork the repository, set the tag as it was and send an (automated) message to the author to fix their package).
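To make the tag-plus-hash idea concrete, here is a minimal Python sketch. It assumes a hypothetical record of pins keyed by `(package, tag)`; the function name, the data shapes and the hash values are invented for the example, and how `observed` is gathered (for instance by parsing the output of `git ls-remote --tags`) is left out. Note that a check like this can only detect a moved or deleted tag, it cannot recover the original contents on its own.

```python
def check_pins(pinned, observed):
    """Compare recorded (package, tag) -> commit pins with what upstream reports now.

    Both arguments map a (package, tag) tuple to a commit hash string.
    Returns a list of (package, tag, reason) for every pin that no longer holds.
    """
    problems = []
    for key, expected in sorted(pinned.items()):
        actual = observed.get(key)
        if actual is None:
            problems.append((*key, "tag missing upstream"))
        elif actual != expected:
            problems.append((*key, "tag now points at a different commit"))
    return problems


# Hypothetical pins and upstream state; the hashes are shortened placeholders.
pinned = {
    ("purescript-lists", "v3.0.0"): "a1b2c3",
    ("purescript-prelude", "v2.0.0"): "0f9e8d",
}
observed = {
    ("purescript-lists", "v3.0.0"): "d4e5f6",   # tag was moved
    ("purescript-prelude", "v2.0.0"): "0f9e8d",  # unchanged
}

for package, tag, reason in check_pins(pinned, observed):
    print(f"{package} {tag}: {reason}")
```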

hdgarrood commented 7 years ago

This is not a trade off we are necessarily forced to make; if packages are distributed as .tar.gz files via a system like IPFS or BitTorrent, it's conceivable that everything apart from perhaps publishing new versions of packages would still work.
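A minimal Python sketch of that idea, under the assumption that each published version is identified by the SHA-256 of its tarball and can be served by several interchangeable sources. The function name and the `mirrors` callables are hypothetical stand-ins for HTTP, IPFS or BitTorrent downloads, not a real client API.

```python
import hashlib


def fetch_verified(expected_sha256, mirrors):
    """Try each mirror in turn; return the first payload whose SHA-256 matches.

    `mirrors` is a list of zero-argument callables returning bytes.
    """
    for download in mirrors:
        try:
            data = download()
        except OSError:
            continue  # mirror unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Wrong contents: ignore this mirror and keep looking.
    raise LookupError("no mirror served the expected tarball")


# Hypothetical mirrors for illustration.
tarball = b"contents of package-a-1.2.3.tar.gz"
digest = hashlib.sha256(tarball).hexdigest()


def unreachable():
    raise OSError("central server is down")


def tampered():
    return b"something else entirely"


def good():
    return tarball


print(fetch_verified(digest, [unreachable, tampered, good]) == tarball)
```

Because the identifier is derived from the contents, any source that still holds the bytes is as good as any other, which is why installs could keep working while a central server is down.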

> if someone wants to remove their contribution for whatever reason, they should be able to (someone else can take it over / fork it / ... ) I think.

Certainly people should be able to say "I no longer have the inclination or time to maintain this" and we should have a process for allowing someone else to take over. But once version x.y.z of package A is published on a package registry it should be available indefinitely (except for in very, very unusual cases e.g. the contents of the package are likely to cause legal issues). If, as a package author, you're not comfortable with that, then don't publish your package.

> What I then understand from your question at 2526 is: how can we limit the amount of work that has to be done by others when a contributed package gets removed or is broken?

No, my question is how can we prevent this from happening in the first place, because it is entirely avoidable.

> How about storing the tag and the commit hash for that tag and seeing if they match?

This has already been suggested and people have already described why it won't work. In summary: we need to be able to reliably obtain package A at version x.y.z. If a package manager just fails with a checksum mismatch error during installation (and all a package manager could reasonably do in that scenario is to just fail with a checksum mismatch error), that's essentially useless from the point of view of the developer trying to install their project's dependencies.

mostalive commented 7 years ago

Thank you for the detailed reply, @hdgarrood. I'm mulling it over.