ipfs-inactive / package-managers

[ARCHIVED] 📦 IPFS Package Managers Task Force

GitHub Package Registry #55

Open andrew opened 5 years ago

andrew commented 5 years ago

On Friday, GitHub announced their new Package Registry, which has launched in a limited beta for some users.

I don't have access to publish packages myself yet, but I had a good trawl through the documentation to see how it stacks up.

They've launched with support for npm, rubygems, maven, nuget and docker. Each has its own subdomain endpoint, which is further namespaced by the owner of the repository that a package is published under, for example rubygems.pkg.github.com/andrew
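To make the endpoint layout concrete, here is a minimal Ruby sketch that builds a per-owner registry URL. Only rubygems.pkg.github.com/andrew appears above; the sketch assumes the other package types follow the same pattern, and the helper name registry_url is hypothetical.

```ruby
# Package types the registry launched with, per the summary above.
SUPPORTED_TYPES = %w[npm rubygems maven nuget docker].freeze

# Hypothetical helper: builds the per-owner endpoint for a package type.
# Assumes every type follows the <type>.pkg.github.com/<owner> pattern.
def registry_url(type, owner)
  unless SUPPORTED_TYPES.include?(type)
    raise ArgumentError, "unsupported package type: #{type}"
  end

  "https://#{type}.pkg.github.com/#{owner}"
end

puts registry_url("rubygems", "andrew")
# prints https://rubygems.pkg.github.com/andrew
```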

That means there isn't one big index of packages published on GitHub; instead there are many, many small ones.

There's no easy way to find a list of all packages that have been published so far, although you can use the search: https://github.com/search?q=npm&type=RegistryPackages

There's also no proxying to existing registries, as Artifactory does. Most of the npm packages on the GitHub registry still reference the regular npmjs.org namespace (for example https://github.com/whitesource-yossi/npm-plugin3/blob/master/package.json#L45-L59), so it's not meant to be a total replacement for existing public registries.

Key takeaways

No hard reproducibility link between git repo and published packages

Although the GitHub registry page makes it look like there is a strong link between the published packages and a particular git commit on the same repository, there actually isn't any guarantee that the package contains the same code that is visible in the git repository. Tarballs generated by package manager clients are simply attached to tagged releases with the same version number.

For example, https://github.com/providenceinnovation/simple-web-worker/releases/tag/1.2.3

[Image: Screenshot 2019-05-13 at 12 29 29]

In this particular case, it looks like the tag was automatically created on master but the person publishing it was working from a different branch locally: https://github.com/providenceinnovation/simple-web-worker/blob/provfork/package.json

Downloading both the package tarball and the source code for the same tag and comparing them confirms it:

[Image: Screenshot 2019-05-13 at 12 30 59]
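A comparison like this can be roughly scripted. The sketch below assumes both the package tarball and the tag's source archive have already been downloaded and unpacked into two local directories; the function names are hypothetical, and this is not necessarily how the comparison above was done.

```ruby
require "digest"
require "find"
require "pathname"

# Maps each file's path (relative to root) to the SHA-256 of its contents.
def file_digests(root)
  base = Pathname.new(root)
  digests = {}
  Find.find(root) do |path|
    next unless File.file?(path)

    relative = Pathname.new(path).relative_path_from(base).to_s
    digests[relative] = Digest::SHA256.file(path).hexdigest
  end
  digests
end

# Returns the relative paths that are missing on one side or whose
# contents differ between the two directories.
def differing_files(package_dir, source_dir)
  package = file_digests(package_dir)
  source = file_digests(source_dir)
  (package.keys | source.keys).reject { |name| package[name] == source[name] }.sort
end

# Usage: ruby compare.rb <unpacked-package-dir> <unpacked-source-dir>
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  differing_files(ARGV[0], ARGV[1]).each { |name| puts name }
end
```

Published tarballs often leave out files such as tests or docs, so a file missing on the package side is not suspicious on its own. The interesting entries are files present on both sides with different contents.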

This leaves room for the same style of attack as happened to event-stream, where the code published to the GitHub repository was different to the code published to npm, meaning people and tools frequently review different code to what they downloaded.

Doesn't appear immutable

The documentation for each package manager outlines how to delete either a single version or every version of a package, so anyone can delete packages that they are in control of. It's not 100% clear, but I suspect that deleting a repository from GitHub also deletes all its packages, and that deleting a user or org account would also delete all of its packages.

This leaves the door open for more left-pad scenarios.

It's also not clear whether, after deleting a package, another one can be published using the same version number. That would have a similar effect to a "force push", replacing code whilst retaining the version name, although many clients store an md5/sha256 integrity hash in lockfiles, which would catch mischief like that.
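As an illustration of how a recorded hash catches a republished version, here is a minimal sketch. The LockEntry struct and its fields are hypothetical and do not reflect any particular client's lockfile format; SHA-256 is used only as an example digest.

```ruby
require "digest"

# Hypothetical lockfile entry: the digest recorded when the version
# was first installed.
LockEntry = Struct.new(:name, :version, :sha256)

# Returns true when the downloaded bytes match the recorded digest.
def integrity_ok?(entry, artifact_bytes)
  Digest::SHA256.hexdigest(artifact_bytes) == entry.sha256
end

original_bytes = "contents of rube 1.0.0 as first published"
replaced_bytes = "different contents republished as 1.0.0"

entry = LockEntry.new("rube", "1.0.0", Digest::SHA256.hexdigest(original_bytes))

puts integrity_ok?(entry, original_bytes) # true
puts integrity_ok?(entry, replaced_bytes) # false
```

This only protects projects that already have the original digest in a lockfile. A fresh install with no lockfile would accept whatever the registry serves for that version.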

Multiple registry ambiguity

If many people started publishing OSS packages to GitHub and skipping community registries altogether (ignoring the previous concerns), package manager clients may need to up their game in how they resolve dependencies across multiple registries at once.

For instance, with rubygems, consider a Gemfile like this:

source "https://rubygems.pkg.github.com/andrew"
source "https://rubygems.pkg.github.com/warpfork"
source "https://rubygems.pkg.github.com/olizilla"
source "https://rubygems.org"

gem "rails"

If all three GitHub users have created a repository called rails on their own GitHub account (or forked one, like https://github.com/andrew/rails) and then published their own package to the GitHub package registry, how do you know which one of the four available versions of rails will be picked?

A similar problem occurs in the Maven world and, as I discussed with @danielcompton in https://manifest.fm/12, has serious security implications.
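The ambiguity can be shown with a small sketch. This is not Bundler's actual resolution algorithm; the SOURCES table is made-up illustrative data and the function names are hypothetical. It only shows what a client could do if it refused to guess.

```ruby
# Hypothetical snapshot of which gem names each source serves.
# Illustrative assumptions only, not real registry data.
SOURCES = {
  "https://rubygems.pkg.github.com/andrew" => ["rails"],
  "https://rubygems.pkg.github.com/warpfork" => ["rails"],
  "https://rubygems.pkg.github.com/olizilla" => ["rails"],
  "https://rubygems.org" => ["rails", "rake"]
}.freeze

# Returns every source URL that serves the given gem name.
def candidate_sources(sources, gem_name)
  sources.select { |_url, gems| gems.include?(gem_name) }.keys
end

# Raises when a gem is missing or could come from more than one source,
# rather than silently picking one.
def resolve_source!(sources, gem_name)
  candidates = candidate_sources(sources, gem_name)
  raise ArgumentError, "#{gem_name} not found in any source" if candidates.empty?
  if candidates.size > 1
    raise ArgumentError, "#{gem_name} is ambiguous across #{candidates.size} sources"
  end

  candidates.first
end

puts candidate_sources(SOURCES, "rails").size # 4
puts resolve_source!(SOURCES, "rake")         # https://rubygems.org
```

Bundler already lets a Gemfile pin a gem to one source, either with a `source "..." do ... end` block or the `source:` option on `gem`, which sidesteps the question for gems the author pins explicitly.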

How does this affect IPFS's integration efforts with package managers?

At face value, it doesn't have much of an impact; it's not going to replace community registries in its current form. The most likely use cases are private packages and short-lived forks.

If anything, the current issues around the lack of immutability and reproducibility further highlight that there is more work to be done in package management as a whole, for which IPFS is a key technology.

Also, because they mirror existing registry APIs and rely on open source clients, mirroring open source packages onto IPFS should still be possible with existing tooling. Their registry is closed source, though, and it will be difficult to get visibility on future changes, so we should not tie ourselves too closely to it.

There could also be an interesting opportunity to encourage GitHub to mirror these packages onto IPFS itself, which may help drive adoption of IPFS in client usage as well.

momack2 commented 5 years ago

Thanks for the thoughtful writeup, @andrew!

What benefits/values might GitHub get from mirroring packages onto IPFS? Might there be ways they could use IPFS to mitigate some of the reproducibility/immutability issues described above?

andrew commented 5 years ago

@momack2 yeah, I think reproducibility/immutability would be the main benefit they would get. They already make a big deal of having a global CDN delivering the packages, so I suspect performance won't be a big seller.

danielcompton commented 5 years ago

@andrew, this is pretty spot on, I had all the same concerns as you did. Depending on adoption, it may spur the Maven and Rubygems ecosystem to adopt repository scoping like NPM has.

The benefits and trade-offs seem to make more sense for private packages within a company than for public, open-source packages, at least at the moment. I guess we’ll see what the community does though. Overall my impressions are pretty good on execution, and they seem to have been talking with other package repos too.

clarkbw commented 5 years ago

I need to double check the bundler behaviour but GitHub sources are required to have a namespace.

For example:

source "https://rubygems.pkg.github.com/github"
source "https://rubygems.org"

gem "rails"
gem "@github/rube"

The @github org is limited to publishing Gems that use the @github/ namespace so it can't provide a global rails gem, at most it could provide a @github/rails, and that's the bundler behaviour I need to check on.
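The namespace rule described here amounts to a prefix check. The sketch below only restates that rule as stated in this comment; it is not confirmed registry or Bundler behaviour, and the function name is hypothetical.

```ruby
# Hypothetical check: does a gem name fall inside the owner's namespace?
# Assumes names take the form "@<owner>/<gem>", as in the example above.
def in_owner_namespace?(owner, gem_name)
  gem_name.start_with?("@#{owner}/")
end

puts in_owner_namespace?("github", "@github/rube") # true
puts in_owner_namespace?("github", "rails")        # false
```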

Certainly the number of sources could start to add up and we might look at ways to reduce that.

andrew commented 5 years ago

@clarkbw maybe I can get access to the beta so I can see the API endpoint responses 😉

andrew commented 5 years ago

I've now got access to the beta of the GitHub Package Registry, so I will be having a play around with it soon.

andrew commented 5 years ago

Looks like you can't delete packages from the registry anymore: https://help.github.com/en/articles/about-github-package-registry#deleting-a-package

> To avoid breaking projects that may depend on your packages, GitHub Package Registry does not support package deletion or deleting a version of a package. Under special circumstances, such as for legal reasons or to conform with GDPR standards, you can request deleting a package through GitHub Support. Contact GitHub Support using our contact form and the subject line "GitHub Package Registry."

Although you can still delete the repository, worth investigating if that removes the packages for that repo: https://twitter.com/PhaniRajuyn/status/1139897674935754752

Zimmi48 commented 4 years ago

> Although you can still delete the repository, worth investigating if that removes the packages for that repo: https://twitter.com/PhaniRajuyn/status/1139897674935754752

Very early in GitHub's life, you couldn't delete a repository if it had forks because internally a repository and its forks were stored as a single repository: https://github.blog/2008-10-14-repo-deletion-for-everyone/

The strategy they chose to make it possible to delete any repository was to keep the repository internally and delete only the access endpoints. I wouldn't be surprised if they adopted the same strategy for packages.