yarnpkg / berry

πŸ“¦πŸˆ Active development trunk for Yarn βš’
https://yarnpkg.com
BSD 2-Clause "Simplified" License

[Case Study] GitHub Registry #156

Open arcanis opened 5 years ago

arcanis commented 5 years ago

What package is covered by this investigation?

The GitHub Package Registry

Describe the goal of the investigation

To figure out what our actions should be going forward, and to find out how to provide a safe and sound user experience that protects against name squatting.

Should we move to the GitHub Package Registry as default registry?

I've seen this question here and there, so we probably should discuss it.

My opinion is: I don't think we need to change the default registry anytime soon, unless something changes dramatically on the npm side. There are three reasons why I think we should wait:

First-class support

Something we need to consider is: should the GitHub registry be one registry amongst many (in the sense that it would piggy-back on the npm: protocol), or have first-class support (with a specific protocol, like npm+gh:)?

The first case will likely cause developer experience issues (how to depend on a GitHub package from an npm package?), the second doesn't scale very well if we need to do that for all the registries.

My perception is that we need to follow intent. For all purposes, our users will likely choose to depend on a package from one of two sets of registries: npm or GitHub. Other registries will, I believe, merely be either 1/ mirrors of the first two, or 2/ private npm instances with specific workflows (which will be able to safely enforce the registry configuration for a given scope, for example).

In this light I'd be in favor of npm+gh: being a supported protocol (rather than just configuring the registry hostname in the settings). It wouldn't so much define the target hostname, but rather the set of packages we're expected to download.

Use a specific package from the GitHub registry instead of npm

This would become possible with the resolutions field:

{
  "resolutions": {
    "foo": "npm+gh:^1.2.3"
  }
}
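To make the idea concrete, here is a minimal sketch of how a range carrying the proposed npm+gh: prefix could be split into "which set of packages" and "which version range". Everything in it is an assumption for illustration: `parseRange`, `PackageSet`, and the `PROTOCOLS` table are hypothetical names, not part of Yarn, and npm+gh: itself is only the protocol proposed above.

```typescript
// Hypothetical sketch: none of these names come from Yarn's codebase.
type PackageSet = "npm" | "github";

interface ParsedRange {
  packageSet: PackageSet;
  range: string;
}

// Maps a protocol prefix to the set of packages it designates.
// "npm+gh:" is the protocol proposed in this thread, not an existing one.
const PROTOCOLS: Record<string, PackageSet> = {
  "npm:": "npm",
  "npm+gh:": "github",
};

function parseRange(descriptor: string): ParsedRange {
  for (const [protocol, packageSet] of Object.entries(PROTOCOLS)) {
    if (descriptor.startsWith(protocol)) {
      return { packageSet, range: descriptor.slice(protocol.length) };
    }
  }
  // Any other protocol (git, file, ...) is out of scope for this sketch.
  if (/^[a-z+]+:/.test(descriptor)) {
    throw new Error(`Unsupported protocol in "${descriptor}"`);
  }
  // A bare range keeps today's behavior and resolves against npm.
  return { packageSet: "npm", range: descriptor };
}
```

The point of the design is that the prefix names a set of packages rather than a hostname, so a mirror of either registry could serve the same descriptor.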

Possible action points (please discuss)

Paging @yarnpkg/berry, @bnb, @zkochan, @clarkbw for feedback (anyone else from @GitHub interested?)

orta commented 5 years ago

For some extra context, how we handled this with CocoaPods was to extend the equivalent of the package.json with a new attribute for source:

{
  "sources": [
    "gh",
    "npm"
  ],
  "dependencies": {
    "danger/danger-js": "^1.2.3", // From GH
    "metro": "^3.2.1" // From NPM
  }
}

Where the order is important. A missing "sources" field is treated as just ["npm"], making the different behavior opt-in.

CocoaPods has different constraints (all lib definitions are fs access, and so lookups are cheap) but maybe it could spark an idea.
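As a rough sketch of the ordered lookup described above (not how CocoaPods or any package manager actually implements it), the first source that knows a package wins, and a manifest without "sources" behaves as if it declared ["npm"]. The `SourceIndex` type is a made-up stand-in for real registry queries, and `pickSource` is a hypothetical name.

```typescript
// Hypothetical sketch of ordered source lookup; all names are illustrative.
type SourceName = string;

interface Manifest {
  sources?: SourceName[];
  dependencies: Record<string, string>;
}

// Stand-in for a registry query: which package names each source knows about.
type SourceIndex = Record<SourceName, Set<string>>;

function pickSource(name: string, manifest: Manifest, index: SourceIndex): SourceName {
  // A manifest without "sources" behaves as if it declared ["npm"].
  const sources = manifest.sources ?? ["npm"];
  for (const source of sources) {
    if (index[source]?.has(name)) {
      return source; // first match wins, so the order of "sources" matters
    }
  }
  throw new Error(`"${name}" was not found in any of: ${sources.join(", ")}`);
}
```

Note that when the same name exists in several sources, the result depends entirely on the order of the list, which is the property the rest of the thread worries about.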

zkochan commented 5 years ago

One of the things I currently like about node.js is that when I read the code, I check the import section const foo = require('@zkochan/foo') and I know that this package is @zkochan/foo on npm. I think this is especially valuable during code reviews. I guess, there is no way to preserve this?

Regarding the protocol. There is already a github: protocol but only for git-hosted packages. For instance, github:kevva/is-positive.

arcanis commented 5 years ago

> I think this is especially valuable during code reviews. I guess, there is no way to preserve this?

It's already not always the case - it can come from npm, but also be a workspace, or a git dependency, or come from the file: protocol, or be a peer dependency (in which case all bets are off).

> Regarding the protocol. There is already a github: protocol but only for git-hosted packages. For instance, github:kevva/is-positive.

Yep, that's why I mentioned npm+gh: (npm protocol via github). Repurposing github: is a possibility I guess, but it would be breaking unless we try to support both Git and the GitHub registry with the same protocol. Maybe too much work when an extra protocol can do the job 🤔

> For some extra context, how we handled this with CocoaPods was to extend the equivalent of the package.json with a new attribute for source:

Interesting - I think the problem with this approach is that the packages can't be resolved anymore unless you know what the source field is. For example, in the case of the resolutions field we wouldn't have the source parameter (or we could reuse the one from the top-level package, but that might be fairly confusing).

It might also cause trouble with third-party tools (for example npm) that wouldn't be aware of the source field and would resolve from npm instead of GitHub - which could lead to new attack vectors.

arcanis commented 5 years ago

Oh, something I just remembered: we're working on "Zero-Installs" for Yarn v2 (more info here). I remember seeing someone mentioning that "master will always be available as a package". If that's literally what happens (ALL of the repository is downloaded through the "master" package) it might be a bit problematic for us since the repository might then contain all the zip archives for the dependencies.

I suspect the @github folks have thought about this (maybe not for zip archives, but at least for other typically-useless-package-files like the tests or the documentation), so I'm curious to hear from @clarkbw or anyone else if there's a mechanism to filter the master package list. Maybe an npmignore or files field support?

clarkbw commented 5 years ago

Lots to talk through here. And I want to bring in @phanatic to the conversation as well.

I’m encouraging people to look at how they can publish in parallel at this moment; it’s too early in our lifetime for any complete switch over. I want your feedback on what we have now and what an ideal future would be like.

The other piece I think is important to consider is that we intend to open source the server components. We are doing this because we want you to be able to balance client and server complexity. Most frameworks have simple servers and complex clients to do the heavy lifting. An open source server that we share with you means we can build a better solution that doesn’t mean working around a limited static server component. The proxy of packages is a good example, I want you to assume we build a Yarn server together such that you could default to yarn packages and namespaces but have the server proxy npm as needed and likely notify the client which ones were proxy packages.

@orta this ☝️ goes for you all (CP) as well, please reach out.

The registry has a base-layer object and metadata storage with GraphQL APIs, which the server components use to store file objects. Server components handle the URL endpoints and client API translation to that GraphQL layer. A future yarn server could be a lightweight wrapper around the GraphQL APIs.

MarshallOfSound commented 5 years ago

@arcanis Thanks for raising this. This was one of the first things I thought of when the GitHub announcement happened and I wrote up my thoughts / ideas here: https://gist.github.com/MarshallOfSound/7101ff77c5f981e01362985935790633

I'll summarise them below, though I'd recommend reading through my ramblings in full 😄 :

arcanis commented 5 years ago

> The proxy of packages is a good example, I want you to assume we build a Yarn server together such that you could default to yarn packages and namespaces but have the server proxy npm as needed and likely notify the client which ones were proxy packages.

So in summary you would see the GitHub registry as a platform on top of which multiple other registries could potentially be built? So in a sense, the GitHub registry would be one "universe" amongst others (with a "universe" being a set of published packages)? We could make that through a smart protocol:

"@yarnpkg/cli": "universe(gh): ^1.2.3",

The package manager would then look in its configuration to figure out what's the configured registry for the gh universe, and use that (or abort the install if the gh universe isn't configured). It would work with various use cases:

This proposal also has some issues, but they have their own solutions:

> For security reasons unrecognised registries defined at a module level should be treated as untrusted unless trusted by the user. (Think how ssh's known_hosts file works).

In the universe concept I mentioned the hosts would be defined in a yarnrc file, so they would never come from the third-party packages themselves.
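A minimal sketch of that lookup, under stated assumptions: the universe(...) syntax is only the one floated above, `parseUniverseRange` and `resolveRegistry` are hypothetical names, and the `Map` stands in for whatever the user's yarnrc would actually contain. The one property it tries to illustrate is that the hostname only ever comes from user configuration, and that an unconfigured universe aborts instead of falling back.

```typescript
// Hypothetical sketch: "universe(...)" is a proposal from this thread, not an implemented protocol.
interface UniverseRange {
  universe: string;
  range: string;
}

// Universe name -> registry URL, as it might be loaded from the user's own yarnrc.
type UniverseConfig = Map<string, string>;

function parseUniverseRange(descriptor: string): UniverseRange | null {
  const match = /^universe\(([^)]+)\):\s*(.+)$/.exec(descriptor);
  if (match === null) return null;
  const [, universe = "", range = ""] = match;
  return { universe, range };
}

function resolveRegistry(descriptor: string, config: UniverseConfig): string {
  const parsed = parseUniverseRange(descriptor);
  if (parsed === null) {
    throw new Error(`Not a universe descriptor: "${descriptor}"`);
  }
  const registry = config.get(parsed.universe);
  if (registry === undefined) {
    // Abort rather than guess: hosts must come from user configuration only,
    // never from a third-party package's manifest.
    throw new Error(`Universe "${parsed.universe}" is not configured`);
  }
  return registry;
}
```

For example, with a configuration mapping "gh" to a registry URL of the user's choosing, `resolveRegistry("universe(gh): ^1.2.3", config)` returns that URL, while any universe missing from the configuration fails the install.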

> Whatever solution is reached here should mutually be defined and implemented in npm as well.

It's hard to predict what npm will do (their whole cli team resigned, and the last commits were early March - it's not even clear who from their company should be brought into the discussion).

Given the current state of things, I'm kinda assuming it will be left in a limbo state until proven otherwise; we probably should plan for that. Ping @ahmadnassri who might have some new insight to share 🙂

markspolakovs commented 5 years ago

Warning: total n00b speaking here, with only user-level experience

Dreaming slightly, if we weren't bound by backwards compatibility or other companies in any way, a potential solution would be to make all package names URLs that identify their registry/source/universe, similar to what Go does. For example, packages on the npm registry would be npmjs.com/foo-bar and packages on GitHub's registry might be github.com/markspolakovs/baz-lib/baz, while an internal registry might be npm.corp.mycompany.com/quux. Forces users to be explicit about what registry they're using, as well as making it unambiguous. Also does away with the need for a quasi-known_hosts.

For reference, the way Go transforms the "pretty URL" into a package resolution is via an HTML meta tag - for example github.com/yarnpkg/berry contains <meta name="go-import" content="github.com/yarnpkg/berry git https://github.com/yarnpkg/berry.git">. So npmjs.com/foo-bar would have a similar pointer to the package.json.
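To show what a client would have to do with such a tag, here is a naive sketch that pulls the three space-separated fields (import prefix, VCS, repository URL) out of a go-import meta tag. It uses a regular expression rather than a real HTML parser and assumes the name attribute comes before content, so treat it as an illustration only; `parseGoImport` and `GoImport` are hypothetical names.

```typescript
// Naive illustration: a real client should use a proper HTML parser.
interface GoImport {
  prefix: string;  // import path prefix the tag applies to
  vcs: string;     // version control system, e.g. "git"
  repoUrl: string; // where the source actually lives
}

function parseGoImport(html: string): GoImport | null {
  // Assumes the attribute order name="..." then content="...".
  const tag = /<meta\s+name="go-import"\s+content="([^"]*)"\s*\/?>/i.exec(html);
  if (tag === null) return null;
  const [, content = ""] = tag;
  const fields = content.trim().split(/\s+/);
  if (fields.length !== 3) return null;
  const [prefix = "", vcs = "", repoUrl = ""] = fields;
  return { prefix, vcs, repoUrl };
}
```

An npm-flavoured equivalent would presumably swap the VCS and repository fields for a pointer to the package metadata, but that format is not defined anywhere in this thread.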

Obviously, this is a massive backward compat break, and, while 99% of packages are still on npm, the UX isn't pretty. Perhaps a bare foo would be aliased to npmjs.com/foo?
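The aliasing idea can be sketched as a small normalization step: if the first path segment looks like a hostname, use it as the registry; otherwise treat the whole name as a legacy npm name. The heuristic (a dot in the first segment, and no leading @) and the names `qualify`, `QualifiedName`, and `DEFAULT_HOST` are assumptions made for this example, not an agreed design.

```typescript
// Hypothetical sketch of URL-style package names with a bare-name alias.
interface QualifiedName {
  host: string; // registry the name is bound to
  path: string; // package path within that registry
}

const DEFAULT_HOST = "npmjs.com";

function qualify(name: string): QualifiedName {
  const slash = name.indexOf("/");
  const head = slash === -1 ? name : name.slice(0, slash);
  // Treat the first segment as a host only if it looks like a domain
  // (contains a dot) and is not an npm scope such as "@zkochan".
  if (slash !== -1 && head.includes(".") && !head.startsWith("@")) {
    return { host: head, path: name.slice(slash + 1) };
  }
  // Bare and scoped names keep their current meaning: the npm registry.
  return { host: DEFAULT_HOST, path: name };
}
```

Under this heuristic `foo` and `@zkochan/foo` both map to npmjs.com, while `npm.corp.mycompany.com/quux` is bound to the internal host. Unscoped names that contain a dot but no slash still resolve to npm.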

jgierer12 commented 5 years ago

GitHub now proxies to npm registry if the package doesn't exist in GPR: https://github.blog/2019-09-11-proxying-packages-with-github-package-registry-and-other-updates/

bnb commented 5 years ago

@jgierer12 this is currently limited to your organization's npm packages, not all npm packages.

NotMoni commented 4 years ago

FYI: NPM is joining GitHub so yea 🦄

NotMoni commented 4 years ago

Github packages might be a possibility