[Feature] Improve ergonomics of yarn API consumption

bgotink commented 4 years ago

[x] I'd be willing to implement this feature
[ ] This feature can already be implemented through a plugin

TL;DR: Consuming yarn APIs is hard and potentially dangerous. This ticket proposes solutions to both problems.

Describe the user story

At the moment it's hard to import yarn packages and use the API. This is due not only to a lack of useful documentation (#1654) but also to the API design itself.

Here's an example program to fetch information from the user's configured NPM registry regarding a package, with inline comments explaining why the API is unergonomic:

// There is no guarantee that the version I'm importing here is compatible with
// the version the user has installed.
import { Configuration, Ident } from '@yarnpkg/core';
import { npath } from '@yarnpkg/fslib';
import { npmHttpUtils } from '@yarnpkg/plugin-npm';

// Wait, I need to import the CLI package in my library?
// The alternative is creating my own plugin configuration, but then I need to
// know internal yarn stuff like "you need to provide yup and clipanion as
// dynamic libraries"
// Note: the CLI contains its own version of all of the built-in plugins as
// dependencies, which are used to parse the config, so in fact things could
// easily break if I e.g. update plugin-npm because some fancy new config was
// added but I forgot to update the CLI package (because why would I?)
import { getPluginConfiguration } from '@yarnpkg/cli';

const configuration = await Configuration.find(
  npath.toPortablePath(process.cwd()),
  getPluginConfiguration(),
  {
    // This is easily forgotten and it might work fine in my library repo if I'm
    // not using non-builtin plugins, but this would break for users of my code
    // that e.g. have the constraints plugin loaded and have configured a custom
    // constraint filename.
    strict: false,
  }
);

async function getMetadata(ident: Ident, registry?: string) {
  // If you forget to load the correct plugin configuration, this throws awesome
  // errors telling me the yarn code is trying to get unknown configuration keys
  return npmHttpUtils.get(npmHttpUtils.getIdentUrl(ident), {
    configuration,
    registry,
    ident,
    json: true,
  });
}

On top of the comments describing issues above, there's also a lack of validation that my imported version of the yarn packages is compatible with the ones installed on the user's system. In the code above I doubt this would be a problem per se, as I'm only using the yarn API to query things. But, it would be entirely possible to use the yarn API to modify manifests or change resolutions or change the configuration, and then commit these changes to disk, potentially stopping the yarn command from working.

Describe the solution you'd like

I can actually think of multiple solutions here, all of which have upsides and downsides. In the end I would like to have:

a clear and user-friendly way of importing the yarn API, including documentation on the site and in the types;
some form of validation that the yarn APIs I'm calling are compatible with the yarn package installed in the user's project.

Option 1: make the yarn binary requirable

This actually makes validating version compatibility straightforward: is the version of the package in the yarn bundle semver-compatible with ^<version of the API package I've installed in my library>? Using the API would also guarantee you're not left with a broken system, as long as you're doing valid things with the API. We would for instance never end up with a yarn lockfile with a version number that the yarn bundle in the repo doesn't understand.
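
The compatibility question in Option 1 comes down to a caret-range check. Below is a minimal sketch of that check, assuming plain major.minor.patch versions without prerelease tags. The function names are hypothetical, and a real implementation would presumably rely on a semver library rather than hand-rolled parsing.

```typescript
type Version = [number, number, number];

// Parses "major.minor.patch"; anything else is rejected in this sketch.
function parseVersion(version: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Unsupported version string: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Hypothetical helper: does `bundled` satisfy the range `^required`?
function isCaretCompatible(bundled: string, required: string): boolean {
  const have = parseVersion(bundled);
  const want = parseVersion(required);

  // The bundled version must not be older than the one I built against.
  if (compareVersions(have, want) < 0) return false;

  // Caret ranges pin everything up to and including the leftmost non-zero
  // component (or all three components when the version is 0.0.0).
  const pivot = want.findIndex((part) => part !== 0);
  const lastPinned = pivot === -1 ? 2 : pivot;
  for (let i = 0; i <= lastPinned; i++) {
    if (have[i] !== want[i]) return false;
  }
  return true;
}
```

With this, a library built against 2.1.1 would accept a bundle that ships 2.4.0 and reject one that ships 3.0.0 or 2.1.0.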

It would complicate the yarn bundle quite a bit. We would need to ensure all imports are available via their proper name. In the end this will definitely have a negative impact on the bundle size.

Option 2: create a more ergonomic API package

Side note: I'm going to use the package name @yarnpkg/flurb here, because I'm bad at naming things and I don't want the name to distract from the content.

We could create a package called @yarnpkg/flurb that does a couple of things to facilitate fetching the configuration:

It has optional peer dependencies on all of the built-in plugins. It uses those to create the plugin configuration needed for Configuration.find
It contains a function that wraps Configuration.find with a default strict: false and the plugin configuration mentioned in the previous bullet.
This wrapped configuration should throw errors with slightly different messages when e.g. an unknown configuration is requested.
It could even default the path for Configuration.find to __dirname

So the code sample above would be reduced to

import type { Ident } from '@yarnpkg/core';
import { findConfiguration } from '@yarnpkg/flurb';
import { npmHttpUtils } from '@yarnpkg/plugin-npm';

const configuration = await findConfiguration();

async function getMetadata(ident: Ident, registry?: string) {
  return npmHttpUtils.get(npmHttpUtils.getIdentUrl(ident), {
    configuration,
    registry,
    ident,
    json: true,
  });
}

which already looks a lot better.
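
For illustration, the wrapper described in the list above could be sketched roughly as follows. To keep the sketch self-contained, the underlying find function and the plugin configuration are passed in and typed structurally instead of being imported from the yarn packages. createFindConfiguration, FindFn and FindOptions are hypothetical names, not existing yarn APIs.

```typescript
// Structural stand-ins so the sketch compiles without the yarn packages.
interface FindOptions {
  strict?: boolean;
}

type FindFn<Plugins, Config> = (
  startingCwd: string,
  plugins: Plugins,
  options: FindOptions,
) => Promise<Config>;

// Hypothetical factory: binds the plugin configuration, a default starting
// directory, and a lenient `strict: false` default to the find function.
function createFindConfiguration<Plugins, Config>(
  find: FindFn<Plugins, Config>,
  plugins: Plugins,
  defaultCwd: string,
): (startingCwd?: string, options?: FindOptions) => Promise<Config> {
  return (startingCwd: string = defaultCwd, options: FindOptions = {}) =>
    // Callers can still opt back into strict mode explicitly.
    find(startingCwd, plugins, { strict: false, ...options });
}
```

With the real packages, the find argument would presumably be a small adapter around Configuration.find that converts the path with npath.toPortablePath, and the plugins argument would be built from whichever built-in plugins are installed.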

This leaves the issue of the possible incompatibilities though. We could solve these in a number of ways, but the only one I can think of that catches all possible incompatibilities is the following one: we create a new resolver. Let's call that one "flurb" as well, because I am not in an imaginative mood. My package would have the following manifest:

{
  "dependencies": {
    "@yarnpkg/flurb": "flurb:^2.1.1",
    "@yarnpkg/core": "flurb:^2.1.0",
    "@yarnpkg/plugin-npm": "flurb:^2.1"
  }
}

The yarn binary would resolve these versions as follows:

It takes the version of the package that's found in the yarn bundle itself
If the version found is not compatible with the range in the descriptor, throw an error
Resolve the version number as npm:<the exact version number>
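
The three resolution steps above can be sketched as a single function. This is only an outline under assumptions: resolveFlurb is a hypothetical name, the map of bundled versions stands in for whatever the yarn bundle would expose, and the range check is injected as a predicate (in practice presumably the satisfies function of a semver library) rather than implemented here.

```typescript
// Predicate deciding whether `version` falls inside `range`.
type Satisfies = (version: string, range: string) => boolean;

const FLURB_PROTOCOL = "flurb:";

// Hypothetical resolver logic for a descriptor such as
// "@yarnpkg/core" with range "flurb:^2.1.0".
function resolveFlurb(
  name: string,
  range: string,
  bundledVersions: ReadonlyMap<string, string>,
  satisfies: Satisfies,
): string {
  if (!range.startsWith(FLURB_PROTOCOL)) {
    throw new Error(`Not a flurb range: ${range}`);
  }
  const semverRange = range.slice(FLURB_PROTOCOL.length);

  // Step 1: take the version found in the yarn bundle itself.
  const bundled = bundledVersions.get(name);
  if (bundled === undefined) {
    throw new Error(`${name} is not part of the yarn bundle`);
  }

  // Step 2: refuse to resolve if the bundle is incompatible with the range.
  if (!satisfies(bundled, semverRange)) {
    throw new Error(
      `${name}@${bundled} from the yarn bundle does not satisfy ${semverRange}`,
    );
  }

  // Step 3: hand the exact version over to the npm protocol.
  return `npm:${bundled}`;
}
```

The error in step 2 is what would surface the incompatibility at install time instead of at runtime.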

The biggest issue with this approach would be that it's not possible to install my package using any other package manager. Maybe this could be solved using a field in dependenciesMeta instead of a new range protocol? Or maybe we could magically resolve these builtin packages to the builtin version number in the npm resolvers?

An alternative to this entire approach would be to make the yarn bundle return a map of package names to version numbers when required, instead of executing. We could then validate compatibility between the version in my library and the version in the bundle, but unless the version is an exact match there's always a risk of incompatibilities.

arcanis commented 4 years ago

I think what you're looking for would be a kind of utility library built on top of the Yarn libraries. While I think it would be quite valuable, I'm not certain it would be a good fit for the core project, as I don't see a direct use case for us as consumers. I'd prefer to see that live in a separate repository and see how it evolves over time.

bgotink commented 4 years ago

This could definitely be built in a separate repository. There are only two changes—as I see it atm—that really must happen in the yarn repository:

make @yarnpkg/cli/package.json available to plugins (and pin the version numbers at build time so the manifest doesn't list a dependency on @yarnpkg/core with range workspace:^2.1.1)
move BaseCommand and WorkspaceRequiredError from @yarnpkg/cli to another package. These two classes are the reason a lot of the plugins have peerDependencies on the CLI package, and removing that would make installing @yarnpkg/cli to consume a plugin's node API no longer necessary.

arcanis commented 4 years ago

make @yarnpkg/cli/package.json available to plugins (and pin the version numbers at build time so the manifest doesn't list a dependency on @yarnpkg/core with range workspace:^2.1.1)

I'm not sure I understand why it would be a problem, especially since the core is a peer dependency of the CLI, so you always have control on the version you use (and if you use the wrong one, you'll get a peer dependency warning).

move BaseCommand and WorkspaceRequiredError from @yarnpkg/cli to another package. These two classes are the reason a lot of the plugins have peerDependencies on the CLI package, and removing that would make installing @yarnpkg/cli to consume a plugin's node API no longer necessary.

Is it really a problem though? 🤔 The rest of the CLI is fairly small, I wonder if that's really useful to split the builtin CLI configuration from the CLI itself.

bgotink commented 4 years ago

I'm not sure I understand why it would be a problem, especially since the core is a peer dependency of the CLI, so you always have control on the version you use (and if you use the wrong one, you'll get a peer dependency warning).

If I'm using the packages of yarn 2.5 but the user has yarn 2.1 installed on their system, making changes to the dependencies could have adverse effects. Examples:

the .pnp.js file will probably look ever so slightly different because of changes to support package exports, or due to changes in fslib
the lockfile version could have changed
the cache key could have changed

At the very least these cause unwanted changes on disk. At worst these break builds (e.g. if yarn install --check-cache is part of the project's CI/CD pipeline).

Is it really a problem though? 🤔 The rest of the CLI is fairly small, I wonder if that's really useful to split the builtin CLI configuration from the CLI itself.

The CLI does contain dependencies on all plugins, which can easily lead to trouble. For example:

I'm using @yarnpkg/cli version 2.1.x, while in @yarnpkg/cli version 2.2 we introduce support for fallback npm registries (there's a ticket open for it somewhere). I update the @yarnpkg/plugin-npm package to 2.2. Everything looks okay at install time. Everything works fine at build time. But if I use npmHttpUtils, I get an error saying that npmFallbackRegistries is an unknown configuration key. Reason: @yarnpkg/cli contains its own version of @yarnpkg/plugin-npm that's still at 2.1.x, and that version is used to parse the user's configuration.
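
The failure mode in this scenario can be reproduced with a toy model. Everything below is a stand-in: ToyConfiguration is not yarn's Configuration class, the error message is made up, and npmFallbackRegistries is the hypothetical 2.2 setting from the example above. The point is only that the set of known keys comes from one copy of the plugin while the code reading them comes from another.

```typescript
// Toy model of strict key lookup; this is NOT yarn's real Configuration class.
class ToyConfiguration {
  private readonly knownKeys: Set<string>;
  private readonly values: Map<string, unknown>;

  constructor(declaredKeys: string[], values: Map<string, unknown>) {
    this.knownKeys = new Set(declaredKeys);
    this.values = values;
  }

  get(key: string): unknown {
    if (!this.knownKeys.has(key)) {
      throw new Error(`Unknown configuration key "${key}"`);
    }
    return this.values.get(key);
  }
}

// Keys declared by the plugin-npm 2.1.x copy that the CLI depends on.
const keysFromCliCopy = ["npmRegistryServer"];

// The user's settings are parsed against that older set of keys.
const configuration = new ToyConfiguration(
  keysFromCliCopy,
  new Map<string, unknown>([["npmRegistryServer", "https://registry.yarnpkg.com"]]),
);

// Stand-in for code shipped in the directly installed plugin-npm 2.2,
// which assumes the newer key has been declared.
function getRegistries(config: ToyConfiguration): unknown[] {
  return [config.get("npmRegistryServer"), config.get("npmFallbackRegistries")];
}
```

Calling getRegistries(configuration) throws on the second lookup, even though both packages installed and compiled without complaint.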
