RFC: Toolchain - Githubissues

dherman commented 5 years ago

This RFC describes a design for the Notion's central unifying design concept: the toolchain.

This RFC supersedes #21 and #22.

charlespierce commented 5 years ago

It's natural to question whether pinning for reproducibility is worth the cost of extra fetching.

I think that pinning the node version of an installed binary to the version in that project's platform key actually does less for reproducibility that it would seem. Specifically, for tools that are used in a project, we are already following the depender's settings for determining the node version. That is, even if typescript specifies "node": "11.6.0" in its package.json, if we are using typescript in a project and have node pinned to 10.5.0 in the project, then we'll be using node@10.5.0 to invoke tsc, even though that's different from what was specified.

So pinning the user version to the version that was used to develop a tool only applies for user tools that aren't explicitly specified in a given project. Which (should) mean that they have no impact on the reproducibility of builds for that specific project, because if they did have an effect, then you would want to add that dependency to the project and we would be back on the above use case.

Furthermore, there are a couple of similar concepts that are being conflated by using the platform specifier for both roles: The developer experience (for people developing a tool) and the user experience (for people using the tool in their own work). The current approach would use the same node version for both use cases, as there's only one platform entry, but it's not guaranteed that that is the intent of the tool authors. In general, I would imagine that CLI type tools that are expected to be globally installed are tested against multiple versions of node, and so should be generally able to work in multiple environments.

Lastly, pinning a user tool to the platform specified in that tool's package.json removes some of the ability for end users to control the platform that they are running on their own machine. There is no way for a notion user to say "I want this tool to use these platform versions." As mentioned above, the user version of a tool only matters for tools that aren't specified in a project, so these are likely tools that the user has installed for their own general-purpose use. If we, instead of using the tool's platform specifier, instead use the user's current platform to pin the tool, then the user can easily control which version of node is used to execute a tool.

Alternatively, we could even not pin the platform for a user tool at all, and simply use the current user platform whenever the tool is executed. That would open potential issues with compatibility, but I would have to guess that those are relatively rare, since that more closely matches the current status quo. It would also make the Notion user experience easier to understand: If you are running in a project with platform specified, then we use that project's platform, if you are running outside of a project then we use your user platform. No caveats about "If you are running a tool, then we will use these other platforms to run that tool specifically, but other tools might be using different platforms so will use different versions." We already have reproducibility for projects covered with the platform specifier, combined with npm or yarn lock files to pin the versions of dependencies, I suspect that doing a lot of pinning for user tools outside of projects might be premature optimization, maybe we should start with the simplest and easiest to explain approach, and re-visit if there are significant problems.

charlespierce commented 5 years ago

After typing that all out, I think my proposal would be simplification to: If you are in a project, we use that project's platform for all invocations. If the tool that is being invoked is installed as a dependency in that project, we use the locally installed version, otherwise we use the user-specified version that was installed when the user did notion install. Lastly, if you are outside of a project, we always use the current user toolchain and the tool version that was installed with notion install.

To help with any weird corner cases that might come up, I also thought of having a notion run command that would let the user specify a toolchain at execution time, i.e. notion run --node 11.6 tsc, which would execute tsc in a platform that used node@^11.6.0, regardless of the project or user platforms.

mikrostew commented 5 years ago

@charlespierce I'm replying to a couple things you wrote above:

Lastly, pinning a user tool to the platform specified in that tool's package.json removes some of the ability for end users to control the platform that they are running on their own machine. There is no way for a notion user to say "I want this tool to use these platform versions."

Pinning the user tool to a specific platform should have no affect on the platform the user is running on their machine. And vice-versa: if the user changes their active platform, it should not break the user tool. The idea is that the user tool is installed like a standalone binary, which is statically linked to a specific platform, apart from whatever platform they have installed.

So in a typical scenario, the user runs notion install ember-cli for example. So ember-cli is installed as a user tool, pinned to whatever is specified in "platform" in package.json, or pinned to whatever their current platform is (which is probably what will happen for a while, since no packages are using "platform"), and a shim is created. Every time ember is run outside of a pinned project (or in a pinned project where it is not a dependency), it will be run with that pinned platform. So even if the user changes their platform, ember will continue to work.

The way things are now outside of Notion, whenever you install a new version of node (with nvm or whatever tool you use), you have to re-install any global binaries, because they no longer work. They were only installed as part of the other node installation. So pinning them to a single static platform is necessary to prevent this "tool bitrot", which is addressed somewhere in this RFC.

That kind of leads into this as well:

Alternatively, we could even not pin the platform for a user tool at all, and simply use the current user platform whenever the tool is executed. That would open potential issues with compatibility, but I would have to guess that those are relatively rare, since that more closely matches the current status quo.

I don't think there is any way we can guarantee that a package installed using one version of Node will necessarily work when run using a different version of Node, especially between major versions. To keep user tools working as expected we have to statically pin them to a specific platform, until the user does a notion upgrade <tool> (or whatever the syntax will be).

charlespierce commented 5 years ago

@mikrostew That's reasonable about Tool Bitrot making pinning the platform a desired aspect so that the tool will always work, regardless of what else changes. I still feel like double-dipping the platform specifier, to represent both the environment that developers of a tool work in and the environment that they expect end users to work in, has some potential issues.

At the very least it's a potential source of confusion for a user: If I have my user platform set to use Node 11.6.0 and run notion install <SOME_TOOL>, and then run <SOME_TOOL> and it executes using Node 8.11.3, I would be confused about where that version came from. We then have to explain that a) The version came from the platform inside the tool's package.json and b) There's no way for the user to override it, so that tool will always use node@8.11.3 until the developers of that tool decide to change their package.json.

dherman commented 5 years ago

@mikrostew wrote:

So pinning them to a single static platform is necessary to prevent this "tool bitrot", which is addressed somewhere in this RFC.

Yeah, I believe this is an important requirement.

I don't think there is any way we can guarantee that a package installed using one version of Node will necessarily work when run using a different version of Node, especially between major versions. To keep user tools working as expected we have to statically pin them to a specific platform, until the user does a notion upgrade <tool> (or whatever the syntax will be).

I agree, and in fact I think this is part of making the choice of platform more under the user's control. When I install a tool like surge with notion install, I don't want it to start running with, say, an ancient Node platform just because I'm sitting in a project directory that happens to use that old platform.

The way I see this is:

The user should have total control over the platform version associated with a tool.
That platform version shouldn't change unless the user chooses to change it.
The tool should continue working independent of which version of Node they're using in their toolchain.
The user shouldn't have to explicitly choose the platform version, because the tool authors are better positioned to know which version of Node the tool was built for.
That defaulting should only be a convenience, and the user should always have the control to be able to override the default.

@charlespierce wrote:

I also thought of having a notion run command that would let the user specify a toolchain at execution time, i.e. notion run --node 11.6 tsc

Yes! I've had this in the back of my head but didn't write it up. I'll mention it in the RFC.

I still feel like double-dipping the platform specifier, to represent both the environment that developers of a tool work in and the environment that they expect end users to work in, has some potential issues.

I think this is a fair concern. Maybe it's worth considering an optional syntax allowing a project to specify distinct dev vs prod platforms. By default, "platform" would just mean both:

"platform": {
  "node": "11.8.0"
},

but maybe we could allow you to specify:

"devPlatform": {
  "node": "11.8.0"
},
"platform": {
  "node": "10.5.0"
}

to be consistent with naming conventions like "dependencies" and "devDependencies".

dherman commented 5 years ago

It's also good to break this down into representative scenarios for each case:

Package binary in global context: surge (CLI tool for pushing static files to a free CDN)
Package binary in local context, not a dependency: madge (a tool for visualizing module graphs of JS projects)
Package binary in local context, a dependency: tsc (TypeScript compiler)

Going through them case by case:

After I install this, I want it to keep working regardless of what version of Node I happen to have installed. (I.e., no tool bitrot.)
This tool requires modern ES6 features to run (e.g. class), but I might be using it on the modules of a project that uses an older Node. So again, I want it to keep using the version it was built to work with.
In order for this to be useful as a dependency, it must:
- be a tool that means something specific for the local project, and
- be compatible with the Node version the project uses.

charlespierce commented 5 years ago

I like the potential idea of a devPlatform, but I would imagine we also want to allow the user to specify devPlatform without a platform. It seems like the most likely use-case for wanting to specify a different development and release platform for a tool is if you want all the devs to use the same version of tooling, but want the tool to work on any version. In that case, you could specify devPlatform and not platform, so that devs working on the tool will all get the same version, but when deployed it will just use whatever version the individual user is tied to.

Regardless, however, I don't think it's a major issue or a blocker for MVP in any way, just something to keep in our minds as adoption (hopefully) grows.

charlespierce commented 5 years ago

There's also a 4th use-case, that at least early on, I suspect will be pretty common: A user with Notion installed is working on an OSS project that doesn't specify a platform. In this case, we treat it the same as the "global context" case: Using the user platform and the pinned platform for any installed tools.

However, if the installed tool is a dependency of the project (e.g. tsc), we want to execute the package-local version of that tool, using the user platform, not the pinned platform for the global version of the tool. This is because the tool was, presumably, installed by npm install or yarn running in the user platform, so any native compilation was done with the user platform, not the pinned platform.

dherman commented 5 years ago

@charlespierce

if you want all the devs to use the same version of tooling, but want the tool to work on any version.

Oh, this made me realize: we actually may want to use something more like "engines" instead of "platform" for deciding the Node runtime version to pin to an installed package.

As you say, specifying a precise version is a development concern. For deployment, you probably don’t want to have to be as precise, and you probably want to allow customers to use newer versions of Node than the particular one you chose at the time of shipping.

And that’s really what the "engines" field is about: the broadest possible range of compatible Node runtime versions you’re compatible with. So there’s really no need to invent a new field.

That also has the benefit of reusing an existing standard and not needing to work so hard to bootstrap the ecosystem.

charlespierce commented 5 years ago

@dherman I like the idea of using the "engines" field, as you say that's exactly what it's for. We can then fall back to using the platform if engines is specified or can't be parsed. And since the engines should be a semver range requirement, we already have the machinery in place for parsing and resolving those :+1:

dherman commented 5 years ago

This has been an edifying thread. I’ll update the RFC! I think this also should be enough clarity to help me update the docs for the web site too.

dherman commented 5 years ago

Follow-up from a video discussion a few of us had today: we ended up convincing ourselves we prefer to original "toolchain" syntax, instead of "platform". Here were our main points of rationale:

Brand consistency: the main concept of Notion is the toolchain, so it reinforces that one main concept.
Clarity over precision: the "well actually" precision that it doesn't specify all of the tools in the toolchain, just the platform, is just not as important as having a more intuitive name.
Distinction from "engines": the "platform" name sounds like a synonym for "engines." The name "toolchain" is less confusable with "engines."
Move to "engines" terminology: rather than introducing the term "platform," even in internal docs, we should stick to the pre-existing terminology of "engines." (We'll probably want to change the code to rename "platform" to "engine" internally.)
Possible imprecision: we should allow this field to be extensible so it can grow to specify more things in the future. There's no reason to be confident it will only ever specify the engine.

dherman commented 5 years ago

LOL, we had yet another discussion about the name of the key, and @charlespierce made the case in notion-cli/notion#412 for "volta". I'll update the RFC.

dherman commented 5 years ago

This RFC is also overdue for merging -- I updated a few bits that were out of date but otherwise it pretty much describes the current state of the design, so I'm going to merge.

volta-cli / rfcs

RFC: Toolchain #27