pushkin-consortium / pushkin

A customizable, scalable ecosystem for massive online psychological experiments
https://pushkin-consortium.github.io/pushkin/
MIT License

Publishing templates with the new monorepo structure #254

Closed: jessestorbeck closed this issue 3 months ago

jessestorbeck commented 7 months ago

A consequence of moving to a monorepo structure is that publishing templates will be less straightforward than before. Presently, each template has its own set of releases, which the CLI can target via a GitHub API URL. With the monorepo, there will be one set of GitHub releases for the entire project. The solution we are pursuing is to update the CLI so that pushkin install filters the JSON of release information down to just the templates/versions requested by the user. Users' interaction with the CLI need not change.
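As a rough sketch of that filtering step, assume (hypothetically) that each template release is tagged `<templateName>@<version>`, which is the convention changesets uses when tagging monorepo packages, and that the CLI has already fetched the JSON array from GitHub's `GET /repos/{owner}/{repo}/releases` endpoint. The helper name `filterTemplateReleases` and the sample data are illustrative only.

```javascript
// Hypothetical tag convention: "<templateName>@<version>", e.g. "exp-basic@1.2.0".
// `releases` is the parsed JSON array from GitHub's list-releases endpoint;
// each entry has (among other fields) a `tag_name`.
function filterTemplateReleases(releases, templateName) {
  const prefix = `${templateName}@`;
  return releases
    .filter((release) => release.tag_name.startsWith(prefix))
    .map((release) => ({
      version: release.tag_name.slice(prefix.length),
      tag: release.tag_name,
    }));
}

// Illustrative sample of what the releases JSON might contain.
const sampleReleases = [
  { tag_name: 'exp-basic@1.2.0' },
  { tag_name: 'site-basic@3.0.0' },
  { tag_name: 'exp-basic@1.1.0' },
];

console.log(filterTemplateReleases(sampleReleases, 'exp-basic'));
```

Because the `@` is part of the prefix, a template named `exp-basic-plus` would not be mistaken for `exp-basic`.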

One decision we need to make is what exactly to include in the archive for each template release. Do we include the entire monorepo? This is probably easier from the standpoint of automating the release, but it will be a significantly larger download for the user and require us to add functionality to the CLI to throw out all files except the template of interest.

The other option is to make the release archive just the particular template. This type of release will require some extra work to automate, but it will minimize the download size and allow us to use the current versions of getSiteTemplate/getExpTemplate in the CLI.

I personally support the latter of the two options, but I think it's important that we encourage users developing their own templates to do so in a fork of the entire monorepo, rather than creating a repo with just a single template. Contributing templates back to the main Pushkin distribution will be easier this way. Users should probably follow the release naming/tagging conventions used in the main distribution so that the JSON filtering mentioned above works for them, but as a fallback the CLI could just give them the entire list of releases to choose from.
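The fallback described here could look something like the following minimal sketch. It assumes the same hypothetical `<templateName>@<version>` tag convention; `releasesToOffer` is an illustrative name, not an existing CLI function.

```javascript
// Hypothetical tag convention, as in the main distribution: "<templateName>@<version>".
function releasesToOffer(releases, templateName) {
  const prefix = `${templateName}@`;
  const matching = releases.filter((r) => r.tag_name.startsWith(prefix));
  if (matching.length > 0) {
    // Convention followed: offer only this template's versions.
    return matching.map((r) => r.tag_name.slice(prefix.length));
  }
  // Fallback for forks that don't follow the convention: offer every release tag.
  return releases.map((r) => r.tag_name);
}
```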

jodeleeuw commented 7 months ago

Have you thought about publishing the template packages and installing them with the package manager through the CLI? If you don't want to put templates on npm directly, you could also host them on GitHub Packages, but I think hosting on npm is pretty reasonable.

jessestorbeck commented 7 months ago

I think I understand -- so then the templates would be dependencies of the CLI? The thing I am struggling with is that I don't know what the templates would look like as packages. Would the idea be that we throw all the current contents of a template into an assets folder and the package exports a function that copies all the assets into the user's Pushkin site?

If I'm understanding correctly, that doesn't sound too onerous from a development perspective. My worry is that it increases the difficulty for potential contributors / anyone who wants to customize a template. Do they then need to know how to publish a package locally (or use GH packages or npm themselves)?

@jkhartshorne and I were talking about this yesterday, and our plan was to turn the templates into private packages (i.e. no real changes to the structure, just adding a package.json so we can add devDependencies). This way they would work with workspaces and changesets, but we'd still publish them with GitHub releases, as described above.

Maybe we are reinventing the wheel trying to use GitHub releases? I'd be interested in knowing whether you think one option would be easier for maintainers and/or contributors.

jodeleeuw commented 7 months ago

Would the idea be that we throw all the current contents of a template into an assets folder and the package exports a function that copies all the assets into the user's Pushkin site?

I think you can add a script in package.json scripts and then use the "postinstall" event. This could copy the files over.
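A minimal sketch of such a postinstall script, assuming a hypothetical `copy-template.js` in the template package, wired up with `"scripts": { "postinstall": "node copy-template.js" }`, and a hypothetical `template/` folder holding the template's files. npm sets `INIT_CWD` to the directory the install was run from and `npm_lifecycle_event` to the running script's name; whether every package manager sets both is an assumption worth checking.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Copy everything in `srcDir` into `destDir` (fs.cpSync requires Node >= 16.7).
function copyTemplateFiles(srcDir, destDir) {
  fs.cpSync(srcDir, destDir, { recursive: true });
}

// Only copy when the package manager runs this file as the postinstall hook.
if (process.env.npm_lifecycle_event === 'postinstall') {
  // Hypothetical layout: the template's files live in a "template" folder in this package.
  const src = path.join(__dirname, 'template');
  // INIT_CWD is the directory the install command was run from; fall back to cwd.
  const dest = process.env.INIT_CWD || process.cwd();
  copyTemplateFiles(src, dest);
}
```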

My worry is that it increases the difficulty for potential contributors / anyone who wants to customize a template. Do they then need to know how to publish a package locally (or use GH packages or npm themselves)?

Yeah, good question. It would only apply to people who want to share their customization. One option would be to host a contrib-style repository that makes it easy to contribute through a PR. In addition, some kind of basic install-from-directory option could work for the general case. Then I could download any repo and install it that way.
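Install-from-directory can lean on the package manager's own support for local folders: Yarn accepts `file:` specifiers in `yarn add`, and `npm install` accepts a folder path. A minimal sketch, with `localInstallCommand` as a hypothetical helper name:

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Build the command that installs a template from a local folder.
// Yarn accepts "file:<path>" specifiers for local directories.
function localInstallCommand(templateDir) {
  const absDir = path.resolve(templateDir);
  // A folder can only be installed as a package if it has a package.json.
  if (!fs.existsSync(path.join(absDir, 'package.json'))) {
    throw new Error(`${absDir} does not look like a package (no package.json)`);
  }
  // Quote the specifier so paths containing spaces survive the shell.
  return `yarn add "file:${absDir}"`;
}
```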

jessestorbeck commented 6 months ago

Aha, I think I misunderstood before. Just thinking through how this would work:

A pushkin-contrib repo seems reasonable as an eventual goal once the community has grown. And we already have the option to install from a path. If we are getting away from GitHub releases for the main distribution, it probably doesn't make sense to use them for community-developed templates either (i.e., by modifying our current install-from-GH-repo option).

Trying to think of some pros/cons here for myself and @jkhartshorne:

jodeleeuw commented 6 months ago

We may be able to rely on features of the package manager itself rather than building them ourselves (e.g., fetching available template versions)

Yeah, this is a big one, I think. One of the things I've come to appreciate over the last 2-3 years of jsPsych development is that using tools familiar to experienced developers is really helpful for attracting contributors. If a lab goes to the trouble of building a customized template that is generally useful, but it would take extra work to figure out how to share it, then sharing is less likely. But if it's as simple as "just publish it as an npm package, or submit a PR to this repo and we'll do it," then there's very little friction.
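As an example of leaning on the package manager, the list of available template versions can come straight from the registry via `npm view <package> versions --json`. A minimal sketch, assuming the CLI shells out to npm; `parseVersions` and `availableVersions` are hypothetical helper names. Since npm can print a single JSON string instead of an array when only one version is published, the output is normalized to an array.

```javascript
const { execSync } = require('node:child_process');

// Normalize the output of `npm view <pkg> versions --json` to an array.
function parseVersions(stdout) {
  const parsed = JSON.parse(stdout);
  return Array.isArray(parsed) ? parsed : [parsed];
}

// Ask the registry which versions of a template package have been published.
function availableVersions(packageName) {
  const stdout = execSync(`npm view ${packageName} versions --json`, { encoding: 'utf8' });
  return parseVersions(stdout);
}
```

For example, `availableVersions('@pushkin-templates/exp-basic')` would list every published version of that package; the package name here is hypothetical.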

jkhartshorne commented 6 months ago

If, for some reason, the user wanted a different version of a template they had already added, you could call yarn upgrade and this would trigger the post-install script of the upgraded version.

So the code here would have to be careful to not copy over the user's customization. That's not entirely trivial.
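One way to avoid overwriting a user's customizations is to copy template files only where nothing exists yet. Node's `fs.cpSync` supports this directly: with `force: false` and `errorOnExist: false`, existing destination files are skipped without error. A minimal sketch (`copyWithoutOverwriting` is a hypothetical name):

```javascript
const fs = require('node:fs');

// Copy template files into the site without touching anything already there.
// force: false leaves existing destination files as they are, and
// errorOnExist: false means those skipped files don't raise an error.
function copyWithoutOverwriting(srcDir, destDir) {
  fs.cpSync(srcDir, destDir, { recursive: true, force: false, errorOnExist: false });
}
```

This only protects files the user changed in place. Files a newer template version renamed or removed would still need separate handling.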

We may be able to rely on features of the package manager itself rather than building them ourselves (e.g., fetching available template versions)

With due respect to @jodeleeuw's experiences, I'm not convinced this applies much here. I mean, we'll basically be doing an end-run around the package manager, right? That is, as I understand it, instead of having the CLI download the zip file, unpack it, and put things where they are supposed to go, the package manager will ... download the zip file and then call the CLI to unpack it and put things where they are supposed to go. I mean, for site templates we'll probably change all of 3 or 4 lines of the CLI; otherwise, it's the same code doing the same things in the same way.

In short, we use npm to download the code, but then otherwise ignore it completely. And unlike a normal npm package, the template never gets imported anywhere.

Or am I missing something?

Users will probably want to be able to add multiple copies of an experiment template to their site, so on subsequent runs of pushkin install experiment, we would check if the template was already installed and, if so, the post-install script can be called directly.

That would be OK, but upgrading the template would get pretty convoluted. One option might be to use a monorepo structure. That is, each experiment is its own npm project with its own package.json. That should work, since each experiment lives in its own folder prior to running pushkin prep.

Again, I think there's some weirdness here in that we are using the package manager to download a zip file and then we basically ignore the package manager after that.

With respect to all of the above, I guess I'm not sure what problem we are trying to solve here.

The other option is to make the release archive just the particular template. This type of release will require some extra work to automate, but it will minimize the download size and allow us to use the current versions of getSiteTemplate/getExpTemplate in the CLI.

Just to confirm I understand the plan: each template is released separately, as is the CLI, the api, etc. But they are all released through the same monorepo. We just tag them appropriately, making it easy for the CLI to find what it needs.

I think that'll work fine. It doesn't seem that complicated.

jessestorbeck commented 6 months ago

So the code here would have to be careful to not copy over the user's customization. That's not entirely trivial.

To clarify, what I mean here is that install site will initialize a Node project in the user's site directory, and then the site template will be installed as a dependency, which triggers copying of all the normal template files into the site. Later, when they run install experiment, their chosen exp template also becomes a dependency of the site, and its template files are copied into the relevant folder in the site's experiments directory.
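The flow described here could be sketched roughly as follows. The helper names are hypothetical. `yarn init -y` and `yarn add` are real Yarn commands. The `experiments/<expName>` destination follows the description above.

```javascript
const path = require('node:path');

// Hypothetical plan for `pushkin install site`: make the site directory a Node
// project, then add the site template as a dependency (its postinstall copies files).
function planSiteInstall(siteTemplate, version) {
  return ['yarn init -y', `yarn add ${siteTemplate}@${version}`];
}

// Hypothetical destination for `pushkin install experiment`: the chosen
// experiment template's files go in experiments/<expName> inside the site.
function experimentDir(siteDir, expName) {
  return path.join(siteDir, 'experiments', expName);
}
```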

I agree upgrading a template dependency should not change anything pre-existing in the user's site. For site templates specifically, there will never be any need for the CLI to call yarn upgrade, as there's only ever one site template added to the user's site directory. Just in case the user did try to upgrade their site template version manually, there should be error handling in the site template's postinstall script that stops it from executing if there's already been a site template installed. This is basically the same as what install site already does when it checks that there's no pre-existing Pushkin site in the working directory.
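The guard in the site template's postinstall script might look like this minimal sketch. It assumes a hypothetical marker file (`pushkin.yaml` is used as the default here) whose presence means a site already exists. The real check should mirror whatever install site already tests for.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical guard: stop the site template's postinstall if the directory
// already contains a Pushkin site. The marker file name is an assumption.
function assertNoExistingSite(siteDir, marker = 'pushkin.yaml') {
  if (fs.existsSync(path.join(siteDir, marker))) {
    throw new Error(`A Pushkin site already exists in ${siteDir}; not installing another site template.`);
  }
}
```

Throwing from the postinstall script makes it exit with a non-zero code, so the package manager reports the failure instead of silently copying files over the existing site.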

In the case of experiment templates, the only situation where yarn upgrade is relevant is when the user wants to add a new experiment with a template they've previously used AND they request a different version than what's currently in package.json. The control flow would look something like:

```javascript
const { execSync } = require('child_process'); // execSync waits for each command to finish

// `pkg` is the site's parsed package.json (`package` is a reserved word in strict mode).
if (!pkg.dependencies?.[requestedTemplate]) { // If template hasn't been used before, add it
  // Backticks are required so ${...} is interpolated; expName is intended for postinstall
  execSync(`yarn add ${requestedTemplate}@${requestedVersion} --expName=${expName}`);
} else if (pkg.dependencies[requestedTemplate] === requestedVersion) { // If same version as before, run postinstall directly
  execSync(`yarn run postinstall --expName=${expName}`);
} else { // If different version from previous, upgrade the dependency (which also triggers postinstall)
  execSync(`yarn upgrade ${requestedTemplate}@${requestedVersion} --expName=${expName}`);
}
```
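The version comparison in the control flow above has a possible catch: by default, `yarn add` records a caret range such as `^1.2.0` in package.json. An exact string comparison with the requested version would then rarely match. Installing with `yarn add --exact` avoids this. Another option is to strip the range prefix before comparing, as in this minimal sketch with a hypothetical `chooseTemplateAction` helper:

```javascript
// Decide which action the CLI needs for an experiment template.
// Strips a leading ^ or ~ from the recorded range before comparing versions.
function chooseTemplateAction(dependencies, template, requestedVersion) {
  const recorded = (dependencies || {})[template];
  if (!recorded) return 'add'; // template not used in this site yet
  const recordedVersion = recorded.replace(/^[\^~]/, '');
  return recordedVersion === requestedVersion ? 'postinstall' : 'upgrade';
}
```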

One option might be to use a monorepo structure. That is, each experiment is its own npm project with its own package.json. That should work, since each experiment lives in its own folder prior to running pushkin prep.

I figured all templates (site and exp) would be top-level dependencies of the site. If a site has three experiments, each of which was created from the basic exp template, it seems redundant to download it three times, once into each experiment's node_modules. It may still be good eventually to have each experiment be a Node project, though, so we could set up workspaces for the site (possibly beneficial for user-created tests).
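If each experiment did become its own Node project, enabling workspaces would mostly be a change to the site's package.json. Yarn expects the workspace root to be marked `"private": true` and to list its workspace folders under `workspaces`. A minimal sketch, with a hypothetical helper and the `experiments/*` glob taken from the site layout described above:

```javascript
// Hypothetical: turn the site's package.json into a Yarn workspaces root so that
// each folder under experiments/ can be its own Node project.
function enableExperimentWorkspaces(sitePkg) {
  // Yarn requires the workspace root to be private.
  return { ...sitePkg, private: true, workspaces: ['experiments/*'] };
}
```

Templates would still be top-level dependencies of the site. The package manager hoists shared dependencies to the root where it can, so three experiments built from the same template need not mean three downloads.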

Again, I think there's some weirdness here in that we are using the package manager to download a zip file and then we basically ignore the package manager after that.

With respect to all of the above, I guess I'm not sure what problem we are trying to solve here.

It would essentially be using the package manager to download the template files. But I do think there are other benefits:

I think the problem to be solved here is that the CLI needs at least some modification to deal with releasing templates from the monorepo. If we stick with using GitHub releases, we still need a way of filtering the release information so the CLI can tell users what templates/versions are available and select the right zip file url. Additionally, to automate making releases, we'll need to write a separate GitHub action that individually zips up just the updated templates.
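For the release automation, one piece of the separate GitHub action would be working out which templates changed since the last release. A minimal sketch: it assumes the changed paths come from something like `git diff --name-only <previous-tag> HEAD`, and that templates live under a hypothetical `templates/<kind>/<name>/` layout.

```javascript
// Given changed file paths, return the template folders that need a new archive.
// The templates/<kind>/<name>/... layout is an assumption for illustration.
function changedTemplates(changedFiles) {
  const dirs = new Set();
  for (const file of changedFiles) {
    const parts = file.split('/');
    // Require at least templates/<kind>/<name>/<file> so the folder is a template.
    if (parts[0] === 'templates' && parts.length >= 4) {
      dirs.add(parts.slice(0, 3).join('/'));
    }
  }
  return [...dirs].sort();
}
```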

Given that some development is required, it's a question of whether or not to do the option that requires more work up front but may make maintenance and contributions easier in the future.

jodeleeuw commented 6 months ago

There's precedent here with things like https://www.npmjs.com/package/html5-boilerplate.

Another advantage might be that you could eventually have tooling in place to help people build/test templates that would be part of the npm package and then you could run build or test commands on them. Could you do this without publishing to npm? Sure. But I think then you're just reinventing the distribution mechanism that npm provides since you already have the package.json in place.

jessestorbeck commented 5 months ago

@jodeleeuw -- Thanks for your advice on this topic! We've decided we will distribute the templates via npm, and I'm working on the implementation now.

I think it makes sense for us to publish the templates under an org scope. We were wondering, in jsPsych's case, why is everything under the @jspsych scope except jspsych itself? Is it historical, or just more economical than typing @jspsych/jspsych?

jodeleeuw commented 5 months ago

I think it is primarily historical. I think @jkhartshorne published a jspsych package on npm in the early days of pushkin development and before we were doing anything with packages. Then he transferred ownership of the package once we started releasing packages via npm.