Continuously integrate into a single (primary) branch

bomanimc commented 4 years ago

@joeyklee and I talked recently about some ideas for simplifying our branches and releases. I'm creating this issue to share context, a more detailed concept for the changes, and some questions. Please leave your feedback!

Current Method

Currently, ml5-library is composed of a few key branches:

development: Represents the most up-to-date changes to the core library code.
gh-pages: Docsify documentation at learn.ml5js.org builds from this branch
release: Represents the commits that are in the latest official release of ml5.js.
example-site: A temporary branch added as a step in our monorepo change from #809.

At face value, this seems like a relatively simple structure, but it presents a few challenges. Let's consider the following two general types of contributions to ml5:

Present Additions (Docs, Examples): These contributions include changes that are relevant to the currently available version of ml5.js. These could, for example, be immediate updates to the docs for a currently-in-use model or a fix for a broken example. These contributions should be able to merge and deploy without requiring a new release of the ml5 library.
Future Additions (Docs, Examples, Library): These contributions include new library features, documentation, or examples that are relevant to a future release of the library. These could, for example, be new models and the examples and docs that associate with them. These contributions should require a new release of the library to become available for public use.

Our current ml5-library branch structure attempts to accommodate both types in the following ways:

Present Additions:
- Docs: PRs for immediate changes to the documentation (such as the recent #906) are merged into gh-pages so that they can trigger an immediate redeploy of learn.ml5js.org
- Examples: This hasn't occurred yet in this repo because examples in ml5-library are new (see #809), but presumably they'd be merged into the example-site branch.
Future Additions:
- Changes to core library code, docs, and examples are merged into the development branch, where they wait for the next library release.

This means that there is no single branch that represents the most recent state of ml5 docs, examples, and core library features. When we prepare to release a new version of the library, one challenge is ensuring that all of the important changes make it into the release branch. This process, as you might expect, is a source of a significant number of merge conflicts. Determining how to correctly resolve these conflicts is a challenge for new contributors and people who haven't worked much with Git. Overall, this makes the process of releasing new versions a lot less accessible than we'd like.

The problems with this (summarized) are:

No branch that represents the latest of ml5
Significant complications addressing merge conflicts during the release process
More complexity in the contribution process, since contributors need to know which branch to merge into.

Ideal State

Ideally, we'd be able to avoid much of this complexity by having a single branch for ml5. This branch would represent the latest state of the core library, docs, and examples. All contributor PRs would merge into this single branch and library owners working on new additions would open separate 'feature' branches that pull request into the main branch.

The benefits of this approach (summarized) are:

A single branch where people can go to see the latest contributions.
Completely removes the complexity of addressing merge conflicts from the library release process.
A clear, simple merge point for library contributions.

Solution Option: Gates

One of the core challenges preventing us from moving from the current state to our ideal state is the fact that, if we had a single branch for all of our library code, docs, and examples, we wouldn't have anything in place to prevent docs and examples for unreleased library features from going to our public sites. This approach for using "gates" aims to address that.

Note that these gates only apply to docs and examples. They are in place to allow us to continuously integrate documentation and examples without revealing them on our public site.

Proposed Approach - General: We can create a new JSON file that represents each of our feature gates called gates.json. This JSON file has the following structure:

{
    "gateName": {
        "examples": ["Example_DirectoryName"],
        "description": "This gate is an example for explaining the idea"
    }
}

Proposed Approach - Documentation: We'll update our Markdown renderer for Docsify to docsify-mustache. This will allow us to specify a JSON file that contains information that we can use to create conditional blocks in our documentation.

If we want to gate a whole page, we can hide it from the site by using a mustache conditional in _sidebar.md:

* **Reference** 📝
<div class="Sidebar__section-divider">&nbsp;</div>

  * [Overview](/reference/index.md)
  * **Helpers** ✨
    * [NeuralNetwork](/reference/neural-network.md)
    * [FeatureExtractor](/reference/feature-extractor.md)
    * [KNNClassifier](/reference/knn-classifier.md)
    * [kmeans](/reference/kmeans.md)
    {{#gateName}}
    * [gatedPage](/reference/gated-page.md)
    {{/gateName}}

If we want to gate a part of a page:

## Description
As written by the developers of BodyPix:

"Bodypix is an open-source machine learning model which allows for person and body-part segmentation in the browser with TensorFlow.js...

{{#gateName}}
This is a new section of the document that's currently gated.
{{/gateName}}
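
To make the section behaviour concrete, here is a minimal sketch of how a Mustache-style section decides whether to show its body. This is not docsify-mustache's implementation; `renderSections` is a hypothetical helper that handles only the simplest case (no nesting, no variables). Mustache sections render only when the key is present and truthy, so the sketch assumes the public build is given data in which the gate key is absent or false, while a development build passes the gate data in.

```javascript
// Hypothetical helper for illustration only: mimics the basic Mustache rule
// that {{#key}}...{{/key}} renders its body when data[key] is truthy and
// not an empty list. It does not handle nested sections or {{variables}}.
function renderSections(template, data) {
  const sectionPattern = /\{\{#(\w+)\}\}\n?([\s\S]*?)\{\{\/\1\}\}\n?/g;
  return template.replace(sectionPattern, (match, key, body) => {
    const value = data[key];
    const isEmptyList = Array.isArray(value) && value.length === 0;
    return value && !isEmptyList ? body : "";
  });
}

const page = [
  "## Description",
  "{{#gateName}}",
  "This is a new section of the document that's currently gated.",
  "{{/gateName}}",
  "## Usage",
  "",
].join("\n");

// Assumed public build: the gate key is missing, so the section is dropped.
const publicPage = renderSections(page, {});

// Assumed development build: the gate key is present, so the section shows.
const devPage = renderSections(page, { gateName: { description: "demo" } });

console.log(publicPage);
console.log(devPage);
```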

Proposed Approach - Examples: Each gate optionally contains a list of example directory names that specify the examples that should be hidden by the presence of this feature gate. To make this work, we could:

Update the update-examples-json.js script to get all of the ignored directory names in the gates.json file and prevent them from being included in the output file (example.json) which is used to populate the ml5 examples index page.
Prevent gated examples from being ignored in development mode, but consider adding some UI to those pages (banner, footer, etc) that tells the person making the example that it will be hidden in production.
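
The two steps above could be sketched roughly as follows. This is only an illustration of the filtering idea, not the actual contents of update-examples-json.js; the function names, the `isProduction` flag, and the sample gate and directory names are all assumptions made for the example.

```javascript
// Hypothetical gates data, following the gates.json structure proposed above.
const gates = {
  handposeGate: {
    examples: ["Handpose_Webcam"],
    description: "Hypothetical gate used only for this illustration",
  },
  // A gate may omit "examples" entirely, since the list is optional.
  docsOnlyGate: { description: "Hypothetical gate that only hides docs" },
};

// Collect every example directory name listed under any gate.
function getGatedExampleNames(gatesConfig) {
  const names = Object.values(gatesConfig).flatMap((gate) => gate.examples || []);
  return new Set(names);
}

// In production, drop gated directories; in development, keep everything.
function filterExamples(exampleDirs, gatesConfig, isProduction) {
  if (!isProduction) return exampleDirs.slice();
  const gated = getGatedExampleNames(gatesConfig);
  return exampleDirs.filter((dir) => !gated.has(dir));
}

const allExamples = ["BodyPix_Webcam", "Handpose_Webcam", "KMeans_Basic"];
console.log(filterExamples(allExamples, gates, true));
console.log(filterExamples(allExamples, gates, false));
```

In development mode the same set returned by `getGatedExampleNames` could also drive the banner or footer that warns an example will be hidden in production.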

Workflow Overview: Let's imagine a case where a contributor is trying to add a new model to the library along with an associated doc and example (this is a "Future Addition" contribution type). From the contributor's perspective, contributing gated examples would mean:

Make code changes to the library
Add the example to the codebase as usual.
Update the gates.json file to include a feature gate with the directory name of the gated example.
Add a new docs file.
Add the new doc to the sidebar, using a Mustache conditional to prevent it from showing up.
Create a PR and merge it into the primary branch.

During the process of preparing a new release of the library, one would:

Look at the gates.json to determine if there are any gates that can be cleaned up from the codebase (this is where the description helps).
If any are ready to be removed, search the codebase for references to the gate in docs and remove them. Delete the gate object in gates.json.
Commit the changes.
Continue forward with the npm publishing process.
Push changes to the main branch, which will trigger a new deployment of the example and documentation sites.
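
The "search the codebase for references" step could be assisted by a small script. The sketch below is an assumption about how that might look, not part of the existing tooling: `findGateReferences` is a hypothetical helper that takes file contents already loaded into an object keyed by path and reports every line containing a Mustache section tag for the given gate.

```javascript
// Hypothetical helper: list the lines in docs files that open, close, or
// invert a Mustache section for the given gate name.
function findGateReferences(files, gateName) {
  const tags = [`{{#${gateName}}}`, `{{/${gateName}}}`, `{{^${gateName}}}`];
  const hits = [];
  for (const [path, content] of Object.entries(files)) {
    content.split("\n").forEach((line, index) => {
      if (tags.some((tag) => line.includes(tag))) {
        hits.push({ path, line: index + 1, text: line.trim() });
      }
    });
  }
  return hits;
}

// Sample input mirroring the sidebar snippet from the proposal.
const docs = {
  "_sidebar.md": [
    "    * [kmeans](/reference/kmeans.md)",
    "    {{#gateName}}",
    "    * [gatedPage](/reference/gated-page.md)",
    "    {{/gateName}}",
  ].join("\n"),
  "reference/kmeans.md": "# kmeans\nNo gates here.",
};

console.log(findGateReferences(docs, "gateName"));
```

An empty result for a gate would suggest its object can be deleted from gates.json without leaving dangling tags in the docs.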

Concerns

A few aspects of this plan that I'm not quite happy with:

Naming of gates.json. Not sure if "gates" is a good name for this.
Using the example directory name to ignore gated examples. If the directory name changes (e.g. someone who isn't aware of gating changes the name), the gating would fail.
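
One way to soften the directory-rename concern would be a check, run in CI or before a release, that every example name listed in gates.json still matches a real directory. The sketch below assumes the list of existing directory names has already been read from disk; `findMissingGatedExamples` and the sample names are hypothetical.

```javascript
// Hypothetical check: report gated example names that no longer match
// any existing example directory (for instance, after a rename).
function findMissingGatedExamples(gatesConfig, existingDirs) {
  const existing = new Set(existingDirs);
  const missing = [];
  for (const [gateName, gate] of Object.entries(gatesConfig)) {
    for (const example of gate.examples || []) {
      if (!existing.has(example)) missing.push({ gateName, example });
    }
  }
  return missing;
}

// Sample data: the directory was renamed but the gate was not updated.
const sampleGates = {
  gateName: {
    examples: ["Example_DirectoryName"],
    description: "This gate is an example for explaining the idea",
  },
};
const dirsOnDisk = ["Example_RenamedDirectory", "KMeans_Basic"];

const stale = findMissingGatedExamples(sampleGates, dirsOnDisk);
if (stale.length > 0) {
  console.log("Gated examples with no matching directory:", stale);
}
```

A non-empty result could fail the build, so a rename that silently un-gates an example is caught before it reaches the public site.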

Please let me know your thoughts! I would love to brainstorm improvements to this idea or new ideas on this thread!

shiffman commented 4 years ago

@bomanimc wow, thank you for this incredibly thorough and comprehensive assessment and proposal! I like the idea and I think "gates" works pretty well, the metaphor holds up!

Question re: current release -- is the idea then that if I wanted to look at the codebase related to the latest release, there would be a commit tagged with the version number and I would just navigate back in the git history?

Another small issue could be if a new "feature" is added to an existing model, could you "gate" specific functions or properties of an existing ml5 function? For example, with the neuro-evolution examples I'm adding functions to ml5.neuralNetwork() like mutate() and crossover(). This is a fairly rare edge case so not a major element to worry about (if they appeared on the website early not a crisis!) but something that could come up.

Is it the maintainers' responsibility to update gates.json when a pull request is merged or are contributors expected to do so? Certainly it's a team effort, just wondering about your general expectations!

bomanimc commented 4 years ago

Thanks for the feedback @shiffman!

> Question re: current release -- is the idea then that if I wanted to look at the codebase related to the latest release, there would be a commit tagged with the version number and I would just navigate back in the git history?

Yes! A part of our current release process includes tagging the latest and creating a release on GitHub (https://github.com/ml5js/ml5-library/releases), so we'll be able to use tags as you mentioned.

> Another small issue could be if a new "feature" is added to an existing model, could you "gate" specific functions or properties of an existing ml5 function? For example, with the neuro-evolution examples I'm adding functions to ml5.neuralNetwork() like mutate() and crossover(). This is a fairly rare edge case so not a major element to worry about (if they appeared on the website early not a crisis!) but something that could come up.

Hmm. I don't have any ideas yet for gating model features. My original thought here was that we'd allow the new model features to merge into our primary branch and then attempt to gate the new documentation related to the feature, but your comment helps me realize that we could also run into issues related to examples. For example, if we change a model's API and need to update the implementation in the example, we wouldn't have a great way of doing that at the moment. I'll think about this a bit more!

> Is it the maintainers' responsibility to update gates.json when a pull request is merged or are contributors expected to do so? Certainly it's a team effort, just wondering about your general expectations!

I think it would be good to have enough documentation in place so that contributors could do this (and so maintainers can link to the docs if a contributor forgets to add this on a PR). Looking at the commit history, it seems like the number of cases where people who aren't maintainers are opening PRs for "Future Addition" contributions is actually quite low. If this trend continues, we can expect that most users of gates will be maintainers, but I think this should be a simple system, since it'd be great to have more "Future Addition"-type contributions from people who aren't maintainers!

bomanimc commented 4 years ago

@joeyklee what is the typical approach to making breaking changes to models in ml5? How often does it happen? A case I'm considering is what the steps would be if we want to remove a function or change its name in a manner that would affect an example.

joeyklee commented 4 years ago

Hi All! Hi @bomanimc,

So far we've not had many instances in which breaking changes were introduced as far as I can remember (though I could be wrong!). I think the latest major change had to do with the integration of YOLO into the ObjectDetector class.

> A case I'm considering is what the steps would be if we want to remove a function or change its name in a manner that would affect an example.

This is a good point since our examples and docs may be updated at different times than our library releases. Hmm. I'm going to have to think about this a bit more.

Assuming we had comprehensive tests that included testing our examples (at the moment we do not), then essentially what would happen is that any examples that did not match the current state of the API would not pass. With this in mind, I can imagine this would give us something to work towards and also help guide our decision making about the best way to handle the library development.

This is an aside, but I think maybe it is a good time to do a documentation sprint to get all our docs up to date following the best JSDoc practices. This would be a larger effort, but how do you feel about in-line examples in the code?

bomanimc commented 4 years ago

Coming back around to this issue!

After thinking about this more, I think it's been hard to move forward on this because we've been thinking about this as a monolithic task rather than something that we can sequence in smaller steps. Overall, I think @joeyklee and I both agree that it might be best for us to chunk this into smaller bits.

Here's a proposal for a possible Stage 1 of this project:

Stage 1

Todos to Complete Stage 1

(DONE) Switch to Netlify for deploying our documentation so that we don't need a gh-pages branch.
(DONE) Finish all required features for the next release.
(DONE) Rename development to main.
(DONE) Update all references to the development branch across the codebase (i.e. links to source code files, contributing docs, etc)
(DONE) Let's do a new release of the library ASAP and integrate all of the open PRs and features we want into the main branch.
(DONE) Publish the next release of the library from the main branch.
(DONE) Update our examples, docs, and any other peripheral site deployment chains to deploy the main branch. At this point, all of our public interfaces (library, examples, docs, etc) should be oriented towards the main branch.
Delete all of the other branches (i.e. master, release, example-site, etc).

Post-Completion Workflow

At this point, we'd operate with the following approach:

We have a single main branch. The expectation is that any changes to the main branch should be deployed/released as soon as they are merged. There should not be a long gap between when code is merged into main and when it is deployed/released.
If we approve a PR that we don't want to deploy/release, we should mark it with a NEEDS RELEASE tag and leave it open. Almost all of the PRs that we receive from the community are for changes to docs/examples that we can merge immediately into main and deploy to our peripheral sites without needing to do a full library version update (i.e. typo fixes, example bugs, etc).
The person coordinating the release should (locally) merge and test all of the open PRs marked with NEEDS RELEASE before merging these PRs into main and releasing a new update to the library.
If we have longer-term, more complex features (e.g. node support), we should open a new feature branch. Multiple people can submit PRs to this branch as needed. Once the primary feature branch is ready, we can merge the whole thing into main and release it.

Post-Completion Workflow Example

For something like the handpose model integration, the process (after Stage 1) would look something like:

Open a new handpose feature branch
Make separate PRs for docs, tests, examples, etc that all merge into that handpose branch. These PRs can be reviewed as usual and merged into the handpose branch.
Once everything is ready to go for Handpose, we can do a final review pass on the handpose branch, merge the whole thing into main and then do a new release of the library.

Stage 2

This is where we could consider the idea of adding things like the Gates proposal from the initial issue description. That said, I'd love to see how things shake out after Stage 1. We may not actually need to do this.

What do you all think?

bomanimc commented 3 years ago

All done with the integration! Branches have been tagged (archive/*) and deleted! There are still some threads in the discussion here that we may need to pick up later, but for now I'm going to close this issue.
