ml5js / ml5-library

Friendly machine learning for the web! 🤖
https://ml5js.org
Other
6.46k stars 904 forks source link

Continuously integrate into a single (primary) branch #917

Closed bomanimc closed 3 years ago

bomanimc commented 4 years ago

@joeyklee and I talked about recently about some ideas for simplifying our branches and releases. Creating this issue that contains context, a more detailed concept for changes, and some questions. Please leave your feedback!

Current Method

Currently, ml5-library is composed of a few key branches:

At face value, this seems like a relatively simple structure, but it presents a few challenges. Let's consider the following two general types of contributions to ml5:

Our currently ml5-library branch structure attempts to accommodate both types in the following ways:

This means that there is no single branch represents the most recent state of ml5 docs, examples, and core library features. When we prepare to release a new version of the library, a challenge is ensuring that all of the important changes into the release branch. This process, as you might expect, is a source for a significant number of merge conflicts. Determining how to correctly manage these conflicts is a challenge for new contributors and people who haven't worked significantly with Git. Overall, this makes the process of releasing new versions a lot less accessible than we'd like.

The problems with this (summarized) are:

Ideal State

Ideally, we'd be able to avoid much of this complexity by having a single branch for ml5. This branch would represent the latest state of the core library, docs, and examples. All contributor PRs would merge into this single branch and library owners working on new additions would open separate 'feature' branches that pull request into the main branch.

The benefits of this approach (summarized) are:

Solution Option: Gates

One of the core challenges preventing us from moving from the current state to our ideal state is the fact that, if we had a single branch for all of our library code, docs, and examples, we wouldn't have anything in place to prevent docs and examples for unreleased library features from going to our public sites. This approach for using "gates" aims to address that.

Note that these gates only apply to docs and examples. These are in-place to allow us to continuously integrate documentation and examples without revealing them on our public site.

Proposed Approach - General: We can create a new JSON file that represents each of our feature gates called gates.json. This JSON file has the following structure:

{
    "gateName": {
        "examples": ["Example_DirectoryName"],  
        "description": "This gate is an example for explaining the idea"
    }

Proposed Approach - Documentation: We'll update our Markdown renderer for Docsify to docsify-mustache. This will allow us to specify a JSON file that contains information that we can use to create conditional blocks in our documentations.

If we want to gate a whole page, we can hide it from the site by using a mustache conditional in _sidebar.md:

* **Reference** 📝
<div class="Sidebar__section-divider">&nbsp;</div>

  * [Overview](/reference/index.md)
  * **Helpers** ✨
    * [NeuralNetwork](/reference/neural-network.md)
    * [FeatureExtractor](/reference/feature-extractor.md)
    * [KNNClassifier](/reference/knn-classifier.md)
    * [kmeans](/reference/kmeans.md)
    {{#gateName}}
    * [gatedPage](/reference/gated-page.md)
    {{/gateName}}

If we want to gate a part of a page:

## Description
As written by the developers of BodyPix:

"Bodypix is an open-source machine learning model which allows for person and body-part segmentation in the browser with TensorFlow.js...

{{#gateName}}
This is a new section of the document that's currently gated.
{{/gateName}}

Proposed Approach - Examples: Each gate optionally contains a list of example directory names that specify the examples that should be hidden by the presence of this feature gate. To make this work, we could:

Workflow Overview Let's imagine a case where a contributor is trying to add a new model to the library along with an associated doc and example (this is a "Future Addition" contribution type). From the contributor's perspective, contributing gated examples would mean:

During the process of preparing a new release of the library, one would:

Concerns

A few aspects of this plan that I'm not quite happy with:

Please let me know your thoughts! I would love to brainstorm improvements to this idea or new ideas on this thread!

shiffman commented 4 years ago

@bomanimc wow, thank you for this incredibly thorough and comprehensive assessment and proposal! I like the idea and i think "gates" works pretty well, the metaphor holds up!

Question re: current release -- is the idea then that if I wanted to look at the codebase related to the latest release, there would be a commit tagged with the version number and I would just navigate back in the git history?

Another small issue could be if a new "feature" is added to an existing model, could you "gate" specific functions or properties of an existing ml5 function? For example, with the neuro-evolution examples I'm adding functions to ml5.neuralNetwork() like mutate() and crossover(). This is a fairly rare edge case so not a major element to worry about (if they appeared on the website early not a crisis!) but something that could come up.

Is it the maintainers responsibility to update gates.json when a pull request is merged or are contributors expected to do so? Certainly it's a team effort, just wondering your general expectations!

bomanimc commented 4 years ago

Thanks for the feedback @shiffman!

Question re: current release -- is the idea then that if I wanted to look at the codebase related to the latest release, there would be a commit tagged with the version number and I would just navigate back in the git history?

Yes! A part of our current release process includes tagging the latest and creating a release on GitHub (https://github.com/ml5js/ml5-library/releases), so we'll be able to use tags as you mentioned.

Another small issue could be if a new "feature" is added to an existing model, could you "gate" specific functions or properties of an existing ml5 function? For example, with the neuro-evolution examples I'm adding functions to ml5.neuralNetwork() like mutate() and crossover(). This is a fairly rare edge case so not a major element to worry about (if they appeared on the website early not a crisis!) but something that could come up.

Hmm. I don't have any ideas yet for gating model features, but my original thought here is that we'd allow the new model features to merge into our primary branch and then attempt to gate the new documentation related to the feature, but your comment helps me to realize that we could also experience some issues related to examples. For example, if we change a model's API and we need to update the way the implementation in the example, we wouldn't have a great way of doing that at the moment. I'll think about this a bit more!

Is it the maintainers responsibility to update gates.json when a pull request is merged or are contributors expected to do so? Certainly it's a team effort, just wondering your general expectations!

I think it would be good to have enough documentation in place so that contributors could do this (and so maintainers can link to the docs if a contributor forgets to add this on a PR). Looking at the commit history, it seems like the number of cases where people who aren't maintainers are opening PRs for "Future Addition" contributions is actually quite low. If things continue, we can expect that most users of gates will be maintainers, but I think it'd be great for this to be a simple system since it'd be great to have more "Future Addition"-type contributions from people who aren't maintainers!

bomanimc commented 4 years ago

@joeyklee what is the typical approach to making breaking changes to models in ml5? How often does it happen? A case I'm considering is what the steps would be if we want to change remove a function or changes its name in a manner that would effect an example.

joeyklee commented 4 years ago

Hi All! Hi @bomanimc,

So far we've not had many instances in which breaking changes were introduced as far as I can remember (though I could be wrong!). I think the latest major change had to do with the integration of YOLO into the ObjectDetector class.

A case I'm considering is what the steps would be if we want to change remove a function or changes its name in a manner that would effect an example.

  • This is a good point since our examples and docs may be updated at different times than our library releases. Hmm. I'm going to have to think about this a bit more.
  • Assuming we had comprehensive tests that included testing our examples (at the moment we do not), then essentially what would happen is that any examples that did not match the current state of the API would not pass. With this in mind, I can imagine this would give us something to work towards and also help guide our decision making about the best way to handle the library development.

This is an aside, but I think maybe it is a good time to do a documentation sprint to get all our docs up to date following the best JSDoc practices. This would be a larger effort, but how do you feel about in-line examples in the code?

bomanimc commented 4 years ago

Coming back around to this issue!

After thinking about this more, I think it's been hard to move forward on this because we've been thinking about this as a monolithic task rather than something that we can sequence in smaller steps. Overall, I think @joeyklee and I both agree that it might be best for us to chunk this into smaller bits.

Here's a proposal for a possible Stage 1 of this project:

Stage 1

Todos for Complete Stage 1

Post-Completion Workflow

At this point, we'd operate with the following approach:

Post-Completion Workflow Example

For something like the handpose model integration, the process (after Stage 1) would look something like:

Stage 2

This is where we could consider the idea of adding things like the Gates proposal from the initial issue description. That said, I'd love to see how things shake out after Stage 1. We may not actually need to do this.


What do you all think?

bomanimc commented 3 years ago

All done with the integration! Branches have been tagged (archive/*) and deleted! There are still some threads in the discussion here that we may need to pick up later, but for now I'm going to close this issue.