Reimagining Merge Automation Behaviors as Products
Opened by gabrielBusta 7 months ago
I don't love having behaviours as products. Then you'd need to create a new "release" for each action. This seems like it would be more work than the current approach of triggering from Treeherder.
What if there were a single product (called Merge Automation -> Gecko or similar), and the behaviours were mapped to phases? So the phases could be the individual behaviours (we could arguably put the simulation runs in there too). Then you only need to create a single "merge release" each cycle, and you could track what phase it was at over time.
Should we introduce new terminology such as "workflows" to better describe the merge process within ShipIt? Can new terms enhance clarity/understanding?
Imo phases work here too... we just shouldn't call them promote/ship, etc.
I was playing around with this idea.
Here's what my tinkering looks like :)
Thought: so far "Releases" maps to `release-promotion` - we could have a "Merges" that maps to `merge-automation`? Or, the new release form could change a bit if the selected product is `merge-automation`. We can treat the merges as an edge case - like the RCs.
I don't have strong feelings on any of this, but here are some thoughts:
> I was playing around with this idea.
> * I am not sure how version/build # would fit in
Version could make sense if we tied each merge automation to a branch. Eg: one for central (that would do central -> beta), one for beta (does beta -> release), one for esr. I can't decide if it's a good or bad idea to keep them all as one "release" or not. (This might remove the need for answers to some of the later questions, too.)
> * Build # doesn't seem applicable?
Most of the time...no. But it will become relevant if we have a bug in merge automation that we have to fix, and then restart the merges. And on that topic... what happens if we find a bug halfway through (eg: after pushing some changes but not finishing the last steps)? How do we recover from that?
> * I don't know about the version either. I just pointed it at the beta version to get something to render
> * These behaviors cut across the repository/branch line, where does the version come from?
If we stick with a model where all merges are part of one "release", I strongly encourage not using a version that is tied to any one branch in particular - that would inevitably lead to confusion.
Overall, I do question whether or not it makes sense to try to fit merge automation into the mold of a "release". It may be cleaner to create a new table or tables that fit its model much better. (I think we could still make use of much of the UI, phases, and other helpers though.)
Thanks for chiming in Ben. I'm inclined to agree that integrating the `merge-automation` into our current Release model might complicate things. I like the idea of developing new model(s) designed for the `merge-automation` action schema. I plan to delve into this idea further, focusing on how such a model would mesh with our current models (such as Phases), React components, and other helpers.
Yeah I agree this seems like the proper approach.
One suggestion: rather than making a model designed specifically for `merge-automation`, why not try to make a model designed to be generic? Who knows what other things we might want to add in the future?
When it comes down to it, there is a routine we need to perform. For each routine we need to specify:
Then for each step, we need to specify:
And that's pretty much it. If we can keep our model as simple as that, it should be enough to support merge automation, and any other action based workflows we can come up with down the road.
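To make that concrete, here's a hedged sketch of what such a generic routine spec could look like, borrowing field names from the schema proposed later in this thread (the names and values are illustrative, not part of the proposal):

```python
# Illustrative only: a generic "routine" with ordered steps. The field names
# (task, hook_id, hook_payload, context) mirror the schema proposed below.
routine = {
    "name": "merge-automation",
    "steps": [
        {
            "name": "central-to-beta",  # hypothetical step name
            "task": {
                "hook_id": "...",       # which Taskcluster hook to trigger
                "hook_payload": {},     # the payload to trigger it with
            },
            "context": {},              # action input / taskgraph parameters
        },
    ],
}
```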
Great thinking @ahal - I like it. I will be trying it out for sure. The input could be stored in Postgres' `JSON` type.
Here's a possible schema for generic shipit "workflows" (names are hard 😛)
`shipit_api_workflow_status`

An enum type for workflow statuses ("scheduled", "completed", "aborted" in the models below).

`shipit_api_workflows`

- `id`: Primary key.
- `attributes`: JSON, stores various attributes for a workflow. This can be used to store things such as build ids, xpi meta-data, revisions, etc.
- `name`: Text - must be unique.
- `status`: Enum type `shipit_api_workflow_status`.
- `created`: Timestamp - defaults to the current timestamp.
- `completed`: Timestamp.

An index on `status`/`created` to speed up the front-end. Linked to `shipit_api_workflow_phases` via the `workflow_id`.

`shipit_api_workflow_phases`

- `id`: Primary key.
- `name`: Text.
- `submitted`: Boolean.
- `task_id`: Text.
- `task`: JSON - stores task details (i.e. the `hook_id` and `hook_payload`).
- `context`: JSON - stores context information (i.e. action input/taskgraph parameters).
- `created`: Timestamp - defaults to the current timestamp.
- `completed`: Timestamp.
- `completed_by`: Text - indicates who completed the phase.
- `workflow_id`: Foreign key referencing `shipit_api_workflows(id)`.

Linked to `shipit_api_workflow_phase_signoffs` via the `phase_id`.

`shipit_api_workflow_phase_signoffs`
This is exactly the schema we have now for releases, but for "workflows."
Here's what the models might look like:
```python
import datetime

import slugid
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ENUM

from backend_common.db import db

# ALLOW_PHASE_SKIPPING and SIGNOFFS are assumed importable from shipit's
# existing config; Signoff/XPISignoff are the existing signoff models.
from shipit_api.common.config import ALLOW_PHASE_SKIPPING, SIGNOFFS


class Workflow(db.Model):
    __tablename__ = "shipit_api_workflows"

    id = sa.Column(sa.Integer, primary_key=True)
    attributes = sa.Column(sa.JSON, nullable=False)
    name = sa.Column(sa.String(80), nullable=False, unique=True)
    status = sa.Column(ENUM("scheduled", "completed", "aborted", name="shipit_api_workflow_status"), nullable=False)
    created = sa.Column(sa.DateTime, nullable=False, default=datetime.datetime.utcnow)
    completed = sa.Column(sa.DateTime)
    # Use a string for order_by so this resolves even though Phase is defined later.
    phases = sa.orm.relationship("Phase", order_by="Phase.id", back_populates="workflow")

    def __init__(self, attributes, status):
        self.name = self.construct_name(attributes)
        self.attributes = attributes
        self.status = status

    def construct_name(self, attributes):
        raise NotImplementedError("Subclasses must implement this method to construct the workflow's name")

    def phase_signoffs(self, phase):
        raise NotImplementedError("Subclasses must implement this method to get signoffs for a phase")

    @property
    def allow_phase_skipping(self):
        # This only affects the frontend, *the API doesn't enforce phase skipping.*
        return False

    @property
    def json(self):
        return {
            "name": self.name,
            "status": self.status,
            "attributes": self.attributes,
            "created": self.created.isoformat(),
            "completed": self.completed.isoformat() if self.completed else "",
            "phases": [p.json for p in self.phases],
            "allow_phase_skipping": self.allow_phase_skipping,
        }


class Phase(db.Model):
    __tablename__ = "shipit_api_workflow_phases"

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, nullable=False)
    submitted = sa.Column(sa.Boolean, nullable=False, default=False)
    task_id = sa.Column(sa.String, nullable=False)
    task = sa.Column(sa.JSON, nullable=False)     # the hook_id and hook_payload
    context = sa.Column(sa.JSON, nullable=False)  # action input / taskgraph parameters
    created = sa.Column(sa.DateTime, default=datetime.datetime.utcnow)
    completed = sa.Column(sa.DateTime)
    completed_by = sa.Column(sa.String)
    workflow_id = sa.Column(sa.Integer, sa.ForeignKey("shipit_api_workflows.id"))
    workflow = sa.orm.relationship("Workflow", back_populates="phases")
    signoffs = sa.orm.relationship("Signoff", order_by="Signoff.id", back_populates="phase")


class Release(Workflow):
    product_details_enabled = True

    def construct_name(self, attributes):
        return f"{attributes['product'].capitalize()}-{attributes['version']}-build{attributes['build_number']}"

    @property
    def product(self):
        # Derived from the attributes JSON; needed by the properties below.
        return self.attributes["product"]

    @property
    def project(self):
        return self.attributes["branch"].split("/")[-1]

    @property
    def allow_phase_skipping(self):
        # This only affects the frontend, the API doesn't enforce phase skipping.
        product = ALLOW_PHASE_SKIPPING.get(self.product, {})
        return product.get(self.project, product.get("default", False))

    def phase_signoffs(self, phase):
        return [
            Signoff(uid=slugid.nice(), name=req["name"], description=req["description"], permissions=req["permissions"])
            for req in SIGNOFFS.get(self.product, {}).get(phase, [])
        ]


class XPIRelease(Workflow):
    product_details_enabled = False
    product = "xpi"

    def construct_name(self, attributes):
        return f"{attributes['xpi_name']}-{attributes['xpi_version']}-build{attributes['build_number']}"

    def phase_signoffs(self, phase):
        return [
            XPISignoff(uid=slugid.nice(), name=req["name"], description=req["description"], permissions=req["permissions"])
            for req in SIGNOFFS.get("xpi", {}).get(self.attributes["xpi_type"], {}).get(phase, [])
        ]
```
Using a JSON column for storing "workflow attributes" that vary between different "workflow types" is more flexible and simplifies the database schema. It would allow us to add new workflows, or modify existing ones, without needing schema migrations. Some pros and cons of this approach:
Pros:

- We can store any attributes for the workflow within the "attributes" column, decreasing the need for altering the database schema and doing migrations.
- If new "attributes" are introduced, we can add them to the JSON object being persisted without impacting existing records or requiring database updates.
- Development can be faster, because adding new features that require storing new "attributes" doesn't require changes to the database schema.
- Reduces the complexity of the database schema by eliminating the need for multiple repetitive tables and columns.

Cons:

- Any integrity checks or validation of these "attributes" must be enforced in the application layer. There's no "attributes" schema enforced by the database.
- Queries against data inside a JSON field can be less efficient (as sketched below).
- Indexing JSON fields can be less efficient.
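For illustration, here's roughly what querying and indexing the attributes column could look like with SQLAlchemy on Postgres (a sketch, assuming the models above):

```python
import sqlalchemy as sa

# Filter on a key inside the JSON column; as_string() casts the JSON
# element to text so it can be compared.
firefox_workflows = (
    db.session.query(Workflow)
    .filter(Workflow.attributes["product"].as_string() == "firefox")
    .all()
)

# An expression index can mitigate the lookup cost, at the price of more
# moving parts than a plain column index.
sa.Index("ix_shipit_api_workflows_product", Workflow.attributes["product"].as_string())
```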
Our other option is to continue using the "'abstract' base class and subclass with table" pattern, but introduce a generic workflow type. Our current pattern is more modular: each subclass has its own database table, which allows for tailored schemas that fit each workflow type. It is also better constrained, because the schema is enforced at the database layer. But each subclass having its own table adds complexity to the database schema, and we might end up with more tables than necessary, since the differences between our current workflows (`Release`/`XPIRelease`) seem minor. Another advantage of this pattern is its similarity to the existing setup - that could speed up implementation. A sketch of this pattern follows below.
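For comparison, a minimal sketch of that pattern using SQLAlchemy joined-table inheritance (table and column names here are hypothetical, not part of the proposal):

```python
class BaseWorkflow(db.Model):
    __tablename__ = "workflows"  # hypothetical name
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(80), nullable=False, unique=True)
    kind = sa.Column(sa.String(50), nullable=False)  # discriminator column
    __mapper_args__ = {"polymorphic_identity": "workflow", "polymorphic_on": kind}


class MergeWorkflow(BaseWorkflow):
    # The subclass gets its own table with a tailored, database-enforced schema.
    __tablename__ = "merge_workflows"  # hypothetical name
    id = sa.Column(sa.Integer, sa.ForeignKey("workflows.id"), primary_key=True)
    behavior = sa.Column(sa.String, nullable=False)  # e.g. a merge behaviour
    __mapper_args__ = {"polymorphic_identity": "merge"}
```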
I haven't had a chance to take this in fully yet - but a couple of quick thoughts:
On this point specifically:
If new "attributes" are introduced, we can add them to the JSON object being persisted without impacting existing records or requiring database updates.
I'm not sure I agree this is a pro. Ultimately whatever code is using these records needs to know that older versions of them do not have fields that were added later, so whether it's a field in a json object that may not exist, or a nullable column, the difference in structure must be handled.
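A two-line sketch of the point (the field name is hypothetical):

```python
# Old records lack the field either way; the reader must supply a default.
behavior = workflow.attributes.get("merge_behavior", "default")  # JSON variant
behavior = workflow.merge_behavior or "default"                  # nullable-column variant
```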
I will try to review this in more depth soon, but please don't block on me - I think this is very much on the right track.
Summary
Integrating our "Merge Day Automation" into ShipIt is an opportunity to streamline our release process. ShipIt, being a Taskcluster client, already facilitates triggering Taskcluster hooks. We should be able to incorporate the merge automation action in a similar way to the release promotion action.
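For example, triggering a hook with the Taskcluster Python client looks roughly like this (the hook ids and payload below are hypothetical placeholders):

```python
import taskcluster

hooks = taskcluster.Hooks({"rootUrl": "https://firefox-ci-tc.services.mozilla.com"})
# In ShipIt, the hookGroupId/hookId and payload would come from the phase's
# stored `task` JSON (hook_id and hook_payload).
hooks.triggerHook("project-releng", "in-tree-action-merge-automation", {"input": {}})
```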
Objectives
Self-Served Merges for Release Managers: ShipIt's sign-off system offers a controlled environment for release managers. By enabling them to self-serve the Firefox train merges through this system, we can decrease the number of merge-related hand-offs, making the release cycle more efficient.
Simplification of Merge Duty: The current incarnation of "Merge Duty" involves release engineering intervention and some coordination among Firefox release teams. Automating this process will simplify these duties, allowing the team to focus on strategic tasks.
Questions
References
b.m.o [meta] bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1890753