mozilla-releng / shipit

Ship It API & Frontend

Add the Merge Day Automation to ShipIt #1418

Open gabrielBusta opened 7 months ago

gabrielBusta commented 7 months ago

Summary

The enhancement of ShipIt through the integration of our "Merge Day Automation" is an opportunity for us to streamline our release process. ShipIt, being a Taskcluster client, already facilitates the triggering of Taskcluster hooks. We should be able to incorporate the merge automation action in a similar way to the release promotion action.

Objectives

Self-Served Merges for Release Managers: ShipIt's sign-off system offers a controlled environment for release managers. By enabling them to self-serve the Firefox train merges through this system, we can decrease the number of merge-related hand-offs, making the release cycle more efficient.

Simplification of Merge Duty: The current incarnation of "Merge Duty" involves release engineering intervention and some coordination among Firefox release teams. Automating this process will simplify these duties, allowing the team to focus on strategic tasks.

Questions

References

b.m.o [meta] bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1890753

ahal commented 7 months ago

Reimagining Merge Automation Behaviors as Products

I don't love having behaviours as products. Then you'd need to create a new "release" for each action. This seems like it would be more work than the current trigger in Treeherder approach.

What if there were a single product (called Merge Automation -> Gecko or similar) and the behaviours were mapped to phases? So phases could be like:

  1. beta -> release
  2. central -> beta
  3. bump-central
  4. bump-esr

(we could arguably put the simulation runs in there too). Then you only need to create a single "merge release" each cycle, and you could track what phase it was at over time.
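That single-product-with-phases idea could be sketched roughly like this (a toy illustration; the product and phase names here are hypothetical, not ShipIt's actual API):

```python
# Hypothetical mapping from one "merge" product to its ordered phases,
# each of which triggers a merge-automation behaviour.
MERGE_PHASES = {
    "gecko-merge": [
        {"phase": "beta-to-release", "behavior": "beta-to-release"},
        {"phase": "central-to-beta", "behavior": "central-to-beta"},
        {"phase": "bump-central", "behavior": "bump-central"},
        {"phase": "bump-esr", "behavior": "bump-esr"},
    ],
}


def hook_payload(product, phase):
    """Build the (hypothetical) merge-automation hook input for one phase."""
    behaviors = {p["phase"]: p["behavior"] for p in MERGE_PHASES[product]}
    return {"behavior": behaviors[phase]}
```

A single "merge release" per cycle would then just walk these phases in order, tracking which one it is at.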

ahal commented 7 months ago

Should we introduce new terminology such as "workflows" to better describe the merge process within ShipIt? Can new terms enhance clarity/understanding?

Imo phases work here too... we just shouldn't call them promote / ship etc

gabrielBusta commented 7 months ago

I was playing around with this idea.

Here's what my tinkering looks like :)

(Screenshots of the prototype UI attached: 2024-04-08 at 12:58:40 and 14:34:09.)

gabrielBusta commented 7 months ago

Thought: so far "Releases" maps to release-promotion - we could have a "Merges" that maps to merge-automation?

gabrielBusta commented 7 months ago

Or, the new release form could change a bit if the selected product is merge-automation. We can treat the merges as an edge case - like the RCs.

bhearsum commented 7 months ago

I don't have strong feelings on any of this, but here are some thoughts:

I was playing around with this idea.

* I am not sure how version/build # would fit in

Version could make sense if we tied each merge automation to a branch. Eg: one for central (that would do central -> beta), one for beta (does beta -> release), one for esr. I can't decide if it's a good or bad idea to keep them all as one "release" or not. (This might remove the need for answers to some of the later questions, too.)

* Build # doesn't seem applicable?

Most of the time...no. But it will become relevant if we have a bug in merge automation that we have to fix, and then restart the merges. And on that topic...what happens if we find a bug halfway through (eg: after pushing some changes but not finishing the last steps)? How do we recover from that?

* I don't know about the version either. I just pointed it at the beta version to get something to render

* These behaviors cut across the repository/branch line, where does the version come from?

If we stick with a model where all merges are part of one "release", I strongly encourage not using a version that is tied to any one branch in particular - that would inevitably lead to confusion.

Overall, I do question whether or not it makes sense to try to fit merge automation into the mold of a "release". It may be cleaner to create a new table or tables that fit its model much better. (I think we could still make use of much of the UI, phases, and other helpers though.)

gabrielBusta commented 7 months ago

Thanks for chiming in Ben. I'm inclined to agree that integrating the merge-automation into our current Release model might complicate things. I like the idea of developing new model(s) designed for the merge-automation action schema. I plan to delve into this idea further, focusing on how such a model would mesh with our current models (such as Phases), React components, and other helpers.

ahal commented 7 months ago

Yeah I agree this seems like the proper approach.

One suggestion is rather than making a model designed specifically for merge-automation, why not try to make a model designed to be generic? Who knows what other things we might want to add in the future?

When it comes down to it, there is a routine we need to perform. For each routine we need to specify:

  1. Some arbitrary metadata to display in the UI (with ability to link)
  2. A series of steps (or phases)

Then for each step, we need to specify:

  1. An action to trigger
  2. Some input for said action

And that's pretty much it. If we can keep our model as simple as that, it should be enough to support merge automation, and any other action based workflows we can come up with down the road.
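A minimal sketch of that generic shape, just to make it concrete (plain dataclasses; all names here are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str    # display name, e.g. "central-to-beta"
    action: str  # the Taskcluster action/hook to trigger
    input: dict  # input payload handed to that action


@dataclass
class Routine:
    metadata: dict  # arbitrary UI metadata (labels, links, ...)
    steps: list[Step] = field(default_factory=list)


# Merge automation becomes just one instance of the generic model.
merge_day = Routine(
    metadata={"name": "merge-day"},
    steps=[Step("central-to-beta", "merge-automation", {"behavior": "central-to-beta"})],
)
```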

gabrielBusta commented 7 months ago

Great thinking @ahal - I like it. I will be trying it out for sure. The input could be stored in Postgres' JSON type.

gabrielBusta commented 7 months ago

Here's a possible schema for generic shipit "workflows" (names are hard 😛)

Enum - shipit_api_workflow_status

Table - shipit_api_workflows

Table - shipit_api_workflow_phases

Table - shipit_api_workflow_phase_signoffs

This is exactly the schema we have now for releases, but for "workflows."

gabrielBusta commented 7 months ago

Here's what the models might look like:

import datetime

import slugid
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ENUM

from backend_common.db import db

# ALLOW_PHASE_SKIPPING, SIGNOFFS, and the Signoff/XPISignoff models referenced
# below are assumed to be importable from elsewhere in shipit_api.

class Workflow(db.Model):
    __tablename__ = "shipit_api_workflows"
    id = sa.Column(sa.Integer, primary_key=True)
    attributes = sa.Column(sa.JSON, nullable=False)
    name = sa.Column(sa.String(80), nullable=False, unique=True)
    status = sa.Column(ENUM("scheduled", "completed", "aborted", name="shipit_api_workflow_status"), nullable=False)
    created = sa.Column(sa.DateTime, nullable=False, default=datetime.datetime.utcnow)
    completed = sa.Column(sa.DateTime)
    # order_by is given as a string so it resolves after Phase is defined below
    phases = sa.orm.relationship("Phase", order_by="Phase.id", back_populates="workflow")

    def __init__(self, attributes, status):
        self.name = self.construct_name(attributes)
        self.attributes = attributes
        self.status = status

    def construct_name(self, attributes):
        raise NotImplementedError("Subclasses must implement this method to construct the workflow's name")

    def phase_signoffs(self, phase):
        raise NotImplementedError("Subclasses must implement this method to get signoffs for a phase")

    @property
    def allow_phase_skipping(self):
        # This only affects the frontend, *the API doesn't enforce phase skipping.*
        return False

    @property
    def json(self):
        return {
            "name": self.name,
            "status": self.status,
            "attributes": self.attributes,
            "created": self.created.isoformat(),
            "completed": self.completed.isoformat() if self.completed else "",
            "phases": [p.json for p in self.phases],
            "allow_phase_skipping": self.allow_phase_skipping,
        }

class Phase(db.Model):
    __tablename__ = "shipit_api_workflow_phases"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, nullable=False)
    submitted = sa.Column(sa.Boolean, nullable=False, default=False)
    task_id = sa.Column(sa.String, nullable=False)
    task = sa.Column(sa.JSON, nullable=False)
    context = sa.Column(sa.JSON, nullable=False)
    created = sa.Column(sa.DateTime, default=datetime.datetime.utcnow)
    completed = sa.Column(sa.DateTime)
    completed_by = sa.Column(sa.String)
    workflow_id = sa.Column(sa.Integer, sa.ForeignKey("shipit_api_workflows.id"))
    workflow = sa.orm.relationship("Workflow", back_populates="phases")
    signoffs = sa.orm.relationship("Signoff", order_by="Signoff.id", back_populates="phase")

class Release(Workflow):
    product_details_enabled = True

    def __init__(self, attributes, status):
        super().__init__(attributes, status)

    def construct_name(self, attributes):
        return f"{attributes['product'].capitalize()}-{attributes['version']}-build{attributes['build_number']}"

    @property
    def allow_phase_skipping(self):
        # This only affects the frontend, the API doesn't enforce phase skipping.
        product = ALLOW_PHASE_SKIPPING.get(self.product, {})
        return product.get(self.project, product.get("default", False))

    def phase_signoffs(self, phase):
        return [
            Signoff(uid=slugid.nice(), name=req["name"], description=req["description"], permissions=req["permissions"])
            for req in SIGNOFFS.get(self.product, {}).get(phase, [])
        ]

    @property
    def product(self):
        return self.attributes["product"]

    @property
    def project(self):
        return self.attributes["branch"].split("/")[-1]

class XPIRelease(Workflow):
    product_details_enabled = False
    product = "xpi"

    def __init__(self, attributes, status):
        super().__init__(attributes, status)

    def construct_name(self, attributes):
        return f"{attributes['xpi_name']}-{attributes['xpi_version']}-build{attributes['build_number']}"

    def phase_signoffs(self, phase):
        return [
            XPISignoff(uid=slugid.nice(), name=req["name"], description=req["description"], permissions=req["permissions"])
            for req in SIGNOFFS.get("xpi", {}).get(self.attributes["xpi_type"], {}).get(phase, [])
        ]

gabrielBusta commented 7 months ago

Using a JSON column for storing "workflow attributes" that vary between different "workflow types" is more flexible and simplifies the database schema. It would allow us to add new workflows, or modify existing ones, without needing schema migrations. Some pros and cons of this approach:

pros

cons
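One concrete consequence of the JSON approach: querying inside the attributes is possible, but happens outside any schema constraint. A sketch using SQLite's json_extract for brevity (Postgres would use its own JSON operators):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE workflows (id INTEGER PRIMARY KEY, attributes TEXT)")
con.execute(
    "INSERT INTO workflows (attributes) VALUES (?)",
    (json.dumps({"product": "firefox", "version": "128.0"}),),
)
con.execute(
    "INSERT INTO workflows (attributes) VALUES (?)",
    (json.dumps({"behavior": "central-to-beta"}),),  # no 'product' key at all
)
# Nothing stops heterogeneous blobs living side by side; filtering on a key
# silently skips rows where it is absent.
rows = con.execute(
    "SELECT id FROM workflows WHERE json_extract(attributes, '$.product') = 'firefox'"
).fetchall()
```

The database happily accepts both shapes; it is the application code that has to know which keys each workflow type carries.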

gabrielBusta commented 7 months ago

Our other option is to continue using the "'abstract' base class and subclass with table" pattern but introduce a generic workflow type. Our current pattern is more modular: each subclass has its own database table, which allows for tailored schemas that fit each workflow type. It is also better constrained because the schema is enforced at the database layer. But each subclass having its own table adds complexity to the database schema, and we might end up with more tables than necessary since the differences between our current workflows (Release/XPIRelease) seem minor. Another advantage of this pattern is its similarity to the existing setup - that could speed up implementation.
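The base-class-plus-concrete-subclass pattern being described, stripped to its shape (class and table names here are hypothetical):

```python
from abc import ABC, abstractmethod


class Workflow(ABC):
    """'Abstract' base: shared behaviour, no table of its own."""

    @abstractmethod
    def construct_name(self, attributes):
        ...


class MergeWorkflow(Workflow):
    # Each concrete subclass would map to its own, fully constrained table,
    # so its columns can be enforced at the database layer.
    __tablename__ = "shipit_api_merge_workflows"

    def construct_name(self, attributes):
        return f"merge-{attributes['behavior']}-{attributes['cycle']}"
```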

bhearsum commented 7 months ago

I haven't had a chance to take this in fully yet - but a couple of quick thoughts:

On this point specifically:

If new "attributes" are introduced, we can add them to the JSON object being persisted without impacting existing records or requiring database updates.

I'm not sure I agree this is a pro. Ultimately whatever code is using these records needs to know that older versions of them do not have fields that were added later, so whether it's a field in a json object that may not exist, or a nullable column, the difference in structure must be handled.
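To illustrate the point: whichever storage shape is chosen, the reading code ends up carrying the same guard (a toy example; the "flavor" attribute is made up):

```python
# A record persisted before a hypothetical "flavor" attribute existed.
old_attributes = {"product": "firefox"}

# JSON column: the key may simply be absent.
flavor = old_attributes.get("flavor", "default")

# Nullable column: the value is NULL/None instead; same guard either way.
flavor_column = None
flavor_from_column = flavor_column if flavor_column is not None else "default"
```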

I will try to review this in more depth soon, but please don't block on me - I think this is very much on the right track.