Reimagining Merge Automation Behaviors as Products
Opened by gabrielBusta 7 months ago
I don't love having behaviours as products. Then you'd need to create a new "release" for each action. This seems like it would be more work than the current approach of triggering from Treeherder.
What if there were a single product (called Merge Automation -> Gecko or similar), and the behaviours were mapped to phases? So the phases could be the individual behaviours (we could arguably put the simulation runs in there too). Then you only need to create a single "merge release" each cycle, and you could track what phase it was at over time.
Should we introduce new terminology such as "workflows" to better describe the merge process within ShipIt? Can new terms enhance clarity/understanding?
Imo phases work here too... we just shouldn't call them promote/ship, etc.
I was playing around with this idea.
Here's what my tinkering looks like :)
Thought: so far "Releases" maps to `release-promotion` - we could have a "Merges" that maps to `merge-automation`? Or, the new release form could change a bit if the selected product is `merge-automation`. We can treat the merges as an edge case - like the RCs.
I don't have strong feelings on any of this, but here are some thoughts:
> I was playing around with this idea.
> * I am not sure how version/build # would fit in
Version could make sense if we tied each merge automation to a branch. Eg: one for central (that would do central -> beta), one for beta (does beta -> release), one for esr. I can't decide if it's a good or bad idea to keep them all as one "release" or not. (This might remove the need for answers to some of the later questions, too.)
> * Build # doesn't seem applicable?
Most of the time...no. But it will become relevant if we have a bug in merge automation that we have to fix, and then restart the merges. And on that topic... what happens if we find a bug halfway through (eg: after pushing some changes but not finishing the last steps)? How do we recover from that?
> * I don't know about the version either. I just pointed it at the beta version to get something to render
> * These behaviors cut across the repository/branch line, where does the version come from?
If we stick with a model where all merges are part of one "release", I strongly encourage not using a version that is tied to any one branch in particular - that would inevitably lead to confusion.
Overall, I do question whether or not it makes sense to try to fit merge automation into the mold of a "release". It may be cleaner to create a new table or tables that fit its model much better. (I think we could still make use of much of the UI, phases, and other helpers though.)
Thanks for chiming in Ben. I'm inclined to agree that integrating the `merge-automation` into our current Release model might complicate things. I like the idea of developing new model(s) designed for the `merge-automation` action schema. I plan to delve into this idea further, focusing on how such a model would mesh with our current models (such as Phases), React components, and other helpers.
Yeah I agree this seems like the proper approach.
One suggestion: rather than making a model designed specifically for `merge-automation`, why not try to make a model designed to be generic? Who knows what other things we might want to add in the future?
When it comes down to it, there is a routine we need to perform. For each routine we need to specify:
Then for each step, we need to specify:
And that's pretty much it. If we can keep our model as simple as that, it should be enough to support merge automation, and any other action based workflows we can come up with down the road.
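To make that concrete, here's a hedged sketch of what such a generic routine spec could look like, borrowing field names from the schema proposed later in this thread (the names and values are illustrative, not part of the proposal):

```python
# Illustrative only: a generic "routine" with ordered steps. The field names
# (task, hook_id, hook_payload, context) mirror the schema proposed below.
routine = {
    "name": "merge-automation",
    "steps": [
        {
            "name": "central-to-beta",  # hypothetical step name
            "task": {
                "hook_id": "...",       # which Taskcluster hook to trigger
                "hook_payload": {},     # the payload to trigger it with
            },
            "context": {},              # action input / taskgraph parameters
        },
    ],
}
```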
Great thinking @ahal - I like it. I will be trying it out for sure. The input could be stored in Postgres' `JSON` type.
Here's a possible schema for generic shipit "workflows" (names are hard 😛)
`shipit_api_workflow_status`

An enum type for workflow statuses ("scheduled", "completed", "aborted" in the models below).

`shipit_api_workflows`

- `id`: Primary key.
- `attributes`: JSON, stores various attributes for a workflow. This can be used to store things such as build ids, xpi meta-data, revisions, etc.
- `name`: Text - must be unique.
- `status`: Enum type `shipit_api_workflow_status`.
- `created`: Timestamp - defaults to the current timestamp.
- `completed`: Timestamp.

An index on `status`/`created` to speed up the front-end. Linked to `shipit_api_workflow_phases` via the `workflow_id`.

`shipit_api_workflow_phases`

- `id`: Primary key.
- `name`: Text.
- `submitted`: Boolean.
- `task_id`: Text.
- `task`: JSON - stores task details (i.e. the `hook_id` and `hook_payload`).
- `context`: JSON - stores context information (i.e. action input/taskgraph parameters).
- `created`: Timestamp - defaults to the current timestamp.
- `completed`: Timestamp.
- `completed_by`: Text - indicates who completed the phase.
- `workflow_id`: Foreign key referencing `shipit_api_workflows(id)`.

Linked to `shipit_api_workflow_phase_signoffs` via the `phase_id`.

`shipit_api_workflow_phase_signoffs`
This is exactly the schema we have now for releases, but for "workflows."
Here's what the models might look like:
```python
import datetime

import slugid
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ENUM

from backend_common.db import db

# ALLOW_PHASE_SKIPPING and SIGNOFFS are assumed importable from shipit's
# existing config; Signoff/XPISignoff are the existing signoff models.
from shipit_api.common.config import ALLOW_PHASE_SKIPPING, SIGNOFFS


class Workflow(db.Model):
    __tablename__ = "shipit_api_workflows"

    id = sa.Column(sa.Integer, primary_key=True)
    attributes = sa.Column(sa.JSON, nullable=False)
    name = sa.Column(sa.String(80), nullable=False, unique=True)
    status = sa.Column(ENUM("scheduled", "completed", "aborted", name="shipit_api_workflow_status"), nullable=False)
    created = sa.Column(sa.DateTime, nullable=False, default=datetime.datetime.utcnow)
    completed = sa.Column(sa.DateTime)
    # Use a string for order_by so this resolves even though Phase is defined later.
    phases = sa.orm.relationship("Phase", order_by="Phase.id", back_populates="workflow")

    def __init__(self, attributes, status):
        self.name = self.construct_name(attributes)
        self.attributes = attributes
        self.status = status

    def construct_name(self, attributes):
        raise NotImplementedError("Subclasses must implement this method to construct the workflow's name")

    def phase_signoffs(self, phase):
        raise NotImplementedError("Subclasses must implement this method to get signoffs for a phase")

    @property
    def allow_phase_skipping(self):
        # This only affects the frontend, *the API doesn't enforce phase skipping.*
        return False

    @property
    def json(self):
        return {
            "name": self.name,
            "status": self.status,
            "attributes": self.attributes,
            "created": self.created.isoformat(),
            "completed": self.completed.isoformat() if self.completed else "",
            "phases": [p.json for p in self.phases],
            "allow_phase_skipping": self.allow_phase_skipping,
        }


class Phase(db.Model):
    __tablename__ = "shipit_api_workflow_phases"

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, nullable=False)
    submitted = sa.Column(sa.Boolean, nullable=False, default=False)
    task_id = sa.Column(sa.String, nullable=False)
    task = sa.Column(sa.JSON, nullable=False)     # the hook_id and hook_payload
    context = sa.Column(sa.JSON, nullable=False)  # action input / taskgraph parameters
    created = sa.Column(sa.DateTime, default=datetime.datetime.utcnow)
    completed = sa.Column(sa.DateTime)
    completed_by = sa.Column(sa.String)
    workflow_id = sa.Column(sa.Integer, sa.ForeignKey("shipit_api_workflows.id"))
    workflow = sa.orm.relationship("Workflow", back_populates="phases")
    signoffs = sa.orm.relationship("Signoff", order_by="Signoff.id", back_populates="phase")


class Release(Workflow):
    product_details_enabled = True

    def construct_name(self, attributes):
        return f"{attributes['product'].capitalize()}-{attributes['version']}-build{attributes['build_number']}"

    @property
    def product(self):
        # Derived from the attributes JSON; needed by the properties below.
        return self.attributes["product"]

    @property
    def project(self):
        return self.attributes["branch"].split("/")[-1]

    @property
    def allow_phase_skipping(self):
        # This only affects the frontend, the API doesn't enforce phase skipping.
        product = ALLOW_PHASE_SKIPPING.get(self.product, {})
        return product.get(self.project, product.get("default", False))

    def phase_signoffs(self, phase):
        return [
            Signoff(uid=slugid.nice(), name=req["name"], description=req["description"], permissions=req["permissions"])
            for req in SIGNOFFS.get(self.product, {}).get(phase, [])
        ]


class XPIRelease(Workflow):
    product_details_enabled = False
    product = "xpi"

    def construct_name(self, attributes):
        return f"{attributes['xpi_name']}-{attributes['xpi_version']}-build{attributes['build_number']}"

    def phase_signoffs(self, phase):
        return [
            XPISignoff(uid=slugid.nice(), name=req["name"], description=req["description"], permissions=req["permissions"])
            for req in SIGNOFFS.get("xpi", {}).get(self.attributes["xpi_type"], {}).get(phase, [])
        ]
```
Using a JSON column for storing "workflow attributes" that vary between different "workflow types" is more flexible and simplifies the database schema. It would allow us to add new workflows, or modify existing ones, without needing schema migrations. Some pros and cons of this approach:
Pros:

- We can store any attributes for the workflow within the "attributes" column, decreasing the need for altering the database schema and doing migrations.
- If new "attributes" are introduced, we can add them to the JSON object being persisted without impacting existing records or requiring database updates.
- Development can be faster, because adding new features that require storing new "attributes" doesn't require changes to the database schema.
- Reduces the complexity of the database schema by eliminating the need for multiple repetitive tables and columns.

Cons:

- Any integrity checks or validation of these "attributes" must be enforced in the application layer. There's no "attributes" schema enforced by the database.
- Queries against data inside a JSON field can be less efficient (as sketched below).
- Indexing JSON fields can be less efficient.
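For illustration, here's roughly what querying and indexing the attributes column could look like with SQLAlchemy on Postgres (a sketch, assuming the models above):

```python
import sqlalchemy as sa

# Filter on a key inside the JSON column; as_string() casts the JSON
# element to text so it can be compared.
firefox_workflows = (
    db.session.query(Workflow)
    .filter(Workflow.attributes["product"].as_string() == "firefox")
    .all()
)

# An expression index can mitigate the lookup cost, at the price of more
# moving parts than a plain column index.
sa.Index("ix_shipit_api_workflows_product", Workflow.attributes["product"].as_string())
```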
Our other option is to continue using the "'abstract' base class and subclass with table" pattern, but introduce a generic workflow type. Our current pattern is more modular: each subclass has its own database table, which allows for tailored schemas that fit each workflow type. It is also better constrained, because the schema is enforced at the database layer. But each subclass having its own table adds complexity to the database schema, and we might end up with more tables than necessary, since the differences between our current workflows (`Release`/`XPIRelease`) seem minor. Another advantage of this pattern is its similarity to the existing setup - that could speed up implementation. A sketch of this pattern follows below.
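For comparison, a minimal sketch of that pattern using SQLAlchemy joined-table inheritance (table and column names here are hypothetical, not part of the proposal):

```python
class BaseWorkflow(db.Model):
    __tablename__ = "workflows"  # hypothetical name
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(80), nullable=False, unique=True)
    kind = sa.Column(sa.String(50), nullable=False)  # discriminator column
    __mapper_args__ = {"polymorphic_identity": "workflow", "polymorphic_on": kind}


class MergeWorkflow(BaseWorkflow):
    # The subclass gets its own table with a tailored, database-enforced schema.
    __tablename__ = "merge_workflows"  # hypothetical name
    id = sa.Column(sa.Integer, sa.ForeignKey("workflows.id"), primary_key=True)
    behavior = sa.Column(sa.String, nullable=False)  # e.g. a merge behaviour
    __mapper_args__ = {"polymorphic_identity": "merge"}
```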
I haven't had a chance to take this in fully yet - but a couple of quick thoughts:
On this point specifically:
If new "attributes" are introduced, we can add them to the JSON object being persisted without impacting existing records or requiring database updates.
I'm not sure I agree this is a pro. Ultimately whatever code is using these records needs to know that older versions of them do not have fields that were added later, so whether it's a field in a json object that may not exist, or a nullable column, the difference in structure must be handled.
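A two-line sketch of the point (the field name is hypothetical):

```python
# Old records lack the field either way; the reader must supply a default.
behavior = workflow.attributes.get("merge_behavior", "default")  # JSON variant
behavior = workflow.merge_behavior or "default"                  # nullable-column variant
```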
I will try to review this in more depth soon, but please don't block on me - I think this is very much on the right track.
Summary
Integrating our "Merge Day Automation" into ShipIt is an opportunity to streamline our release process. ShipIt, being a Taskcluster client, already facilitates triggering Taskcluster hooks. We should be able to incorporate the merge automation action in a similar way to the release promotion action.
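For example, triggering a hook with the Taskcluster Python client looks roughly like this (the hook ids and payload below are hypothetical placeholders):

```python
import taskcluster

hooks = taskcluster.Hooks({"rootUrl": "https://firefox-ci-tc.services.mozilla.com"})
# In ShipIt, the hookGroupId/hookId and payload would come from the phase's
# stored `task` JSON (hook_id and hook_payload).
hooks.triggerHook("project-releng", "in-tree-action-merge-automation", {"input": {}})
```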
Objectives
Self-Served Merges for Release Managers: ShipIt's sign-off system offers a controlled environment for release managers. By enabling them to self-serve the Firefox train merges through this system, we can decrease the number of merge-related hand-offs, making the release cycle more efficient.
Simplification of Merge Duty: The current incarnation of "Merge Duty" involves release engineering intervention and some coordination among Firefox release teams. Automating this process will simplify these duties, allowing the team to focus on strategic tasks.
Questions
References
b.m.o [meta] bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1890753