Opened by jhchabran
Date: 2024-02-14
On Track
Created by noah@sourcegraph.com
Date: 2024-02-15
On Track
Created by jean-hadrien.chabran@sourcegraph.com
Date: 2024-02-28
On Track
Created by jean-hadrien.chabran@sourcegraph.com
Date: 2024-03-15
On Track
Current: Working on writing up the plan
Created by noah@sourcegraph.com
Date: 2024-03-19
On Track
Current: Plan completed; creating issue tracker tasks and determining their order so that work can be parallelized efficiently across the team
Created by noah@sourcegraph.com
Date: 2024-03-29
On Track
Current: 45% of the spreadsheet has been filled in; a soft deadline has been set for Wednesday, 3rd April
Created by noah@sourcegraph.com
Date: 2024-03-29
On Track
Current: The first planned approach to triggering build metrics collection hit a dead end. An alternative approach reusing the build-tracker service is in progress. We are currently modernizing its deployment to use MSP, bringing it in line with the future direction for deploying hosted services
Created by noah@sourcegraph.com
Date: 2024-04-08
On Track
Current: Triggering an async pipeline on build completion is working and live on MSP. The original build-tracker is still running while we observe the new MSP-deployed one
Created by noah@sourcegraph.com
Date: 2024-04-15
On Track
Current: 65% of the sheet is filled in after I added more directories and ownership. The deadline was extended after feedback that it was too short. A final ping to EMs is planned to go out tomorrow
Created by noah@sourcegraph.com
Date: 2024-04-23
On Track
Current: We are now shipping Buildkite-specific data to BigQuery. Bazel data is in progress
Created by noah@sourcegraph.com
Date: 2024-04-29
On Track
Created by noah@sourcegraph.com
Date: 2024-05-06
On Track
Current: We have begun experimenting with dashboards in Redash and are fixing data issues as they arise
Created by noah@sourcegraph.com
Date: 2024-05-08
On Track
Created by noah@sourcegraph.com
Date: 2024-05-14
On Track
Current: Reached ~70.5% after examining the remaining tests and excluding irrelevant ones (e.g. diffs for generated files, copies, etc.). A PR is being prepared
Created by noah@sourcegraph.com
Date: 2024-05-16
Completed
Current: The PR is merged, so this OKR is technically complete. Follow-up work involves a CI check to maintain that level, as well as splitting certain mega-packages so they map to more distinct owners, to reach a higher level
Created by noah@sourcegraph.com
Date: 2024-05-19
On Track
Created by noah@sourcegraph.com
Date: 2024-06-02
On Track
Created by noah@sourcegraph.com
CI Observability
Fostering ownership & accountability of CI build & test performance in the monorepo, in order to improve the reliability and speed of CI.
Problem
sg/sg is a complex product. Tests and build times are key to controlling quality, yet teams have very little visibility into where they stand or how their code affects other teams in CI, which fosters learned helplessness.
Buildkite's "Test Analytics" feature covers a lot of the same ground, but it lacks detail on things such as the critical path, more customizable graphing, and aggregations other than P50, and it does not unlock future capabilities such as introspecting what causes a cache miss.
From a cost perspective, measurement has previously been a mostly manual process: extracting data from the Buildkite API and attempting to correlate cost with it in Google spreadsheets. Besides being manual, this misses the actual cost of CI, which is the underlying infrastructure.
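As an illustration of what accounting for the infrastructure could involve, here is a minimal Python sketch (not an existing tool) that attributes VM cost to builds. It assumes each agent VM's lifetime is known from boot and shutdown timestamps and that we know which builds it ran; the `AgentLifetime` shape and the `HOURLY_USD` figures are hypothetical placeholders, not real GCP prices.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical prices in USD/hour; real figures would come from GCP billing data.
HOURLY_USD = {
    "n2-standard-8": 0.40,
    "n2-standard-16": 0.80,
}


@dataclass
class AgentLifetime:
    """One agent VM, reconstructed from its boot and shutdown datapoints (hypothetical shape)."""
    instance_id: str
    machine_type: str
    booted_at: datetime
    shutdown_at: datetime
    builds: list[str] = field(default_factory=list)  # build IDs this agent ran jobs for


def agent_cost(agent: AgentLifetime) -> float:
    """Cost of the VM over its whole lifetime, idle time included."""
    hours = (agent.shutdown_at - agent.booted_at).total_seconds() / 3600
    return hours * HOURLY_USD[agent.machine_type]


def cost_per_build(agents: list[AgentLifetime]) -> dict[str, float]:
    """Split each agent's cost evenly across the builds it served."""
    totals: dict[str, float] = defaultdict(float)
    for agent in agents:
        if not agent.builds:
            # Agents that booted but never picked up work are tracked separately.
            totals["<idle>"] += agent_cost(agent)
            continue
        share = agent_cost(agent) / len(agent.builds)
        for build in agent.builds:
            totals[build] += share
    return dict(totals)
```

With one n2-standard-8 agent running builds 101 and 102 for two hours and one n2-standard-16 agent running build 102 for 30 minutes, this attributes 0.40 USD to build 101 and 0.80 USD to build 102 at the placeholder prices. Splitting cost evenly is only one possible policy; weighting by job duration would be a natural refinement.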
Success criteria
Proposal
Ownership:
Provide a set of (manually updated) Bazel variables to be used as tags on tests to denote ownership (as a stretch goal, we can explore keeping this set updated automatically, depending on the reliability of the sources of truth available to us). CI will enforce that the percentage of tests with a tagged owner remains above 70% (achievable with simple, fast bazel query commands), so that newly added tests have clearly defined ownership.
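A minimal sketch of the coverage check, assuming ownership tags follow a hypothetical `owner_<team>` convention and that CI writes the output of two queries to files, for example `bazel query 'tests(//...)'` for all test targets and `bazel query 'attr(tags, "owner_", tests(//...))'` for the tagged ones. The file names and the tag prefix are assumptions for illustration.

```python
# Threshold from the proposal: the share of tagged tests must stay at 70% or more.
MIN_COVERAGE = 0.70


def read_labels(path: str) -> set[str]:
    """Read one Bazel label per line, as printed by `bazel query`."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def ownership_coverage(all_tests: set[str], owned_tests: set[str]) -> float:
    """Fraction of test targets that carry an owner tag."""
    if not all_tests:
        return 1.0  # nothing to own, nothing to fail on
    return len(all_tests & owned_tests) / len(all_tests)


def check(all_tests: set[str], owned_tests: set[str]) -> bool:
    """Print a short report and return False if coverage fell below the threshold."""
    coverage = ownership_coverage(all_tests, owned_tests)
    print(f"{coverage:.1%} of {len(all_tests)} test targets have an owner")
    if coverage < MIN_COVERAGE:
        # List a few untagged targets so the failing CI step is actionable.
        untagged = sorted(all_tests - owned_tests)
        print("untagged tests, e.g.:", *untagged[:10], sep="\n  ")
        return False
    return True
```

In CI this could be wired up as `sys.exit(0 if check(read_labels("all_tests.txt"), read_labels("owned_tests.txt")) else 1)`, with the two files written by the queries above. Taking label sets rather than invoking Bazel directly keeps the check fast and easy to unit test.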
Observability:
Investigate the different sources of Bazel execution data (Build Event Protocol, compact execution log & profile data) to determine which combination of these we need in order to extract the required information from Bazel. These will be stored as Buildkite artifacts, where they can be queried by the finalization task. Augment our GCP agent images to log a datapoint when they boot and shut down, including relevant data from the metadata server.
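The Build Event Protocol is a natural starting point for per-test timing. The following minimal Python sketch reads the newline-delimited JSON that Bazel writes with `--build_event_json_file` and summarises test durations per owning team, including percentiles beyond P50. The field names follow BEP's `TestResult` event as we understand it (`testAttemptDurationMillis` in older Bazel releases, `testAttemptDuration` in newer ones) and should be checked against the Bazel version in use; the `owners` mapping from label to team is assumed to be derived from the ownership tags and is not built here.

```python
import json
import math
from collections import defaultdict


def test_durations(bep_lines):
    """Yield (label, seconds) for each test attempt in a BEP JSON stream.

    Every run, shard and attempt of a target counts as a separate sample.
    """
    for line in bep_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        label = event.get("id", {}).get("testResult", {}).get("label")
        result = event.get("testResult")
        if label is None or result is None:
            continue  # not a test attempt event
        if "testAttemptDuration" in result:
            # Newer Bazel: protobuf Duration rendered as a string such as "1.5s".
            seconds = float(result["testAttemptDuration"].rstrip("s"))
        else:
            # Older Bazel: int64 milliseconds, rendered as a string in proto JSON.
            seconds = int(result.get("testAttemptDurationMillis", 0)) / 1000
        yield label, seconds


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def team_report(bep_lines, owners):
    """Group test durations by owning team and summarise count, P50, P90 and max."""
    by_team = defaultdict(list)
    for label, seconds in test_durations(bep_lines):
        by_team[owners.get(label, "unowned")].append(seconds)
    return {
        team: {
            "count": len(durations),
            "p50": percentile(durations, 50),
            "p90": percentile(durations, 90),
            "max": max(durations),
        }
        for team, durations in by_team.items()
    }
```

Keeping untagged tests under an explicit `"unowned"` bucket makes the ownership gap visible in the same report rather than silently dropping those tests.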
Alongside data from Buildkite pipeline executions, test tagging & GCP details, we should be able to get answers to the following questions:
Together, they will provide the basis for both global and team-specific reports:
Milestones
Risks
Tracked issues
@unassigned
Completed
@Strum355
Completed
@jamesmcnamara
Completed