owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

Automate PR creation when updating steps #2999

Open · Marigold opened this issue 2 months ago

Marigold commented 2 months ago

Our guide for updating data suggests creating reference and review branches as best practice. We already use the etl d draft-pr command, which has proven handy for creating PRs automatically. We could automate the data-update workflow in a similar way:

  1. Create reference branch
  2. Update steps (using functions from the ETL Dashboard)
  3. Commit the new steps and push the reference branch to GitHub
  4. Create a review branch with PR
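A minimal sketch of how the four steps above could be scripted, assuming plain git plus the GitHub CLI (gh) rather than any existing etl command. The helper names (reference_branch_commands, publish_commands, run), branch names, and commit messages are hypothetical. Step 2 stays manual because it depends on the ETL Dashboard, so the commands are split into "before" and "after" that step, and run only prints them unless dry_run=False.

```python
from __future__ import annotations

import shlex
import subprocess


def reference_branch_commands(reference_branch: str) -> list[list[str]]:
    """Step 1 (hypothetical helper): create the reference branch from an up-to-date master."""
    return [
        ["git", "checkout", "master"],
        ["git", "pull"],
        ["git", "checkout", "-b", reference_branch],
    ]


def publish_commands(reference_branch: str, review_branch: str, title: str) -> list[list[str]]:
    """Steps 3 and 4 (hypothetical helper), to run after updating steps in the ETL Dashboard."""
    return [
        # 3. Commit the duplicated steps and push the reference branch.
        ["git", "add", "-A"],
        ["git", "commit", "-m", f"Duplicate steps: {title}"],
        ["git", "push", "-u", "origin", reference_branch],
        # 4. Create the review branch; an empty commit lets the draft PR open
        #    before any real changes have been made.
        ["git", "checkout", "-b", review_branch],
        ["git", "commit", "--allow-empty", "-m", f"Start review: {title}"],
        ["git", "push", "-u", "origin", review_branch],
        ["gh", "pr", "create", "--draft",
         "--base", reference_branch, "--head", review_branch,
         "--title", title, "--body", f"Review of updated steps: {title}"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, run(reference_branch_commands("update-un-wpp-reference")) prints the branch setup; after updating the steps, run(publish_commands("update-un-wpp-reference", "update-un-wpp", "Update UN WPP")) prints the rest. Keeping the review PR based on the reference branch means its diff shows only the real changes, not the duplicated code.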

@lucasrodes suggested

Maybe we want to add that when updating the steps in the ETL Dashboard? E.g., a checkbox or something like ‘create reference and review branches’. The title and name of the branch could be dynamically generated based on the datasets selected for update.
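A sketch of how the branch name and PR title could be derived from the datasets selected for update. It assumes step URIs of the form data://<channel>/<namespace>/<version>/<short_name>; the naming scheme (update-<short names>, capped at 60 characters) and the function names are hypothetical choices for illustration.

```python
from __future__ import annotations

import re


def _short_names(step_uris: list[str]) -> list[str]:
    """Unique short names, in selection order (assumes the short name is the last URI segment)."""
    names: list[str] = []
    for uri in step_uris:
        name = uri.rstrip("/").split("/")[-1]
        if name not in names:
            names.append(name)
    return names


def branch_name_for(step_uris: list[str], prefix: str = "update") -> str:
    """Hypothetical naming scheme: lower-case, git-safe slug of the selected short names."""
    slug = "-".join(_short_names(step_uris)).lower()
    # Replace anything other than letters, digits, and hyphens (e.g. underscores).
    slug = re.sub(r"[^a-z0-9-]+", "-", slug).strip("-")
    return f"{prefix}-{slug}"[:60].rstrip("-")


def pr_title_for(step_uris: list[str]) -> str:
    """Human-readable PR title listing the selected datasets."""
    return "Update " + ", ".join(_short_names(step_uris))
```

With this scheme, selecting the meadow and garden steps of data://…/2024-07-12/un_wpp would yield the branch update-un-wpp, since duplicate short names across channels collapse into one.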

@pabloarosado suggested

I don't think all of that can easily be done automatically (because step 2 can get a bit tricky). But maybe there's a better way around it.

It'd make sense to at least create reference & review branches automatically, and then see whether it's worth automating the rest. Leveraging etl d draft-pr could make it relatively easy.

lucasrodes commented 2 months ago

One minor thing: I'd rename the command to something shorter, so it is easier for the team to adopt.

Off the top of my head, I'm thinking etl pr, but I'm open to suggestions.

larsyencken commented 2 months ago

We discussed that you can point reviewers to a subset of commits, to make the review more relevant.

pabloarosado commented 2 months ago


Yes, Lars suggested an alternative to creating a reference branch and a sub-branch. You can simply create a PR that merges a branch into master. Then, in the upper-left corner of the PR's changes view, you can choose which commits to consider when reviewing, and skip the commit(s) that duplicated old code. Once you have selected the ones you want, you can copy that URL and add it to the PR description, to point the reviewer to a more convenient view of the changes.

[Screenshot, 2024-07-25]

I haven't tried this workflow yet, but it looks promising. If it works well, we can avoid creating a reference branch (and, of course, its corresponding staging server). I'll try it out and update the docs accordingly.
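The commit-selection workflow Lars suggested could be made repeatable by generating the review link and a note for the PR description. This is only a sketch: it assumes the URL shape GitHub uses when you pick a commit range on a PR's Files changed tab (/pull/<number>/files/<from_sha>..<to_sha>, with from_sha excluded from the view), which you should check against the URL your browser actually shows. The function names, PR number, and SHAs in the test are hypothetical.

```python
import re

# Abbreviated or full lower-case hex commit SHA.
_SHA = re.compile(r"[0-9a-f]{7,40}")


def review_url(repo: str, pr_number: int, from_sha: str, to_sha: str) -> str:
    """URL showing only the PR changes after from_sha up to to_sha (assumed URL shape).

    Pass the SHA of the commit that duplicated the old steps as from_sha so
    that commit is left out of the reviewer's view.
    """
    for sha in (from_sha, to_sha):
        if not _SHA.fullmatch(sha):
            raise ValueError(f"not a commit SHA: {sha!r}")
    return f"https://github.com/{repo}/pull/{pr_number}/files/{from_sha}..{to_sha}"


def review_note(url: str) -> str:
    """Text to paste into the PR description, pointing reviewers to the filtered view."""
    return (
        "The first commit only duplicates the old steps. "
        f"To review just the actual changes, use this view: {url}"
    )
```

Generating the note instead of copying the URL by hand would also fit an etl pr-style command, since the command already knows which commit did the duplication.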

stale[bot] commented 20 hours ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.