singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Set up continuous integration for the Open Grid Emissions data pipeline #185

Open miloknowles opened 1 year ago

miloknowles commented 1 year ago

For each new PR, it would be nice to automatically check that the full data pipeline runs without errors. This would be especially important once we have more open-source contributors, since we could require that the tests pass before merging. We could also do a nightly run that would help us catch breaking changes to input data (e.g., EIA suddenly renaming a column).
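As a rough illustration of the kind of check a nightly run could make, here is a minimal sketch that fails when an input file's header no longer contains the columns the pipeline expects. The column names and both function names are hypothetical, chosen only for this example; they are not taken from the actual pipeline or from any EIA file.

```python
import csv
import io

# Hypothetical column names, used only for illustration.
EXPECTED_COLUMNS = {"plant_id", "report_date", "net_generation_mwh"}


def missing_columns(csv_text: str, expected: set[str]) -> set[str]:
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    return expected - {name.strip() for name in header}


def check_input_schema(csv_text: str, expected: set[str]) -> None:
    """Raise ValueError if any expected column is missing, so a CI job fails."""
    missing = missing_columns(csv_text, expected)
    if missing:
        raise ValueError(
            f"Input data is missing expected columns: {sorted(missing)}"
        )
```

Calling `check_input_schema` at the start of a scheduled job would turn a silently renamed column into an explicit failure with the missing names listed, rather than an error somewhere deeper in the pipeline.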

@Rdbaker @burkaman @wendellwilson were discussing continuous integration the other day, and this project might be a good place to start. I've only worked with CircleCI before, so maybe the rest of the team will have some thoughts on what the best tool for this is?

burkaman commented 1 year ago

I've also used CircleCI and liked it, but I think we also want to evaluate GitHub Actions (we have a lot of credits already included with our account) and AWS CodePipeline (which might be easier to integrate with other AWS services). I made a couple of tickets and was going to try to work on it next week, but anyone who has time is welcome to go for it.

burkaman commented 1 year ago

Actually this is going to be a public repo, right? Do we need CI for this to be publicly accessible?

grgmiller commented 1 year ago

Great idea. PUDL uses CI for their repo: https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html?highlight=continuous%20integration#continuous-integration-tests

Catalyst actually published a template that we could consider using for setting this up: https://github.com/catalyst-cooperative/cheshire

Separate issue, but I think we should also set up some pre-commit hooks to enforce formatting and other things.