zmkfirmware / zmk

ZMK Firmware Repository
https://zmk.dev/
MIT License
2.63k stars 2.69k forks source link

ci(build): distill workflow #502

Open innovaker opened 3 years ago

innovaker commented 3 years ago

I recently increased our ci test coverage to include all boards and shields because we had some blind spots which needed to be covered for verifying #401. However, GitHub Action's random ECONNRESET issue (during the archiving step) is subsequently hitting us harder than before and we're having to restart runs more often.

I've suggested we explore a new strategy: a) PRs and merges trigger testing of a select (small) group of boards/shields as well as build-args/configs, targeting broad feature coverage. b) Nightly builds for all boards/shields as a safety net.

I'm happy to implement this change, but I don't have sufficient insight to design the specifics of (a). We need from all the contributors.

Please use this issue to discuss the approach and suggest ideas, hopefully leading to a design for (a).

okke-formsma commented 3 years ago

I would like to propose the following:

This would cover the most used combinations.

This would require us to define multiple jobs, as there is only one matrix definition per job possible.

Nicell commented 3 years ago
  • build all boards for a small set of shields (proton-c and nice nano for example)
  • build all shields for a small set of boards (kyria and qaz for example)

This seems like a good start. I think we can also keep the current include list since that's quite short. I'd like to see #500 be done for underglow and other future features.

The only potential issues I see for this small coverage case is when a new shield is submitted that works on the nice!nano, but it breaks on other boards due to missing a board specific overlay or enabling a feature that isn't supported. These shields should be able to work without a board specific overlay and shouldn't have bad config options enabled, but it will be harder to catch that error without testing a few boards (or all). See this commit as an example: https://github.com/zmkfirmware/zmk/pull/429/commits/a50fdeca575e7eee91e622b7f6a9dd0a3e40a8cb

It would be neat if we could have a way of building a shield on every supported board for PRs, but I'm thinking GitHub actions may be a bit too limited to do this. This page has some information on labels be used to trigger certain builds, but I'm not sure if we can work with this: https://github.community/t/do-something-if-a-particular-label-is-set/17149

innovaker commented 3 years ago

There's also the option of writing a dedicated script (i.e. Python) which generates a tailored build matrix. Would that be more useful @Nicell?

See: https://github.blog/changelog/2020-04-15-github-actions-new-workflow-features/

mcrosson commented 3 years ago

I'll add a 👍 to the build shield + all boards option ; as I worked on the tidbit bring up I've tripped over some proton_c differences compared to the nice!nano. Nothing too complicated to fix but definitely not caught by local testing as I don't have a proton_c on-hand.

Nicell commented 3 years ago

I think we could do some on magic in the workflow file to only build every shield/board combo when the boards directory has been adjusted. Otherwise for other changes, only build the small matrix as outlined above.

It's not perfect, as ideally it'd only build the changed boards/shields, but that type of logic would probably require scripting, which I'm not sure if we need on a first pass of updating this system.

Thoughts @innovaker?

innovaker commented 3 years ago

Composition/reuse within GitHub workflows will become easier over time as the GitHub development team is working on it. Moreover, I'm fairly sure that the modular approach will soon be considered the best approach for shields. So for those reasons, I'm not bothered by the repetition of workflow code for the short-term as long as we're mindful of it.

@Nicell, yes, I was also thinking of us leveraging on and I'm keen to fully understand the critical paths. If we're relegating our current blanket approach, I'd like to consider our tactical options before settling on another (albeit cut-down) blanket approach. Given that the current matrix is mostly promicro-orientated shields, what benefits do we get running every shield on every push with the default settings? Can/should we be more selective? I'd appreciate your insight on that please to save me studying every shield in detail.

innovaker commented 3 years ago

Composition/reuse within GitHub workflows will become easier over time as the GitHub development team is working on it. Moreover, I'm fairly sure that the modular approach will soon be considered the best approach for shields. So for those reasons, I'm not bothered by the repetition of workflow code for the short-term as long as we're mindful of it.

Adding to my previous comment (above): in parallel to this issue, I'm intending to prototype wrapping up the core of the build job (west commands and variables) with a composite action because it also has strong potential for modules. Hopefully the checkout and archiving steps too once uses is also supported. This should minimize any code reuse/duplication issues which frees our hands somewhat.

petejohanson commented 3 years ago

So, for a minimum basic "coverage", I would see.

That seems like our baseline "cover most edge cases" set.

mcrosson commented 3 years ago

So, for a minimum basic "coverage", I would see.

* One non-split w/ display and encoder enabled (Romac+?) with both nRF52 (`nice_nano`) and stm32 (`proton_c`) based pro-micro compat boards.

If / when the Tidbit is merged I am willing to volunteer to keep tabs on build failures. Tidbit is similar to the Romac+ but with more features/flexibility present making it a little more complex of a test case day to day.

petejohanson commented 3 years ago

Once we have tidbit in, we could move to that. Until then I would say:

innovaker commented 3 years ago

@Nicell has kindly offered to take on the implementation of this issue. Thanks!

innovaker commented 3 years ago

I'm tracking this issue related to the ECONNRESET: https://github.com/actions/upload-artifact/issues/116

caksoylar commented 2 years ago

Related: #889 (not sure if this is addressed by it sufficiently?)