kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0

Improve code coverage metrics and visibility in CAPZ #3229

Closed: willie-yao closed this issue 1 year ago

willie-yao commented 1 year ago

/kind cleanup

Detailed Description

This is a larger "Epic" issue that tracks overall efforts to improve and increase the current code coverage in CAPZ. The current state of code coverage can be viewed on the codecov dashboard. The current average unit test coverage sits at about 40%.

Goals & Desired Outcomes

  1. Identify any gaps in current unit tests and code coverage across packages.
  2. Prioritize testing areas of code with the largest code reach.
  3. Improve visibility of code coverage metrics for developers.
  4. Add unit tests and increase code coverage percentage in high-impact areas such as the controllers package.
  5. Achieve consistency in code coverage reports/results.

UPDATE This issue will be closed once all overall tasks are completed. The goal has shifted from increasing code coverage overall to improving the metrics and visibility of code coverage across the project. Efforts to increase code coverage for specific packages will be moved to their own separate issues to narrow the scope of this one.

### Overall Tasks
- [x] #3258
    - ~~Note: Currently the coverage reports are generated through Go's internal `go tool cover`, which does not support excluding files or directories from coverage reports. Will be looking into alternative solutions.~~
- [ ] #3312
- [ ] #3427
### Webhooks Needing Unit Tests
- [ ] azureclustertemplate_default.go
- [ ] #3371
- [ ] azureimage_validation.go
- [ ] #3372
- [ ] #3374
- [ ] #3373
- [ ] #3429
- [ ] #3430
- [ ] #3431 
- [ ] azuremanagedmachinepool_webhook.go
  - ValidateUpdate()
  - ValidateDelete()
  - validateLinuxOSConfig()
- [ ] #3432 
- [ ] azuremachinepool_webhook.go
  - ValidateSystemAssignedIdentityRole()
  - ValidateOrchestrationMode()
- [ ] azuremachinepoolmachine_webhook.go
### Controllers Needing Unit Tests
- [ ] https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/3360
- [ ] azurecluster_controller.go
- [ ] azurecluster_reconciler.go
- [ ] azureidentity_controller.go
- [ ] azurejson_machine_controller.go
- [ ] azurejson_machinepool_controller.go
- [ ] azurejson_machinetemplate_controller.go
- [ ] azuremanagedcluster_controller.go
- [ ] azuremanagedcontrolplane_controller.go
- [ ] azuremanagedcontrolplane_reconciler.go
- [ ] azuremanagedmachinepool_controller.go
- [ ] azuremanagedmachinepool_reconciler.go
### Lower Priority Tasks
- [ ] #3290
- [x] #3291
willie-yao commented 1 year ago

/assign

nojnhuh commented 1 year ago

My 2c on how I think we should go about this effort to make the most of it:

That being said, I'd propose we do the following:

@willie-yao @CecileRobertMichon Please let me know if any of that is off-base, just looking to level-set before we get too deep in this effort.

dtzar commented 1 year ago

IMO one of the important things with code coverage is to set an overall coverage baseline percentage based on where you are today and have CI enforcement to keep coverage at that level. This way you're not allowing new contributions to regress the quality of the code.
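
For example (just a sketch, not something CAPZ has today; codecov can also enforce this through its config): a tiny Go checker that parses `coverage.out` with `golang.org/x/tools/cover` and fails CI when total statement coverage drops below an assumed 40% baseline.

```go
// covercheck is a hypothetical CI gate: it reads a Go coverage profile and
// exits non-zero if total statement coverage falls below the baseline.
// The profile name and the 40% baseline are assumptions for illustration.
package main

import (
	"fmt"
	"os"

	"golang.org/x/tools/cover"
)

func main() {
	const baseline = 40.0 // percent; set to today's measured coverage

	profiles, err := cover.ParseProfiles("coverage.out")
	if err != nil {
		fmt.Fprintln(os.Stderr, "parsing coverage profile:", err)
		os.Exit(1)
	}

	// Total coverage = covered statements / all statements across all files.
	var covered, total int
	for _, p := range profiles {
		for _, b := range p.Blocks {
			total += b.NumStmt
			if b.Count > 0 {
				covered += b.NumStmt
			}
		}
	}

	pct := 100 * float64(covered) / float64(total)
	fmt.Printf("total coverage: %.1f%% (baseline %.1f%%)\n", pct, baseline)
	if pct < baseline {
		os.Exit(1)
	}
}
```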

willie-yao commented 1 year ago

After taking another pass through the codebase, it seems like several functions and important edge cases are not being tested in many of the webhooks, even though those files report a high code coverage percentage. I think ensuring that the webhooks are working as intended is a high priority.

Will also take a closer look at the controllers package, which has lower overall code coverage but requires extensive use of mocks to test properly.
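
To illustrate the kind of webhook cases I mean, here's a minimal table-driven sketch in the usual Go style. `validateOrchestrationModeExample` is a made-up stand-in, not the actual CAPZ `ValidateOrchestrationMode()` signature:

```go
package webhooks_test

import (
	"fmt"
	"testing"
)

// validateOrchestrationModeExample is a hypothetical stand-in for a webhook
// helper like ValidateOrchestrationMode(); the real CAPZ functions take API
// types rather than a plain string.
func validateOrchestrationModeExample(mode string) error {
	switch mode {
	case "Uniform", "Flexible":
		return nil
	default:
		return fmt.Errorf("invalid orchestration mode %q", mode)
	}
}

func TestValidateOrchestrationModeExample(t *testing.T) {
	tests := []struct {
		name    string
		mode    string
		wantErr bool
	}{
		{name: "uniform mode is allowed", mode: "Uniform"},
		{name: "flexible mode is allowed", mode: "Flexible"},
		{name: "unknown mode is rejected", mode: "bogus", wantErr: true},
		{name: "empty mode is rejected", wantErr: true},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			err := validateOrchestrationModeExample(tc.mode)
			if (err != nil) != tc.wantErr {
				t.Errorf("got error %v, wantErr %v", err, tc.wantErr)
			}
		})
	}
}
```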

CecileRobertMichon commented 1 year ago

Wonder if we could use something like https://github.com/go-gremlins/gremlins to measure the effectiveness of tests too, not just coverage.

willie-yao commented 1 year ago

After taking a closer look at the codecov dashboard, I've listed a few findings about the overall code coverage of the project:

  1. Some statistics on the codecov dashboard are misleading. Many files report less than 60% code coverage but are only missing tests for trivial functions like getters and setters; an example would be the API type files.
  2. Many files that should not be included in the dashboard are there, such as older API versions and conversion files. These will be removed by #3259.
  3. As pointed out by @nojnhuh, most files within the `azure` package will be replaced due to the adoption of ASO. Thus, we should focus our efforts on other packages.

Because of this, I've narrowed down the effort into two areas of the highest priority to be tested:

  1. Webhooks: Some webhook files report a high percentage of code coverage but are missing coverage for key areas that could lead to API errors if left untested. An example of this would be the AzureMachine webhook.
  2. Controllers: Most controller files have less than 40% code coverage. This is a critical part of the codebase that should be tested more thoroughly.

Going forward, I would like to focus our efforts on one of the two areas first. It would be easier to start with the webhooks, as most of the test cases will be trivial. While the controller tests will be harder to implement due to the extensive mocking required, the controllers package has much lower code coverage and is a critical part of the codebase. @CecileRobertMichon @nojnhuh @mboersma @dtzar (and anyone else), do you have any thoughts on this?

Edit: Will also use tools to calculate code churn in order to identify which files change most often. This can be another metric for deciding which files we should add tests to first.
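
On the mocking point: for the Kubernetes-only paths, much of a reconciler can be exercised with controller-runtime's fake client rather than Azure SDK mocks. A rough sketch with entirely hypothetical names (`exampleReconciler` is not one of our actual controllers):

```go
package controllers_test

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

// exampleReconciler is a hypothetical stand-in; the real CAPZ reconcilers
// (e.g. the azurejson controllers) take more dependencies.
type exampleReconciler struct {
	client.Client
}

func (r *exampleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	secret := &corev1.Secret{}
	if err := r.Get(ctx, req.NamespacedName, secret); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	return ctrl.Result{}, nil
}

func TestExampleReconcile(t *testing.T) {
	scheme := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(scheme)

	// Seed the fake client with the objects the reconciler expects to find;
	// no envtest cluster or Azure mocks are needed for this path.
	existing := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "example-azure-json", Namespace: "default"},
	}
	c := fake.NewClientBuilder().WithScheme(scheme).WithObjects(existing).Build()

	r := &exampleReconciler{Client: c}
	if _, err := r.Reconcile(context.Background(), ctrl.Request{
		NamespacedName: types.NamespacedName{Name: "example-azure-json", Namespace: "default"},
	}); err != nil {
		t.Fatalf("Reconcile() returned unexpected error: %v", err)
	}
}
```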

nojnhuh commented 1 year ago

> As pointed out by @nojnhuh, most files within the `azure` package will be replaced due to the adoption of ASO. Thus, we should focus our efforts on other packages.

@CecileRobertMichon made a good point in office hours last week about the value in having a solid set of baseline tests in azure/ before the ASO migration to catch regressions. I'm not sure that means that's all necessarily high priority, but I think we can at least keep the ASO migration from being a deterrent to adding test coverage there. It's probably wise to make auditing and improving coverage in an existing implementation one of the first steps in migrating a service over to ASO.

willie-yao commented 1 year ago

Closed as all overall tasks have been completed. Sub-issues will be created to track unit test progress in each package.

webhooks: #3648
controllers: #3649