lacasseio opened 2 years ago
An additional problem not mentioned is tests requiring multiple tools, especially when they need a wide range of them. The solution would be to model all tools together as part of a pseudo-installation. As the number of permutations goes up, it's always going to get messy. We just need to manage the messiness by avoiding narrow-minded solutions.
It seems there is a clear distinction between the tools/versions under test and the testing strategies. We should model both separately so we can reuse the tools/versions under test to assert coverage at the end of the CI pipeline. Also, tools/versions may map to overlapping testing strategies for different contexts/scenarios. One such example is the coverage context: for development, we will want something easy like `partial` (the latest available versions), `all` (all available versions) or `default` (the first available version). For CI, we may want `xcode12.4` (all tests that need exactly Xcode 12.4), `allXcode` (all tests that need any Xcode version) or something similar.
We did a bit more thinking here. The initial write-up is fantastic! Good job past Daniel! The point that should be cleared up is the distinction between the requirements vs coverage. The requirements will decide if the test can execute in the current environment. Ex: a test requires a C-compatible toolchain or requires any major OS. The coverage dictates what variants are "good enough" for executing this test. Ex: a major toolchain (any of GCC, Clang or MSVC) or every latest GA major toolchain (GCC 12.2, Clang 15.0.1, MSVC 2022).
It's important to note that requirements imply some coverages. Ex: requiring Gradle 7.5 and up would imply Gradle 7.5, 7.5.1 and nightly. The coverage declaration would make the coverage explicit.
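To illustrate the implication, here is a minimal sketch (hypothetical names, not Nokee API) of expanding a "version X and up" requirement into the coverage it implies, given the versions the build knows about:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: expanding a minimum-version requirement into the
// implied coverage, given the versions the build system knows about.
public class ImpliedCoverage {
    // Returns every known version that satisfies "minVersion and up".
    // Comparison is simplified: "nightly" always qualifies, and numeric
    // versions are compared component by component.
    public static List<String> impliedBy(String minVersion, List<String> known) {
        return known.stream()
                .filter(v -> v.equals("nightly") || compare(v, minVersion) >= 0)
                .collect(Collectors.toList());
    }

    private static int compare(String a, String b) {
        String[] as = a.split("\\."), bs = b.split("\\.");
        for (int i = 0; i < Math.max(as.length, bs.length); i++) {
            int ai = i < as.length ? Integer.parseInt(as[i]) : 0;
            int bi = i < bs.length ? Integer.parseInt(bs[i]) : 0;
            if (ai != bi) return Integer.compare(ai, bi);
        }
        return 0;
    }

    public static void main(String[] args) {
        // Requiring Gradle 7.5 and up implies 7.5, 7.5.1 and nightly.
        System.out.println(impliedBy("7.5", List.of("7.4.2", "7.5", "7.5.1", "nightly")));
    }
}
```

A separate coverage declaration would then narrow this implied set down to the explicit coverage.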
Using a coverage context, we can select a subgroup of the coverage selection: `latest`, `latest-available`, `all`, `partial`, `default`, etc. We could see an `uber-all` context that would select every possible coverage. In general, the coverage should be selected from the explicit coverage.
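As a sketch of how such contexts could resolve (hypothetical names, assuming available versions are ordered oldest to newest):

```java
import java.util.List;

// Hypothetical sketch of coverage contexts resolving to a subgroup of the
// available versions (assumed ordered oldest to newest).
public class CoverageContexts {
    public static List<String> select(String context, List<String> available) {
        switch (context) {
            case "default": return List.of(available.get(0));                    // first available
            case "latest":  return List.of(available.get(available.size() - 1)); // newest available
            case "all":     return available;                                    // every available version
            default: throw new IllegalArgumentException("unknown context: " + context);
        }
    }

    public static void main(String[] args) {
        List<String> xcodes = List.of("12.4", "13.2.1");
        System.out.println(select("default", xcodes)); // [12.4]
        System.out.println(select("latest", xcodes));  // [13.2.1]
        System.out.println(select("all", xcodes));     // [12.4, 13.2.1]
    }
}
```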
Some coverage declarations could be more lenient in their selection. Ex: a test requires a C99-compatible toolchain with a coverage of any one toolchain; GCC 9 alone would then be the only test variant, and it would be just as valid as Clang 10 or MSVC 2019.
Some coverage/requirements may imply some selection by default. For example, MSVC would imply OS coverage on Windows, while strictly GCC or Clang would imply Linux, as the most cost-effective machine to use.
When asserting the coverage, we would consider all this information and cross-reference test executions between all CI jobs. For a coverage that specifically dictates multiple OS executions, if the test were skipped on Windows, we would deem the test coverage a failure. However, if we specify any OS, which implies Linux by default, then skipping on Windows or macOS would be correct as long as the test wasn't skipped on Linux.
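The cross-referencing step above can be sketched as follows (a minimal sketch with hypothetical names; the real assertion would merge execution reports from every CI job):

```java
import java.util.Set;

// Hypothetical sketch of the end-of-pipeline coverage assertion: compare
// where a test actually executed (merged across all CI jobs) against the
// OSes its declared coverage demands.
public class CoverageAssertion {
    // demanded:   OSes the coverage declaration dictates.
    // executedOn: OSes on which the test actually ran (i.e. was not skipped).
    public static boolean satisfied(Set<String> demanded, Set<String> executedOn) {
        return executedOn.containsAll(demanded);
    }

    public static void main(String[] args) {
        // Coverage dictating Windows AND Linux: skipping on Windows is a failure.
        System.out.println(satisfied(Set.of("windows", "linux"), Set.of("linux"))); // false
        // Coverage of "any OS", which implies Linux by default: skipping on
        // Windows and macOS is fine as long as the test ran on Linux.
        System.out.println(satisfied(Set.of("linux"), Set.of("linux"))); // true
    }
}
```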
To recap, there is a distinction to be made between requirements and coverage. The coverage would also fuel the parameterized test.
This issue is a brain dump in relation to improving testing coverage for Nokee. I will try my best to untangle everything. See https://github.com/nokeedev/gradle-native/issues/514, https://github.com/nokeedev/gradle-native/issues/513 and https://github.com/nokeedev/gradle-native/issues/526 for some short term goals.
Problem Space
To start, we want to avoid blindly spreading all tests across all permutations of tools and their supported versions. It's counter-productive and doesn't address the core issue, which is focusing on what we are trying to verify. Some verifications only need a tool (regardless of version, vendor, etc.) that fulfills some requirements, e.g. can compile Swift 3 source files, can compile C sources, or is an Xcode installation. We can be as abstract or as precise as we need, e.g. Clang 13.0.2 or MSVC 2022 with the C++ and MSBuild components. The idea is to declare what we need for the verification to be successful.
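A minimal sketch of such a declaration (hypothetical names and capability strings, not Nokee API): a test states the capabilities it needs, and we match that requirement against the tool installations we know about.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: a test declares what it needs as capabilities, and
// we match that requirement against known tool installations.
public class Requirements {
    record Tool(String name, Set<String> capabilities) {}

    // A requirement is satisfied by any tool providing all requested capabilities.
    public static List<Tool> satisfying(Set<String> required, List<Tool> installed) {
        return installed.stream()
                .filter(t -> t.capabilities().containsAll(required))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tool> installed = List.of(
                new Tool("Clang 13.0.2", Set.of("compile-c", "compile-cpp")),
                new Tool("MSVC 2022", Set.of("compile-c", "compile-cpp", "msbuild")),
                new Tool("swiftc 5.5", Set.of("compile-swift")));
        // "Can compile C sources" matches both C toolchains; a more precise
        // requirement (e.g. "msbuild") would narrow the match further.
        satisfying(Set.of("compile-c"), installed)
                .forEach(t -> System.out.println(t.name()));
    }
}
```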
In some cases, we need to execute the same test against a wider range of versions (and possibly tools). For example, when testing that we can detect GCC, assuming we have GCC 4 to 11 available, a full test would need to check against GCC 4, 5, 6, 7, 8, 9, 10, 11. However, a quick test may only check against the latest, e.g. GCC 11. We may also want to configure what "quick" means under certain scenarios. With our GCC example, we may want to include GCC 4 as well, given the tool is old and may have special handling in the production code.
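The full/quick split could be sketched like this (hypothetical names; versions reduced to their major number for brevity):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a "full" strategy runs the same test against every
// available GCC version, while "quick" only takes the latest, optionally
// widened with versions we special-case (e.g. a very old GCC 4).
public class VersionStrategies {
    public static List<Integer> full(List<Integer> available) {
        return available;
    }

    public static List<Integer> quick(List<Integer> available, List<Integer> alsoInclude) {
        List<Integer> selection = new ArrayList<>(alsoInclude);
        selection.add(available.get(available.size() - 1)); // the latest version
        return selection;
    }

    public static void main(String[] args) {
        List<Integer> gcc = List.of(4, 5, 6, 7, 8, 9, 10, 11);
        System.out.println(full(gcc));              // all of GCC 4..11
        System.out.println(quick(gcc, List.of()));  // [11]
        System.out.println(quick(gcc, List.of(4))); // [4, 11], GCC 4 being special-cased
    }
}
```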
Tools are also subject to availability. We may have to run on older machines to access older tools, where newer tools may not be available. Some tools are only available on specific operating systems: MSVC and MinGW are only available on Windows, Xcode is only available on macOS, and `swiftc` is provided by Xcode on macOS but as a standalone toolchain on Linux. In those cases, it's easy to unintentionally skip some tools, resulting in a lack of coverage without clear signals.

During development, we need sensible defaults so we can quickly and efficiently develop the code while being able to force certain tools to be included in our tests. Here we are constrained by Gradle and its integration with the IDE. We want to avoid as much as possible behaviour hacking on Gradle tasks, e.g. folding multiple behaviours into the same tasks, but we also want to avoid generating every possible behaviour permutation as Gradle tasks. Regardless of how the test tasks are split/configured, the what should always be obvious and straightforward, e.g. via the metadata attached to the test tasks. This metadata will be the key to composing our CI pipeline.
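To make the metadata idea concrete, here is a minimal sketch (entirely hypothetical names and shape, not Nokee API) of metadata attached to test tasks that a CI pipeline could query instead of parsing task names:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of metadata attached to each test task; the CI
// pipeline would be composed by querying this metadata rather than by
// inspecting or parsing task names.
public class TestTaskMetadata {
    record Metadata(Set<String> operatingSystems, Set<String> tools, String strategy) {}

    public static void main(String[] args) {
        Map<String, Metadata> tasks = Map.of(
                "quickTest", new Metadata(Set.of("any"), Set.of("gcc-latest"), "quick"),
                "fullWindowsTest", new Metadata(Set.of("windows"), Set.of("msvc2022", "mingw"), "full"));
        // CI composition: pick the tasks whose metadata matches the current runner.
        tasks.forEach((name, meta) ->
                System.out.println(name + " -> " + meta.strategy() + " on " + meta.operatingSystems()));
    }
}
```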
Current Situation
We currently use an outdated Spock 1.x extension for all of our functional tests. The Spock extension was picked up from the Gradle code base, which is now using Spock 2.x. As a whole, we want to remove our dependency on Spock, meaning a Spock solution is not possible. Our reason for moving away from Spock is mostly due to our bad experience with Spock/Groovy as a testing framework/language.
The Spock extension duplicates test cases based on their coverage context, which supports `all`, `partial`, `default` or specific versions. The extension will detect available versions and execute its selection according to those candidates. A tool's availability is never asserted, allowing for "unintentional skips" leading to a lack of coverage. Despite that, we currently only use the `default` coverage context, meaning we always test against the first available tool. The extension also offers some support for specifying tool requirements.

One considerable downside to the Spock extension is that all the discovery of toolchains is done by the extension. The problem lies with the separation of concerns: the build system is responsible for making decisions based on the environment, but the extension makes its own decisions as well regarding what it sees from the environment. There is a conflict of responsibility. Ideally, the data flow should be in one direction, and the extension should focus on the test execution based on the data provided by the build system.
Proposed Solution
The proposed solution is a combination of multiple pieces that collaborate together.

First, `testingStrategies`: we have to carefully declare our testing variants to avoid generating an unmanageable number of additional test tasks.

The available tools also differ between CI images, e.g. `macos-10.15` vs `macos-11`. Our local machine may also be different. When developing, we may not care about the exact tool versions available. However, on CI we care that our code is tested against the exact tools and versions.

On `ubuntu-latest`, we will select all OS-agnostic tests as well as Linux-specific tests, and on `windows-latest` we will select only Windows-specific tests. The behaviour is different from local execution of the same task, where OS-agnostic tests are always selected. The version coverage is also different: a quick test on `macos-10.15` would test against Xcode 12.4, but on `macos-11` we would test against Xcode 13.2.1. A full test would select all available versions that we care about (sometimes selecting a representative subset of all available versions is good enough).

OS Strategies
Some tests require a specific OS environment while others can run on any OS (OS-agnostic). Typically, users would simply annotate their tests with `@EnabledOnOs` and co. Then users would execute the tests on their respective OS to get full coverage. In a utopic world, the build system would be able to spawn the right environment for the test, locally and on CI. For this, we would need multiple tasks (one for each scenario). However, more tasks can become a bit harder when it comes to calling the right one, especially in IntelliJ with the quick test run button. There are three different solutions: 1) fold all OS test variants into a single variant, 2) disable unnecessary test variants to hide them from IntelliJ during sync, or 3) a mix of both solutions. The reason for folding all OS test variants into a single variant is simply guided by the fact that during development, we usually want to run all agnostic and current-OS tests. Creating multiple variants opens the possibility to distribute the tests onto other OSes from a single machine. Regardless of the choice here, there should be no impact on all the other pieces of the solution (CI, reporting, assertion, etc.).

Multi-tool and multi-version Strategies
Computing every permutation of tool and version as test variants may be overkill. The Gradle codebase uses coverage contexts, which are then controlled by system properties. We aren't big fans of this approach, simply because it requires a bit of gymnastics to configure and run locally. We feel there is a good middle ground between a coverage context and an exact tool/version. Just like the OS strategies, regardless of how we do it, the metadata should be the same, removing any impact on the other pieces of the solution.
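One possible middle ground could be sketched like this (hypothetical names, not Gradle's or Nokee's API): the same selector accepts either a coverage context name or a concrete version, so local runs can pin an exact tool without any system-property gymnastics.

```java
import java.util.List;

// Hypothetical sketch of a middle ground between coverage contexts and
// exact tool/version requests: one selector accepts either form.
public class CoverageSelector {
    public static List<String> resolve(String requested, List<String> available) {
        switch (requested) {
            case "all":     return available;                  // every available version
            case "default": return List.of(available.get(0));  // first available version
            default:
                // Not a known context name: treat it as an exact version request.
                if (available.contains(requested)) return List.of(requested);
                throw new IllegalArgumentException("unavailable: " + requested);
        }
    }

    public static void main(String[] args) {
        List<String> xcodes = List.of("12.4", "13.2.1");
        System.out.println(resolve("all", xcodes));  // [12.4, 13.2.1]
        System.out.println(resolve("12.4", xcodes)); // [12.4]
    }
}
```

Either way, the resulting selection would carry the same metadata, keeping the CI, reporting and assertion pieces unaffected.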