keyboardio / Kaleidoscope

Firmware for Keyboardio keyboards and other keyboards with AVR or ARM MCUs.
http://keyboard.io
GNU General Public License v3.0

Adding simulation based CI testing - some ideas #727

Open noseglasses opened 4 years ago

noseglasses commented 4 years ago

I regard simulated testing as highly useful - actually a must-have. It can efficiently catch subtle errors in the keyboard's I/O behavior that would otherwise require manual testing on all supported keyboards - a tedious and possibly error-prone task.

With simulation based testing, large changes to vital parts of the firmware can be expected to be less painful and also less risky. It will definitely speed up our development cycles.

Current state of development

I recently submitted two PRs (https://github.com/keyboardio/Kaleidoscope-Bundle-Keyboardio/pull/18 and https://github.com/keyboardio/Kaleidoscope/pull/650) that are meant to enable virtual builds for all keyboards that are currently supported by stock Kaleidoscope.

Together with some external pieces of software that I wrote, they possibly pave the way to adding Travis CI testing of virtual firmware runs.

A starting point

I guess that we all agree that those tests are meant to be as simple to maintain as possible and to run as fast as possible. What I have in mind is to generate a bunch of standard test scenarios for every keyboard that define which keys or key chords are to be hit in which order. Test data is then recorded and the tests can be reproduced by the simulator during CI testing. No programming of test sets is thus required and changing test scenarios is comparatively cheap.

Testing procedure

In the following I describe the workflow that is necessary to add CI testing. All the mentioned software projects could possibly become part of the Kaleidoscope software family.

Recording set values

The Aglais library is meant to write and read data sets that represent the exact timing of the I/O behavior of a USB keyboard (as a blackbox). All keystrokes and resulting USB HID reports (keyboard/mouse/...) are recorded. Recorded data is bound to cycle timestamps, so that it can be used later on to precisely verify the timing of the simulated keyboard. Verifying timing is especially important because some firmware features come with timeouts that severely affect their behavior.

I wrote a recorder plugin that interfaces Aglais and collects data during set data generation runs and passes it via the serial interface to the host where it is redirected into code-files that are both human- and simulator-readable. During data recording this plugin is meant to be used by the same firmware configuration (sketch) that will later on be used during CI testing.
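To make the idea of cycle-stamped records more concrete, here is a minimal C++ sketch of one recorded event and a one-line text serialization. The type names, the fields and the text layout are assumptions made up for illustration; this is not the actual Aglais data model or file format.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event kinds; the real recorder may distinguish more of them.
enum class EventKind { KeyPressed, KeyReleased, KeyboardReport };

// One recorded event, bound to the scan cycle in which it was observed.
struct RecordedEvent {
  std::uint32_t cycle;                // scan-cycle timestamp
  EventKind kind;                     // what happened
  std::vector<std::uint8_t> payload;  // key position (row, col) or raw report bytes
};

// Serializes one event as a single human-readable text line:
// "<cycle> <kind> <payload bytes as decimal numbers>".
std::string toLine(const RecordedEvent &event) {
  std::ostringstream out;
  out << event.cycle << ' ';
  switch (event.kind) {
    case EventKind::KeyPressed:     out << "press";   break;
    case EventKind::KeyReleased:    out << "release"; break;
    case EventKind::KeyboardReport: out << "report";  break;
  }
  for (std::uint8_t byte : event.payload) {
    out << ' ' << static_cast<int>(byte);  // print as number, not as character
  }
  return out.str();
}
```

Keeping one event per line is only one possible choice; it keeps the files easy to diff and easy to read by both humans and a simulator.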

Simulated testing

The recorded Aglais files represent set-value data that is passed to the firmware simulator during CI testing. During simulation runs, a simulator plugin reads the Aglais files and converts them to key actions (input) for the virtual firmware. It also verifies the firmware's output in the form of virtual USB HID reports. The simulator plugin wraps an external library, Papilio, which provides a dedicated testing API that also allows individual programmatic testing of most firmware features.
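The replay-and-verify idea can be sketched roughly as follows. This is not Papilio's or Kaleidoscope-Simulator's API: `ToyFirmware`, `KeyAction`, `ExpectedReport` and `replayMatches` are hypothetical names, and the toy firmware simply reports the set of held keys whenever it changes. The sketch assumes that inputs and expected reports are sorted by cycle.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

using Report = std::set<std::uint8_t>;  // set of currently active key codes

struct KeyAction { std::uint32_t cycle; std::uint8_t key; bool pressed; };
struct ExpectedReport { std::uint32_t cycle; Report keys; };

// Toy stand-in for the virtual firmware (assumption, not the real thing).
class ToyFirmware {
 public:
  void setKey(std::uint8_t key, bool pressed) {
    if (pressed) state_.insert(key); else state_.erase(key);
  }
  // Runs one scan cycle. Returns true and fills `report` if a report was sent.
  bool runCycle(Report &report) {
    if (state_ == lastSent_) return false;
    lastSent_ = state_;
    report = state_;
    return true;
  }
 private:
  Report state_, lastSent_;
};

// Feeds the recorded inputs cycle by cycle and checks that every report the
// firmware emits matches the recorded one in both content and cycle.
bool replayMatches(const std::vector<KeyAction> &inputs,
                   const std::vector<ExpectedReport> &expected,
                   std::uint32_t lastCycle) {
  ToyFirmware firmware;
  std::size_t in = 0, out = 0;
  for (std::uint32_t cycle = 0; cycle <= lastCycle; ++cycle) {
    while (in < inputs.size() && inputs[in].cycle == cycle) {
      firmware.setKey(inputs[in].key, inputs[in].pressed);
      ++in;
    }
    Report report;
    if (firmware.runCycle(report)) {
      const bool ok = out < expected.size() &&
                      expected[out].cycle == cycle &&
                      expected[out].keys == report;
      if (!ok) return false;  // unexpected report or wrong timing
      ++out;
    }
  }
  return out == expected.size();  // all recorded reports must have been seen
}
```

Because the cycle is part of the comparison, a report that is correct in content but arrives in the wrong cycle fails the check, which is the kind of timing error the recorded data is meant to catch.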

Example applications

The simulator plugin comes with some example applications that demonstrate Papilio's testing API. Also provided are some examples that demonstrate testing of additional features of the virtual keyboard, like e.g. LED states.

How to proceed

I would love to see a first simulator based CI test for at least one keyboard, e.g. the Model01, added ASAP (or rather as soon as the necessary PRs have been approved and merged). A single test on a single virtual keyboard (e.g. the Model01) will already be worth a lot.

Later we can gradually add more simulated tests for other keyboards and make existing test scenarios more complex.

The first test that I have in mind would use as many typing-related plugins as fit into the firmware. The more complex the plugins are (e.g. Qukeys), the more complex the recorded interaction must be.

I imagine a test scenario as a list of key and key chord actions together with a description of their approximate timing.

The nice thing about the testing-with-recorded-data approach is that mistyping during recording does not hurt at all. Say two keys are meant to be hit in the correct order and one is a Qukey or a tap dance key. There's nothing wrong with getting the timing wrong. If that happens, the virtual keyboard should generate the exact same wrong output as the physical keyboard.

To make test data generation a bit more reliable and less stressful, every key sequence should be issued several times to make sure that the desired behavior is captured at least once. Moreover, as users tend to type differently (smoothly or in bursts) and timing can matter a lot, it might even be beneficial to let multiple people record data for the same test scenario and later use all of the recordings during CI testing.

We can afford to use a large amount of recorded test data for CI testing, because virtual test runtimes are almost negligible - at least compared to compile and link times. The simulated firmware runs lightning fast on an x86 machine. Even though it simulates the physical keyboard's timing, the simulator uses time stepping, so simulated time elapses much faster than real time. Because of that we can run as many test sets as we want, as long as the virtual firmware for a simulated keyboard is built only once.
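Why time stepping makes simulated time cheap can be illustrated with a small C++ sketch. The `VirtualClock` class and `cyclesUntilTimeout` function are hypothetical and not taken from the simulator; the sketch assumes a fixed, non-zero number of milliseconds per scan cycle.

```cpp
#include <cstdint>

// A clock that only advances when the simulation says so. No real waiting
// takes place, so simulated time can run much faster than wall-clock time.
class VirtualClock {
 public:
  explicit VirtualClock(std::uint32_t msPerCycle) : msPerCycle_(msPerCycle) {}
  void step() { nowMs_ += msPerCycle_; }          // one scan cycle elapses
  std::uint32_t millis() const { return nowMs_; }  // simulated time in ms
 private:
  std::uint32_t msPerCycle_;  // assumed to be greater than zero
  std::uint32_t nowMs_ = 0;
};

// Counts how many scan cycles pass until a timeout (for example the hold
// timeout of a dual-use key) has elapsed in simulated time.
std::uint32_t cyclesUntilTimeout(VirtualClock &clock, std::uint32_t timeoutMs) {
  const std::uint32_t start = clock.millis();
  std::uint32_t cycles = 0;
  while (clock.millis() - start < timeoutMs) {
    clock.step();
    ++cycles;
  }
  return cycles;
}
```

With 5 ms per cycle, a 200 ms timeout costs 40 loop iterations instead of 200 ms of real time, which is why many recorded data sets can be replayed in a single CI run.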

obra commented 4 years ago

I'm really looking forward to this.

Longer term, I'd love to see our test cases stored as text files rather than as source code, but that's absolutely not a blocker to making this go.

noseglasses commented 4 years ago

Longer term, I'd love to see our test cases stored as text files rather than as source code

With test cases do you refer to the test specs or the recorded data?

The specs must definitely be human readable and should also explain why a sequence of input is tested.

The recorded data is already almost text files (with quotes to allow it to be directly included by C++ code). A compressed version would be an alternative for very large data sets only. But I am not even sure if that will be necessary at all. I remember that we discussed that before.

obra commented 4 years ago

On Fri, Nov 22, 2019 at 1:44 AM noseglasses notifications@github.com wrote:

Longer term, I'd love to see our test cases stored as text files rather than as source code

With test cases do you refer to the test specs or the recorded data?

The specs must definitely be human readable and should also explain why a sequence of input is tested.

Yup. Historically, I've built things like this as a DSL or something like expect. But most of my happier experiences are with things like Perl's Test::More: https://perldoc.perl.org/Test/More.html (I'm not proposing we use Perl here.)

The big wins are that test cases are really easy to write in an interpreted language with simple syntax. It's meant that it's relatively easy for folks to contribute tests.

The recorded data is already almost text files https://github.com/CapeLeidokos/Kaleidoscope-Simulator/blob/master/examples/aglais/IO_protocoll.agl (with quotes to allow it to be directly included by C++ code). A compressed version would be an alternative for very large data sets only. But I am not even sure if that will be necessary at all. I remember that we discussed that before.

Yeah. I suspect you're right that we're not going to need to compress the recorded data. I -think- we will want to end up expanding the recorded data into something in the same format as the spec tests. Eventually. Maybe. I've gotten a lot of value out of expanding and annotating recorded tests into fuller test suites over the years.

But all of this can be done iteratively, once we have that first test landed.


noseglasses commented 4 years ago

Test::More appears to be a tool for programmatically defining tests.

To make sure that we're not talking about different things: I propose to start with integration testing the firmware as a whole. I imagine the description of the tests I am talking about to be just text in any format (not necessarily machine readable) that explains the features that are meant to be tested and that provides a list of keyboard operations that a human must carry out in order to generate the data sets that are then fed to the simulator. That can be e.g. something like: "Randomly type 100 characters" or "hit all keys on the keyboard at least once" or "hit key A, then key A, chord C+D, ...". The output of such preparation runs on the real hardware is going to be machine and human readable (like the aglais files I linked, whose format is open for discussion, or something similar).

Programmatically defining tests will be required to test specific corners of the firmware, which is closer to unit testing but still uses the whole firmware. That's what I intended Papilio's API for. Such tests have the drawback that they are harder to maintain while the firmware evolves. And that's why I came up with the recorded-data based approach, which I expect to be more efficient for our needs.

Anyway, I suggest that we drive all our tests (of whichever type) with C++ and keep them inside Arduino's build system. That way, things stay very compact. The proposed Kaleidoscope-Simulator plugin comes with some examples that demonstrate how this could work. Test execution is just an additional Arduino build step that, if it fails, makes the whole build fail and informs Travis that something went wrong. In fact, adding this to the smoke examples tests, just to test the different devices, shouldn't be too complicated. We would just need to toggle test execution based on the environment variable ARCH.

obra commented 4 years ago

The thing about Test::More and things like it that I find particularly valuable is that it's a simple, lightweight imperative DSL for testing, where additional tests can be enabled by dropping test files into a directory. The simplest test files are just a couple lines. It can be used for unit testing and end to end testing, depending on APIs.

I've never seen testing built into Arduino's build system. That doesn't mean it's impossible, of course, but it doesn't give us a nice path to follow.

But again, this is more about the properties of what I'd like to end up with than where we need to be to make it valuable to merge. As I get familiar with the stuff you've built, I'm sure I'll have more useful feedback.


noseglasses commented 4 years ago

This issue is linked to https://github.com/keyboardio/Kaleidoscope/issues/657.

noseglasses commented 4 years ago

Now that everything on the Kaleidoscope side is ready for simulation based testing, I am pondering how to add a first basic virtual compile and run test to Travis.

There are several points that need to be addressed to make this work.

Libraries

Currently, there are three libraries that work together to enable tests based on recorded typing input.

Kaleidoscope-Simulator drives the virtual firmware and hooks into Kaleidoscope's HID report observer infrastructure. Papilio is the firmware simulation API, and Aglais is responsible for recording and reading data to/from the text files that drive the simulation.

Would it make sense to add a wrapper git-repo that contains those three as submodules? This would simplify cloning them in the testing environment.

Build system integration

To set up the build system for virtual testing, I typically copy the three aforementioned libraries into the libraries folder along with the rest of Kaleidoscope's libraries.

My current test description consists of a test sketch that is used for all sorts of testing and for each individual test

And there is a .kaleidoscope-builder.conf file that contains ARCH=virtual.

I guess we could have a directory named test or testing, parallel to Kaleidoscope's examples directory, where such stuff could be stored?

Running tests

To make things as simple as possible, currently the virtual build system runs the compiled virtual firmware as a build step. This makes build and test execution one command. In the examples folder of Kaleidoscope-Simulator, e.g., the command make aglais builds and runs one of the example simulations.

This step would also need to be integrated into Kaleidoscope's Travis testing setup at an appropriate place.

Test generation

I would start with a very simple test that just records some typing with the stock firmware. That will be enough to check the basic features. But that's the most simple part of testing integration. Simulation input can be recorded using the simulator-recorder plugin.

Ideas, discussion, help appreciated.

obra commented 4 years ago

Getting the libraries in place

My first instinct is that:

@noseglasses - What do you think?

Build system integration

For the first step, let's give ourselves a makefile target for running our automated simulator tests. It's pretty straightforward to add something like that to Travis later, and a little bit harder to take something intended for Travis and run it locally instead.

Test generation

Indeed, I think the right first test is a recorded "Hello world!" test. :)

noseglasses commented 4 years ago

@obra, that sounds like a decent plan.

Kaleidoscope-Simulator should be merged into the core repo.

AFAIK, Arduino will build all the .cpp files it can find in every library that is included in the sketch. Although very nice for beginners, that's a very annoying feature of the Arduino build system, because it means that all files in Kaleidoscope-Simulator need to be guarded with more of those ugly #ifdef KALEIDOSCOPE_VIRTUAL_BUILDs if we move them to the core repo.
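For illustration, a guarded source file could look roughly like the sketch below. KALEIDOSCOPE_VIRTUAL_BUILD is the macro named in this thread; the namespace and function names are made up, and the unguarded helper at the end exists only to show the effect of the guard.

```cpp
// Hypothetical simulator-only source file living in the core repo.
// The whole body sits inside the guard, so a build for real hardware
// compiles this file to (almost) nothing.
#ifdef KALEIDOSCOPE_VIRTUAL_BUILD

namespace simulator_sketch {

// Only exists in virtual builds. Returns the number of failed tests.
int runRecordedTests() {
  return 0;  // placeholder body for the sketch
}

}  // namespace simulator_sketch

#endif  // KALEIDOSCOPE_VIRTUAL_BUILD

// Unguarded helper (illustration only): reports at compile time whether the
// simulator code above was compiled in.
constexpr bool simulatorCodeCompiledIn() {
#ifdef KALEIDOSCOPE_VIRTUAL_BUILD
  return true;
#else
  return false;
#endif
}
```

Every simulator file in the core repo would need the same wrapper, which is the clutter the comment above complains about.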

...submodules as we've found that end-user-developers really don't like working with them

I hate them, too - or at least find them very confusing. Snapshotting Aglais and Papilio is a good idea, but we will have the same problems as mentioned above if we put them under Kaleidoscope. For those two repos/directory trees we really don't want #ifdefs. What if we snapshot Aglais and Papilio in the library directory of the bundle?

I will have a go with the make target and record a "Hello world!".

obra commented 4 years ago

I'm ok with snapshotting Aglais and Papilio into the bundle. Eventually, I think "library" is the wrong name, but let's get it working and then we can clean up the naming once we see what the shape of things is.

(I think I want to rename #KALEIDOSCOPE_VIRTUAL_BUILD to KALEIDOSCOPE_SIMULATOR. But I'm happy to do the work to do the rename myself. But I don't want to do that while you're in the middle of making things work.)
