proposal: adopt unit test framework?

pflarue / ardop

Source for various version of ARDOP

Other

29 stars 5 forks source link

proposal: adopt unit test framework? #30

Closed cbs228 closed 2 months ago

cbs228 commented 4 months ago

Unit tests are a massive win for code quality. Unit-test frameworks help standardize and automate tests, and we should consider adopting one. The following projects have compatible open licenses and are pure C with excellent compiler support, including Linux and Windows:

cmocka: supports mock functions and objects; used by libssh and samba
Unity: simple, three files
µnit: parameterized tests, one source and one header
Branching out into C++, there is also µt (single-header) and Catch2.
Many others are available; find one you like!

Of these, I definitely prefer cmocka since it permits mock objects. Mocks permit replacing "production" functions with bare-bones implementation that do a predictable thing. For example, you could overwrite the state machine with one that always returns "NAK" to test ARQ logic. Mocks are unusual to have in C because the only way to really do them is with linker-fu.

The downside of cmocka is that it really demands a more modern build system like meson or CMake. Either would be a win for cross-platform compatibility. No need to maintain separate Makefiles and Visual Studio projects. Either can be made to fetch dependencies from the internet, or you can vendor a copy of the test framework with ardopcf.

The other frameworks don't support mocks, but they are easier to fold into the existing Makefile.

Unit test frameworks are not meant for performance tests and don't do things like invoke external programs. For my own project, sameold, I just use shell scripts for that sort of thing.

pflarue commented 4 months ago

I agree that something like this is a good idea, but it will take a lot of time and effort to do well, and I have other higher priorities. So, in the short term, I plan to implement some limited validation testing based on production and analysis of audio recordings, mostly using features of ardopcf that have already been implemented. I'll probably using python scripts to automate the process of generating recordings, getting ardopcf to try to decode them, and then parse the log files to analyze the results. If I can produce something like this that I believe is useful, I'll add these scripts to a testing directory in the repository.

pflarue commented 4 months ago

Employing features currently in the develop branch, but that will be included in the upcoming next release of ardopcf, I've been starting to work on some external Python based tools to implement some of the validation testing that I described in my last comment on this issue. I had initially been thinking about including these tools in a test-tools branch of this repoditory not intended to ever merge into master since these tools are independent from building ardopcf. However, now I'm wondering whether a separate repository might be a better way to share and collaborate on the development of these tools.

While I welcome anyone's thought on this approach, I'd particularly appreciate input from @cbs228 and @Dinsmoor who have already been contributing to this project.

Dinsmoor commented 4 months ago

TL:DR - Unit testing isn't worth anyone's time right now - any testing on the current code base should be done via behavior analysis and performance testing from host applications.

If a codebase hasn't been created from the get-go with unit testing in mind, it makes it very difficult to do so, especially if you're not the one who wrote it all to begin with.

For performance testing:

I think @pflarue has some sort of WAV in-out performance testing solution that will probably work just fine in the meantime, at least to detect regressions. I don't know exactly how comprehensive it is, what it covers, and how it detects issues.

I've been tinkering with an idea using GNUradio + alsa loopbacks to simulate a channel in 'real time' and simulate every data mode from ideal conditions to not-ideal , and record these. Compare average power levels, decode rate, etc. With development of @pflarue 's RX0 mode, it may be possible to get the data needed for meaningful trend analysis.

For behavior testing:

This is trying to use all the functionalities of the modem, and reporting any errors. For example, I found a segfault with the CODEC command, which doesn't seem to be used for anything, so there's a branch that will probably be merged to get rid of it. Stuff like that. Write applications for the modem and see if anything breaks.

I've been working on "hamChat" here: https://github.com/Dinsmoor/hamChat - and it's pretty easy to add functionality via plugins. Writing a plugin that sends whatever commands to the modem in a preprogrammed manner and test these against expected outputs would be very easy to add. I just added a command window where you can send arbitrary UTF8 to the modem's command socket.

Happy Tuesday!

cbs228 commented 4 months ago

@pflarue I'm wondering whether a separate repository might be a better way

In general, I find that monorepos are easier to work with. The questions to ask are:

Do the dev tools require a particular version of the production code? If you frequently revise the production code to expose API or to fix issues for the test code, the answer is "yes."
Do you want to make simultaneous changes to the dev tools and the production code—potentially at once in the same MR? This is important if you are adding functionality and want to add dev tests for it concurrently. Splitting them into separate repos makes this particular activity tedious and tough. You'll have a lot of commits like "bumped version of X to Z" that maybe have to get pushed concurrently. Then you forget to push one and wonder why things are broken.
Do you want to use CI to automate testing? If so, having it all in a monorepo makes it more convenient.

On the other hand:

If your dev tools can't be released with production, they may need to be in separate repos. For example, if your dev tools expose their own API to other projects (unlikely) or are usually broken (not great), they may require different release cycles.
If you absolutely must mix-and-match different versions of the dev tools with different versions of the production code, they probably belong in separate repos.
If you need different access control or notification settings for dev vs. production, they may need to be in separate repos.

To adopt monorepo, just put your dev code in its own subfolder, and just use the develop branch to version it. The repository already has a subproject-friendly layout. If your users don't want to build or use your test tools, they are not obligated to do so. Just make sure that your build system can omit them.

@Dinsmoor If a codebase hasn't been created from the get-go with unit testing in mind, it makes it very difficult to do so

This is correct, but selecting a test framework and standing it up is still a worthwhile activity.

There are probably some low-level functions that could get tests right away. Anything string- or buffer-related would be first on my list.
A little more of the codebase could be modularized for testing with "some" effort. Things like the host interface and the ARQ logic might be separable enough.
The DSP would likely be on the harder side.
When we invariably need to refactor or add new modules, the new code can get tests from the get-go.

Some coverage is better than none.

When we are ready to do so, we should select and integrate a test framework and get started with those tests. I can write a few by way of example.

We should definitely do this before we embark on any major refactors.

testing on the current code base should be done via behavior analysis

The problem with "test by analysis" is that it is usually manpower-intensive. It doesn't take much to change a behavior test. When it does an expert needs to review it and ponder the reason why.

Unit tests should run very quickly (seconds) and be pass/fail. This means they can run with every change, every commit.

Both kinds of tests are valuable and necessary. In my experience, unit tests are better at uncovering serious crash bugs. As a user, I can tolerate a few unnecessary ARQ repeats. I am much less tolerant of getting 9.8 minutes into a 10 minute send, only for the program to crash in the last kilobyte. WL2K node operators probably don't like crashes either.

pflarue commented 4 months ago

Thanks for the feedback.

I agree that unit testing has the potential to be very useful. However, I don't think that right now is the best time to attempt to start implementing it. As I described in a recent post to ardop.groups.io, I intend to publish a new release of ardopcf associated with a merge of the current contents of the develop branch into master within the next few days. Then, changes for the next release after that will be focused on removing the remainder of the obsolete code associated with Packet, as well as probably eliminating support for the SCS host interface and the Teensy micro-controller. That message to ardop.groups.io requested a response from anyone who might still be using these features that I'm proposing to eliminate.

There are also source files in the repository that are not referenced by the Makefile and functions in some of the source files that are not being used. These will also be further evaluated, and removed unless a reason to retain them is discovered.

My hope is that after removing the obsolete and unused code from the repository, that what remains will be easier to understand and maintain. I believe that any effort to begin implementing unit testing before then would be counterproductive.

I have done some work recently to automatically exercise various features of the ardopcf executables using python scripts. Other python scripts parse data logged to file or console to evaluate ardopcf's behavior. They also modify and analyze WAV files produced by ardopcf and used as inputs back to ardopcf. This has helped me to identify some things that needed fixing, and some other areas that deserve additional study. Since these interact with ardopcf only by running the executable and parsing its output, and they use python scripts which do not need to be compiled, they are completely independent of the build process. Based on comments made above, I think I'll eventually add them to their own directory in this repository. However, my initial attempt at implementing such testing has also helped me to identify some shortcomings in my own knowledge that I need to address before I am ready to share those scripts and their results with others.

pflarue commented 4 months ago

Regarding the choice of a unit testing framework, as described in the initial statement of this issue: I have looked only briefly at some of the options presented, and don't yet have a clear understanding of how adopting one of these frameworks would affect the ardopcf code base.

It is important to me that someone with limited programming experience who is curious about ardopcf be able to easily build an executable from the source code and study/understand portions of that source code either simply to learn how certain features are implemented or to adapt portions of the code for use in another project. To that end, the idea of using a "more modern build system" sounds like something that makes the project structure and build process less obvious. Perhaps this is an incorrect conclusion based on my own ignorance of these more modern systems. I also hope that we can keep any testing related code as separate as possible from the main code base so as not to make it seem more complex than it is.

See also Issue #45 regarding some planning that I think is required to better define what high level behavior ardopcf is intended to have, and what tests can be used to determine whether it is behaving as it should.

cbs228 commented 3 months ago

and don't yet have a clear understanding of how adopting one of these frameworks would affect the ardopcf code base.

The choice of unit test frameworks doesn't really affect your production code.

The part that will impact the production code is (re)architecting it to be testable. Good tests test a portion of the code in isolation. Testing the DSP routines should not require that the ALSA interface work. You shouldn't have to feed complete modem waveforms to test the ARQ logic. This means that:

Code needs to be modular
Scope should be limited
Functions should avoid weird side-effects
I/O must usually be fully separated from processing

This may grow your "internal" APIs a little bit. But you almost always get cleaner code out of the process, and it's much easier to maintain.

Occasionally, it is necessary to write production code with hooks for unit tests. For example, rather than invoking the mythical GetNextAudioFrameFromALSA() function in a loop, you might need to accept a function pointer. In the real ardopcf, you bind it to GetNextAudioFrameFromALSA(). In your tests, you inject a function that maybe just repeats a hardcoded block of samples. This is called dependency injection. You can sometimes achieve the same thing with mocks.

There are a lot of tutorials on test-driven development and how to write effective unit tests.

None of this will happen all at once. It will happen bit by bit, 'till it's done.

I also hope that we can keep any testing related code as separate as possible from the main code base

I would keep test-cases in their own directory.

ARDOPC/
│
├ src/
│ ├ ardop.c
│ ├ a_module.c
│ └ some_other_module.c
│
└ test/
  ├ test_a_module.c
  └ test_some_other_module.c

The test-cases are their own program. They do not become part of the final ardopcf executable. Instead, the production .o object files get linked into a different program whose main() function is provided by the test framework. The program's only purpose is to run the unit tests. It is not installed, and some configurations don't even build it.

The test-cases do need to live in the same repo and be updated along with the production code. They need to be fast and automatic. Some build configurations will run the tests with every build.

using a "more modern build system" sounds like something that makes the project structure and build process less obvious.

There are many C build systems, up to and including no build system at all. Unless it's documented in README.md, all the builder can do is guess.

The main impact is to require the builder to install an additional program, like meson or cmake. (Meson also additionally wants ninja.) This is very easy on Linux. It is not difficult on Windows. Builds consist of running two or three commands, but they are easy to document in your README.md.

The biggest advantage is that targeting multiple compilers is easy. Both meson and CMake permit you to target gcc, clang, MinGW, and MSVC without necessarily knowing all the CLI arguments for each. Meson in particular has options that are useful for distro packagers, which may make it easier (and more attractive) to one day package ardopcf for Debian or Ubuntu.

It may be worth giving the meson tutorial a try in a throwaway directory.

CMake works pretty well too, but IMO I end of writing a lot of boilerplate CMake code to make all but the most trivial of projects work.

pflarue commented 3 months ago

Thanks. That helps clarify some ideas.

The main impact is to require the builder to install an additional program, like meson or cmake. (Meson also additionally wants ninja.)

I'd prefer to avoid this. I don't mind if someone wanting to run the tests needs to install something extra. However, I'd prefer that someone just wanting to compile a binary from source, perhaps after making edits to fix a bug or try some variant behavior, doesn't need to install anything extra.

one day package ardopcf for Debian or Ubuntu.

I could be wrong, but I don't really expect this to happen. Ardop (all implementations) has a limited user base. By improving it's reliability and usability, I hope that we can convince more people to use it. However, it will probably never be very widely used. Also, by keeping it to a single executable file and providing binaries for a variety of platforms, it should be easy enough to install that getting it from a package repository doesn't seem too important.

That was one of the reasons why I decided to hard code the html, css, and js for the new Webgui into the compiled binary. Reading these from separate files would have been simpler to build and maintain, but would have made installation more complicated and fragile.

Eventually, I hope that we can use what we learn from improving ardopcf as a starting point for beginning development of a higher performance next generation protocol that might attract a larger user base. Maybe packaging will be more relevant then.

cbs228 commented 3 months ago

I could be wrong, but I don't really expect this [packaging] to happen.

pat, which is commonly used with ARDOP, is packaged for Ubuntu and Debian. It would probably be good to provide a modem as well. We're not there yet, but maybe someday.

I hope that we can use what we learn from improving ardopcf as a starting point for beginning development of a higher performance next generation protocol

I am very interested in David Rowe's codec2 and its associated modems. They use portable C99, and the code is very clean and has some tests. (But some tests require octave, which is not ideal.) There isn't an ARQ protocol yet. FreeDATA attempts to add various data modes, but it is still in its infancy, is heavily GUI-focused, and is all python.

I'm a huge fan of js8call, which is very robust, but the narrowband modems might not lend themselves to a faster, non-QRP ARQ session. The authors were very clever during the beta-test: they baked an expiration date into each release. This forced hams to upgrade to newer protocols, and the devs could get rid of bad versions faster.

From a WL2k standpoint, what we need the most is buy-in from node operators and a commitment to an aggressive release cycle. New modems are lovely, but if no one uses them they don't help. Given the huge market share of proprietary solutions (PACTOR, VARA), it might be wise to protect a next-generation protocol with a reference implementation that is GPLv3 or even AGPL. The last thing we want is to encourage a closed-source fork.

pflarue commented 3 months ago

Codec2 sounds interesting. I'm no expert, and I haven't looked at their code, but I'd expect the code for a streaming protocol to be structured very differently from that for an ARQ protocol. While some of the underlying modulation ideas will be common, the goals and challenges are simply very different.

I haven't tried or looked at js8call either, but I think the idea is a wonderful one. Where getting a brief message out with poor conditions and/or low power is more important than fast bulk transfer of data, using a narrow bandwidth highly robust mode with heavy error correction just makes sense. I also understand that js8call implements a workaround for FT8s clock time dependence by adapting to the signals heard. Really smart! As with comparison to streaming modes, the design goals and challenges for js8call are also very different from an ARQ protocol intended for guaranteed accurate bulk transfer of data as fast as can be achieved over varying conditions within a specified bandwidth. I do think that FT8/js8call's use of Costas arrays for signal detection, tuning, and timing is a powerful idea that might be worth adopting in a next generation ARQ protocol.

For a new ARQ mode to be adopted for use with Winlink, I'm not sure whether the bigger challenge is convincing the RMS operators or the Winlink/ARSFI board. For either of them, I think that provably competitive performance (based on HF path simulation in addition to over the air use) and a well documented, stable, and reliable interface are likely to be key selling points. I also believe that if we do a good job of improving ardopcf, that it's users may be willing to try out and provide feedback for a new tool developed by the same team.

As to your comments about a GPL licensed implementation, I'm not interested in working on something like that. I much prefer a more liberal license like MIT that allows anybody who is interested in doing so to build on it while allowing it to be combined with other ideas and just about any other pieces of code. That is why a major effort for the recent release of ardopcf was to get rid of all GPL licensed code in the repository.

Commercial protocols like Pactor or VARA have most of the market share because there isn't currently an open alternative that can compete with them. I think that allowing Ardop to become broken was a mistake that we are now correcting, because sometimes it is worth sacrificing some speed or robustness for openness and efficient, easy to install, cross-platform support. However, if an open source solution with similar performance to the commercial solutions is developed, I believe that it will acquire a significant market share.

If somebody then wants to commercialize a compatible or similar product based on the open source Implementation, I don't have a problem with that either. The key is that the open source version will exist that amateurs can choose to use. I can imagine similar open source and commercial systems coexisting, where perhaps the commercial version provides a more polished interface, or better customer support, or (for non-amateur use) provides built-in encryption, while the open source version is free and allows continued experimentation that might lead to something even better.

cbs228 commented 3 months ago

I wrote up some test-cases to demonstrate cmocka. In the process, I happened to find a bug! I would appreciate a review of PR #50 to see what we do/don't like about my proposed approach for unit testing. We can refine it as needed to better fit this project. I think the time-invested-to-bugs-found ratio has proven to be favorable.

pflarue commented 2 months ago

PR #50 introduced a unit test framework.