tiny-pilot / tinypilot

Use your Raspberry Pi as a browser-based KVM.
https://tinypilotkvm.com
MIT License

Design: Test TinyPilot Debian package install #1659

Status: Open · mtlynch opened this issue 11 months ago

mtlynch commented 11 months ago

The TinyPilot install process is sufficiently complex that it would be helpful to have tests validating that the installer does what we expect.

We've had a few bugs in the past due to subtle oversights in install logic, and we only discovered them when they bit us later.

Rough idea

  1. Create a Docker container that emulates Raspberry Pi OS Lite (put in paths like /boot/config.txt that exist in Pi OS)
    • This will probably just be the standard Debian:11 image with a few changes on top to put in paths that make it look like Pi OS as far as TinyPilot is concerned.
  2. Boot into the Docker container in CircleCI
  3. Install the TinyPilot Debian package (or run the bundle install script)
  4. Run checks to verify that the system is in the expected state (e.g., our desired modifications to /boot/config.txt are there, paths we want to create exist, users exist that we expect).
  5. Uninstall the TinyPilot Debian package
  6. Reinstall the TinyPilot Debian package
  7. Verify that modifications to the files we expect happen exactly once rather than once per install

Deliverable

This ticket is just to flesh out the design rather than to get into implementation.

The design should cover:

I accidentally duplicated this idea in https://github.com/tiny-pilot/tinypilot/issues/1691 so there are additional ideas there.

cghague commented 9 months ago

What tests will we perform to validate the installation?

The installation process broadly consists of four types of action - installing dependencies, copying files into place, modifying existing files, and running external commands. When taken from a known starting point, these actions have defined outcomes and should be easy to test. I've addressed each of these actions below.

Were dependencies installed correctly?

It's reasonable to assume that the package management tools will work correctly. However, if we want to test this action ourselves, we could have a script query the apt utility for the state of each package we expect to be present following installation. My opinion is that this isn't necessary.
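
If we did decide to cover it, a minimal sketch (simply asking dpkg about each package we expect, with a purely illustrative package name) could look like this:

```bash
#!/bin/bash
# Sketch only: verify that an expected dependency is installed, according to
# dpkg. The package name below is purely illustrative.
set -u

package='ustreamer'
status="$(dpkg-query --show --showformat='${Status}' "${package}" 2>/dev/null)"
if [[ "${status}" != 'install ok installed' ]]; then
  echo "Expected package is not installed: ${package}" >&2
  exit 1
fi
```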

Next step: @mtlynch will decide whether to test this action.

Were files copied correctly?

Many approaches could work for validating that the installer copied files into place correctly, ranging from simple presence checks all the way through to comparing checksums. As our other functional tests provide us with reasonable confidence in the contents of the files, we should only need to run a series of basic presence checks.

We could implement these presence checks by comparing the expected file and folder structures against the output of a series of ls commands.
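
As a rough sketch, assuming a placeholder list of paths rather than the final one, the check could be as simple as:

```bash
#!/bin/bash
# Sketch only: verify that expected files and folders exist after install.
# The paths below are illustrative placeholders, not the final list.
set -u

expected_paths=(
  '/opt/tinypilot'
  '/opt/tinypilot-privileged/scripts'
  '/lib/systemd/system/tinypilot.service'
)

for path in "${expected_paths[@]}"; do
  if ! ls -ld "${path}" > /dev/null 2>&1; then
    echo "Missing expected path: ${path}" >&2
    exit 1
  fi
done
```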

Next step: @cghague will determine which files and folders to compare.

Were existing files modified correctly?

The installer modifies various files, which, from a quick review of the code, all appear to be text-based. We can test that the installer has changed the files correctly by running simple before and after comparisons. A robust but blunt approach would be to use a script to:

  1. Make a .bak of each file we want to check.
  2. Run the un/installation process.
  3. Diff each file against its corresponding .bak file.
  4. Check the output matches a pre-defined result.
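
For illustration, a sketch of that approach might look like this (the target file and the expected diff are placeholders):

```bash
#!/bin/bash
# Sketch only: before/after comparison of a file the installer modifies.
# The target file and expected.diff are illustrative placeholders.
set -u

target='/boot/config.txt'
cp "${target}" "${target}.bak"

# ... run the install (or uninstall) steps here ...

# diff exits non-zero when the files differ, so don't let that abort the script.
diff "${target}.bak" "${target}" > actual.diff || true
if ! diff expected.diff actual.diff > /dev/null; then
  echo "Installer changed ${target} in an unexpected way" >&2
  exit 1
fi
```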

A more focused approach could use pattern matching to validate the changes, but that would require us to design precise patterns to avoid pitfalls such as accidental fuzzy matches or matching changes in the wrong part of a file.

Next step: @mtlynch will choose which comparison method to use.

Next step: @cghague will determine which files the installer modifies.

Did external commands have the desired outcome?

The installation process runs numerous external commands, most of which exist purely to support the installation itself. However, some of them make lasting changes to the system. We should identify those commands and implement suitable tests. One example would be confirming the successful creation of the "tinypilot" user.
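
For the user-creation example, the check could be as simple as this sketch:

```bash
#!/bin/bash
# Sketch only: confirm that the installer created the "tinypilot" user.
set -u

if ! getent passwd tinypilot > /dev/null; then
  echo 'Expected user "tinypilot" does not exist' >&2
  exit 1
fi
```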

Next step: @cghague will identify the commands we should test.

How can we create a Docker container that mimics Raspberry Pi OS?

In an ideal scenario, we'd run these tests on an actual installation of Raspberry Pi OS, but we can't realistically do that within the confines of CircleCI and Docker.

The closest we can get will likely be starting with a Debian Docker container and then using chroot to enter an extracted copy of the Raspberry Pi OS image. This approach isn't perfect, but it should be good enough when used alongside our existing automated and manual tests.

The process would look something like this:

  1. CircleCI would download, create, start, and enter a Debian container.
  2. A script running within the container would then:
    1. Download a copy of Raspberry Pi OS.
    2. Extract and mount it to /mnt/rpios/ and /mnt/rpios/boot.
    3. Jump into the extracted image by running chroot /mnt/rpios.
    4. Run the tests.
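
A rough sketch of what that in-container script might do follows. The image URL, loop-device handling, and test-script path are assumptions, and on an x86 host we would also need qemu-user-static/binfmt so the ARM binaries inside the chroot can execute:

```bash
#!/bin/bash
# Sketch only: bootstrap from a Debian container into a Raspberry Pi OS image.
# The image URL and test-script path are placeholders, and a real version
# needs loop-device support plus qemu-user-static/binfmt for ARM emulation.
set -eu

curl --location --output rpios.img.xz 'https://example.com/raspios-lite.img.xz'
xz --decompress rpios.img.xz

# Attach the image and expose its partitions (p1 = boot, p2 = root filesystem).
loop_dev="$(losetup --find --show --partscan rpios.img)"
mkdir -p /mnt/rpios
mount "${loop_dev}p2" /mnt/rpios
mount "${loop_dev}p1" /mnt/rpios/boot

# Jump into the extracted image and run the tests.
chroot /mnt/rpios /bin/bash /opt/install-test.sh
```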

Next step: One of our team will build a basic proof of concept.


@mtlynch - I've put together an outline plan for this with our suggested next steps. Please let me know your thoughts on the ones tagged for you; in the meantime, I'll start on the others.

mtlynch commented 9 months ago

However, if we want to test this action ourselves, we could have a script query the apt utility for the state of each package we expect to be present following installation. My opinion is that this isn't necessary.

Yeah, I agree. I don't think this is worth the effort.

We could implement these presence checks by comparing the expected file and folder structures against the output of a series of ls commands.

I want to avoid checking the presence of every file because that's going to be a brittle test that's likely to break accidentally during normal development (e.g., we rename a script but forget to rename the check).

I think it's sufficient to check that one file is present in its expected location (e.g., COPYRIGHT), and that's enough to verify that the Debian package utility is placing files correctly.

There are other files we place during postinst, but for those, I'd want to check the contents and not just the presence of the files.

We can test that the installer has changed the files correctly by running simple before and after comparisons. A robust but blunt approach would be to use a script to:

Sure, this sounds like a good approach.

My concern is that it's going to become brittle and hard to maintain. For example, if we upgrade to a newer release of Raspberry Pi OS, maybe the line numbers change, and now the diff needs to be rewritten. But maybe we can pass flags to diff to hide the line numbers.

But I think we can start with this approach and revisit if we find the tests breaking out from under us when we haven't changed anything.

We should identify these commands and implement suitable tests

This sounds good.

The closest we can get will likely be starting with a Debian Docker container and then using chroot to enter an extracted copy of the Raspberry Pi OS image. This approach isn't perfect, but it should be good enough when used alongside our existing automated and manual tests.

Let's try this.

My concern is that it's going to be too slow to download the Raspbian image every time. We might get 90% as much confidence in 1/100th the time if we just place a couple of dummy files to make Debian look like Raspbian. Or we could try creating our own Docker image ahead of time that already has the Raspbian .img file downloaded, so we can point CircleCI to that Docker image and just skip to the chroot step at CI runtime.

But let's start with the approach of downloading everything at CI runtime and see what performance looks like.

Next steps

@cghague - Can you create a list of tasks (in this issue, hold off on creating tickets) to lay out the plan of what steps we need to do in what order?

We want to sort in descending order of "bang for buck," so our first task should be to do the least amount of work possible to get to an easy test of the install, and then once we have that, we keep adding more tests based on how much it costs us to implement vs. how likely they are to catch an error.

cghague commented 9 months ago

Tasks to create a full proof-of-concept

The following tasks will allow us to develop a full proof-of-concept while continually offering something of value should we abort.

Step 1: Create a proof-of-concept test shell script

Rationale: Implementing the test in isolation immediately produces a resource we could use as part of our manual release testing process. A single test should be enough for a proof-of-concept.

Step 2: Implement scripts to bootstrap from Debian into a Raspberry Pi OS image

Rationale: It is likely easier to develop and debug the bootstrap scripts without the complexity of CircleCI and Docker. This approach also makes it more likely that we'll be able to run these test suites locally in the future.

Step 3: Verify that the test works in the bootstrap environment

Rationale: This step allows us to spot any obvious issues with the chroot approach before introducing the complexity of CircleCI and Docker.

Step 4: Migrate from a local environment to CircleCI

Rationale: CircleCI allows for the automated testing we want. The work up until this point should prove the validity of the chroot approach, and this step should prove that it can be part of our CI/CD process.

Step 5: Test the proof-of-concept

We should have a working proof-of-concept at this stage, which we should test as follows:

Rationale: This is a good checkpoint for ensuring the proof-of-concept addresses the testing gap we set out to resolve and that any tests that seem like they might be "tricky" can be implemented with this approach.

Outline of future tasks

Once we have tested the proof-of-concept, we can decide whether to continue developing it or abort the project. Assuming we do continue with it, the next steps would be:


@mtlynch - Does this seem like a good plan of action to you?

mtlynch commented 9 months ago

@cghague - Cool, this is a good draft!

I think this is a good plan, but I'd like to adjust the presentation a bit. Can we orient the tasks around deliverables? When we finalize a plan, we'll want to convert the tasks to a set of Github tickets, and we'll want the Github tickets to be things we can resolve with a PR.

So "create a test" and "verify it works" would be a single step. And we'd check that task off by creating the script and merging it into out dev-scripts folder as a PR.

Implement a single proof-of-concept test in a shell script.

Can we enumerate specifically what checks we'll perform in this script?

I'm thinking something like:

Step 1: Create a proof-of-concept test shell script

Create a simple shell script that sanity checks a TinyPilot install. The script should verify that:

  • /opt/tinypilot-privileged/scripts/change-hostname exists and is non-empty
  • other check
  • ...

The script should exit with status 0 if TinyPilot is installed as expected and exit with non-zero and an error message if any of the sanity checks fail.

At this point, we'll run the script manually, so we can verify it works on a real TinyPilot device, but we won't have any automation set up to run this in CI or in a virtual environment.
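
A minimal sketch of what such a script might look like (the set of checks here is just a placeholder):

```bash
#!/bin/bash
# Sketch only: sanity-check a TinyPilot install. The list of checks is
# deliberately minimal; the real set is still to be decided.
set -u

fail() {
  echo "ERROR: $*" >&2
  exit 1
}

readonly CHANGE_HOSTNAME='/opt/tinypilot-privileged/scripts/change-hostname'
if [[ ! -s "${CHANGE_HOSTNAME}" ]]; then
  fail "${CHANGE_HOSTNAME} is missing or empty"
fi

# Other sanity checks go here.

echo 'TinyPilot install looks as expected.'
```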

Can we also do this as a PR of a markdown file to make it easier to review? See https://github.com/tiny-pilot/tinypilot-pro/pull/1090 for an example.

Create a test branch on the tinypilot repository with the new scripts added.

Just want to clarify that we don't need to do anything special with branches for this work. We'll do PRs from branches like normal, but changes can go directly into our main branch once approved. There should be little risk of these scripts affecting prod.

Set up a local Debian test environment (e.g., a virtual machine).

I think we should plan for Docker unless we have a strong reason not to. We need it to work under CI, so it needs to work under Docker at that point.