utmapp / UTM

Virtual machines for iOS and macOS
https://getutm.app
Apache License 2.0

Develop a comprehensive test suite #2565

Open osy opened 3 years ago

osy commented 3 years ago

As the complexity of the project grows, we find that a change to the QEMU configurations often breaks something else unintended. This happens with pretty much every change and is becoming unmanageable. We need an actual test plan:

  1. A set of .utm files of various operating systems, architectures, and configurations.
  2. A checklist of things to test for each one (mouse, sound, network, etc.)
  3. A matrix of configurations to check (Intel/M1 Mac, iPad, different iOS versions, etc.)

Then we can have a release sign-off that involves making sure everything is tested.
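
To make that concrete, the checklist and matrix could be encoded as data from the start, so the same entries drive the manual sign-off now and any automation later. A minimal sketch in Swift; every type and field name here is hypothetical, not existing UTM code:

```swift
// Hypothetical encoding of one row of the release test matrix.
// None of these types exist in UTM today.
struct ReleaseTestCase {
    let utmFile: String      // packaged VM, e.g. "fixtures/ubuntu-20.04-arm64.utm"
    let checklist: [String]  // features to verify: "boot", "mouse", "sound", "network"
    let hosts: [String]      // where it must pass: "Intel Mac", "M1 Mac", "iPad (iOS 14)"
}

// A release is signed off when every case passes on every listed host.
let releaseMatrix: [ReleaseTestCase] = [
    ReleaseTestCase(utmFile: "fixtures/ubuntu-20.04-arm64.utm",
                    checklist: ["boot", "mouse", "network"],
                    hosts: ["M1 Mac", "iPad (iOS 14)"]),
    ReleaseTestCase(utmFile: "fixtures/windows-10-x86_64.utm",
                    checklist: ["boot", "mouse", "sound", "network"],
                    hosts: ["Intel Mac", "M1 Mac"]),
]
```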

ntindle commented 3 years ago

Howdy! As someone in the world of DevOps, I have a few questions to help spur discussion.

  1. Are there any tests that already exist? If so, are they manual or automated, and where do they live?
  2. Which core areas are the most critical and need to be tested first?
  3. Is this a process we want automated or performed manually?
  4. Do we want failures to block releases, or just to be aware that they are occurring?

Thanks, Nick

osy commented 3 years ago
  1. Haha, no. At best I launch a couple of VMs (from the gallery, plus a few Windows ones), but mostly right now it's make a new release and see if anyone complains.
  2. We should assume QEMU “works” and focus on UTM-specific stuff like launching VMs with various configuration options. Most of what breaks is incompatible config options: I update the command argument generation for one target and it breaks another target (see the sketch after this list).
  3. I think manual is easier but automatic is preferred. Manual just means there’s a list of things /somebody/ tries before each release.
  4. I think it should block release.
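
That second point (argument generation for one target silently breaking another) is the kind of regression plain unit tests over the generated command line can catch without booting anything. A minimal sketch, where `VMConfiguration` and `buildQEMUArguments` are stand-ins rather than UTM's actual types:

```swift
import XCTest

// Stand-in configuration type for illustration only.
struct VMConfiguration {
    var architecture: String
    var memoryMiB: Int
    var usbSupport: Bool
}

// Stand-in for UTM's QEMU argument generation.
func buildQEMUArguments(for config: VMConfiguration) -> [String] {
    var args = ["-machine", config.architecture == "aarch64" ? "virt" : "q35",
                "-m", "\(config.memoryMiB)"]
    if config.usbSupport {
        args += ["-device", "qemu-xhci"]
    }
    return args
}

final class ArgumentGenerationTests: XCTestCase {
    // Pin the full command line for one target so that a change made for
    // another target cannot silently alter it.
    func testAarch64ArgumentsAreStable() {
        let config = VMConfiguration(architecture: "aarch64", memoryMiB: 2048, usbSupport: true)
        XCTAssertEqual(buildQEMUArguments(for: config),
                       ["-machine", "virt", "-m", "2048", "-device", "qemu-xhci"])
    }
}
```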
ntindle commented 3 years ago

Great answers!

Couple more questions:

  1. Defining success: is dropping into an OS after launch considered a success? Are there any other criteria that we would need to test?
  2. Is there a way for UTM to tell, programmatically, that it has successfully launched a VM?
  3. Do we have a place to store custom ISOs with startup scripts for automation?

osy commented 3 years ago

I think booting past the BIOS is probably enough for “success”, even though there are bugs (like USB input) which show up much later. Currently there’s no way to tell programmatically (but if a VM fails to launch, UTM will error); a sketch of what such a hook could look like is below. There are currently no such scripts.
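
As a thought experiment, here is roughly what a test harness could hook into if the backend exposed a state callback. The enum, protocol, and names below are all hypothetical, not an existing UTM API:

```swift
import Foundation

// Hypothetical VM lifecycle states; UTM does not expose this today.
enum VMState {
    case stopped, starting, started
    case error(String)
}

// Hypothetical observer a test harness could register.
protocol VMStateObserver: AnyObject {
    func virtualMachine(_ name: String, didTransitionTo state: VMState)
}

final class LaunchChecker: VMStateObserver {
    var onVerdict: ((Bool) -> Void)?

    func virtualMachine(_ name: String, didTransitionTo state: VMState) {
        switch state {
        case .started:
            // Reaching .started confirms QEMU launched; "booted past the
            // BIOS" would additionally need a signal from inside the guest.
            onVerdict?(true)
        case .error(let message):
            print("\(name) failed to launch: \(message)")
            onVerdict?(false)
        case .stopped, .starting:
            break
        }
    }
}
```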

Also, full disclosure: I plan to overhaul the UTM backend pretty soon to support the new macOS 12 virtualization framework, so there will be a lot of changes that attempt to decouple QEMU from UTM.

ntindle commented 3 years ago

I propose a document, Release Test Scenarios, to define the manual/automated test behaviors that we'd want to run.

Obviously the goal would be to automate all the tests we'd want to run. I'm going to continue assuming that's the goal.

I'm sure you can tell that your compatibility matrix is absolutely massive: 4 supported platforms, 11 gallery images, and tens of devices. That, combined with the number of tests that should be run, will make anything but automation a nightmare.

We have a few ways forward from here:

  1. Mark some things as untested and move on
  2. Set up a test matrix that supports all of these devices

Assuming we are moving forward with option 1: my proposal is to start by acquiring two Mac minis through MacStadium's open source program or similar for automated testing. We could also look at using the GitHub macOS runners, but I'm pretty sure that's in closed beta. (EDIT: I see it's already being used. That could work then.)

From there,

  1. Build a single test image that will run our inside-the-VM test scripts.
  2. Set up a test suite that spawns UTM and executes the automated tests against it. I would assume E2E tests are the only ones being executed at first; unit tests can come later.
  3. The VM will receive a test from the test runner, execute it, and pass the result back (a rough sketch of the guest side follows this list).
  4. This operation will continue through all tests, creating, destroying, and reconfiguring VMs as necessary.
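
For the handshake in step 3, something as simple as a line protocol over a virtio serial console might be enough. A rough sketch of the guest-side agent, where the device path, script directory, and protocol are all assumptions for illustration, not an existing UTM interface:

```swift
import Foundation

// Open the virtio console the host exposes to the guest (hypothetical wiring).
guard let console = FileHandle(forUpdatingAtPath: "/dev/hvc0") else {
    fatalError("virtio console not present in this guest")
}

// Run the script matching a test name and report pass/fail via exit status.
func run(test name: String) -> Bool {
    let task = Process()
    task.executableURL = URL(fileURLWithPath: "/opt/utm-tests/\(name).sh")
    do {
        try task.run()
        task.waitUntilExit()
        return task.terminationStatus == 0
    } catch {
        return false
    }
}

// Read one test name per line, execute it, and write the verdict back.
while let data = try? console.read(upToCount: 256), !data.isEmpty,
      let line = String(data: data, encoding: .utf8) {
    let name = line.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !name.isEmpty else { continue }
    let verdict = run(test: name) ? "PASS" : "FAIL"
    console.write("\(verdict) \(name)\n".data(using: .utf8)!)
}
```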

For iOS, it's a bit more complicated. We'd likely need to run in the simulator or mirror a device. Maybe Corellium has jailbroken device sims we could look at.

IMO, the start of this shouldn't really pivot on a specific backend, so we should be able to move forward either way.