@mrmrcoleman @grahamc Please update the description if I've missed anything from our discussion.
Hey @gauravgahlot it would also be good to consider which events should trigger the CI. The end-to-end tests cover all (?) of the services, so do they run whenever any of the repos change?
@gauravgahlot Which example are you going to use for the alpine work? (install Alpine Linux on worker)
@gauravgahlot did you already select a target platform for CI/CD? I doubt GitHub Actions, CircleCI, or Travis allow the use of Vagrant.
Maybe @grahamc knows more because he tweeted about this.
There is also another thing to point out, and it is around workflow. Running e2e tests for every PR without a first triage is not a safe approach and can turn out to be a waste of time. Kubernetes has some automation for this: when a PR gets opened, a maintainer looks at it and adds a label (they comment, but it's the same in the end), and only PRs with that label get tested. Does that sound reasonable? And if we want to simplify the flow we can use Packet and not Vagrant for now; I think it will be easier to set up because we can run it from GitHub Actions, Travis, Drone, or whatever.
Ideally, I would like to run those tests against all the platforms we support: Vagrant AND Packet AND whatever will come.
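To make the label-gated flow concrete, here is a minimal sketch of a pre-flight CI step that skips the e2e suite unless the PR carries an `ok-to-test` label. The repository path, the `PR_NUMBER` and `GITHUB_TOKEN` variables, the label name, and the `./test/e2e/run.sh` entry point are all placeholders for illustration, not something that exists today:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: only run e2e tests when the PR carries an
# "ok-to-test" label, mirroring the Kubernetes triage flow described above.
# PR_NUMBER and GITHUB_TOKEN are assumed to be injected by the CI runner.
set -euo pipefail

labels=$(curl -sf \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  "https://api.github.com/repos/tinkerbell/tink/issues/${PR_NUMBER}/labels" \
  | jq -r '.[].name')

if ! grep -qx "ok-to-test" <<<"${labels}"; then
  echo "PR ${PR_NUMBER} is not labeled ok-to-test; skipping e2e tests."
  exit 0
fi

echo "ok-to-test label found; running the e2e suite."
./test/e2e/run.sh   # hypothetical entry point
```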
Our internal https://drone.packet.net exists solely to run VMs (for osie tests) :D, we can use it.
But it's an older version of Drone. We can fire up a separate setup off of the latest drone.io version and that way also keep external stuff off of our internal build hosts.
I can't recall if Semaphore or CircleCI let us bring our own build agents and thus allow us to run VMs in them.
@mmlb do you think it is useful to set up e2e tests in a way that they can support multiple platforms? Right now we have Vagrant.
Tomorrow we may have something else, and the workflow I am picturing is the one we have for Kubernetes:
```
/retest vagrant
/retest packet
```
Ideally, this does not even need a runner that supports VMs, because we will reach the Packet API and use a device (or more, based on the tests themselves) for every build.
Does it make sense? Or is it too much? 😄
My 2 cents.
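As a rough illustration of the device-per-build idea, here is a hedged sketch of such a CI step built around the `packet` CLI. The `create` flags mirror the invocation shown later in this thread; the `delete` subcommand, the `--json` output handling, the `PACKET_PROJECT_ID`/`CI_BUILD_ID` variables, and the test entry point are assumptions, not verified CLI behaviour:

```bash
#!/usr/bin/env bash
# Hypothetical CI step: create a throwaway Packet device, run the e2e suite
# against it, and always delete it afterwards.
set -euo pipefail

DEVICE_ID=$(packet device create \
  --operating-system ubuntu_20_04 \
  --plan c1.small.x86 \
  --project-id "${PACKET_PROJECT_ID}" \
  --hostname "tink-e2e-${CI_BUILD_ID}" \
  --facility ams1 --json | jq -r '.id')   # --json and .id are assumptions

# Clean the device up even if the tests fail.
trap 'packet device delete --id "${DEVICE_ID}" --force' EXIT

./test/e2e/run.sh --target "${DEVICE_ID}"   # hypothetical test entry point
```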
For Vagrant we'd want to test it in the way the user would use it. This means we'll need a build runner that supports virtualization.
I think testing the actual workflows might be tricky, but happy to think through it with you.
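One quick sanity check before committing to a runner is whether the host actually supports virtualization; a small sketch using only standard Linux interfaces (nothing Tinkerbell-specific, and the module name assumes VirtualBox is the provider):

```bash
# Does the CPU expose VT-x/AMD-V, and is the VirtualBox kernel module loaded?
# Without hardware virtualization the nested Vagrant boxes will be unusably slow.
if grep -Eq '(vmx|svm)' /proc/cpuinfo; then
  echo "hardware virtualization flags present"
else
  echo "no vmx/svm flags found: this runner is a poor fit for the Vagrant setup"
fi

lsmod | grep -q vboxdrv \
  && echo "VirtualBox kernel module loaded" \
  || echo "vboxdrv not loaded: install and configure VirtualBox first"
```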
Today I got some time and I tried the Getting Started with Vagrant tutorial inside a Packet machine. I got it running apart from the worker, which requires a GUI that we do not have:
```
There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["startvm", "19c0ccd6-9a38-4ee0-9330-29e413387eee", "--type", "gui"]

Stderr: VBoxManage: error: The virtual machine 'vagrant_worker_1594815205261_61888' has terminated unexpectedly during startup because of signal 6
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine
```
```
packet device create --operating-system ubuntu_20_04 --plan c1.small.x86 --project-id f49cf84 --hostname tink-e2e --facility ams1
```
```bash
#!/bin/bash
apt-get install -y linux-generic linux-image-generic linux-headers-generic virtualbox

# By default it looks for libvirt for some reason
export VAGRANT_DEFAULT_PROVIDER=virtualbox
```
At this point, you can follow the Getting Started until the end.
I tried to change the Vagrantfile in order to set up the worker without a GUI. I left this command running for a bit (~3m) and then I killed it:
```
$ vagrant up worker
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
worker: Warning: Authentication failure. Retrying...
```
I accessed the provisioner again to check the workflow status:
```
$ vagrant ssh provisioner
vagrant@provisioner:~$ docker exec -i deploy_tink-cli_1 tink workflow events c61e5327-5ed3-4a48-a314-533ebec222d9
+--------------------------------------+-------------+-------------+----------------+-------------------+--------------------+
| WORKER ID                            | TASK NAME   | ACTION NAME | EXECUTION TIME | MESSAGE           | ACTION STATUS      |
+--------------------------------------+-------------+-------------+----------------+-------------------+--------------------+
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | hello world | hello_world | 0              | Started execution | ACTION_IN_PROGRESS |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | hello world | hello_world | 0              | Action Failed     | ACTION_FAILED      |
+--------------------------------------+-------------+-------------+----------------+-------------------+--------------------+
```
It failed, but I am sure it is just a timing issue.
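If we end up scripting this check for CI, a hedged sketch of a polling loop that turns the events table into a pass/fail signal could look like the following; the container name and workflow ID are copied from the run above, while the timeout and the expected-action count are simplifying assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical e2e assertion: poll `tink workflow events` until an action fails
# or the expected number of actions has succeeded, with a ~10 minute timeout.
WORKFLOW_ID=c61e5327-5ed3-4a48-a314-533ebec222d9
EXPECTED_ACTIONS=1   # hello_world has a single action; adjust per template

for _ in $(seq 1 60); do
  events=$(docker exec -i deploy_tink-cli_1 tink workflow events "${WORKFLOW_ID}")
  if grep -q ACTION_FAILED <<<"${events}"; then
    echo "workflow failed"
    exit 1
  fi
  if [ "$(grep -c ACTION_SUCCESS <<<"${events}")" -ge "${EXPECTED_ACTIONS}" ]; then
    echo "all ${EXPECTED_ACTIONS} action(s) succeeded"
    exit 0
  fi
  sleep 10
done

echo "timed out waiting for workflow ${WORKFLOW_ID}"
exit 1
```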
This was just a way to keep me busy for a bit. I think we should write e2e tests that can run anywhere and set up a test grid like this one https://testgrid.k8s.io/ where we can potentially run the same e2e tests across different providers like:
I tried executing an Ubuntu provisioning workflow on my local Vagrant setup without GUI. It was successful.
```
vagrant@provisioner:/vagrant/deploy$ docker-compose exec tink-cli tink workflow events 096f328e-6286-4aff-ada6-93e8eec38f96
+--------------------------------------+-----------------+-----------------+----------------+---------------------------------+--------------------+
| WORKER ID                            | TASK NAME       | ACTION NAME     | EXECUTION TIME | MESSAGE                         | ACTION STATUS      |
+--------------------------------------+-----------------+-----------------+----------------+---------------------------------+--------------------+
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | disk-wipe       | 0              | Started execution               | ACTION_IN_PROGRESS |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | disk-wipe       | 16             | Finished Execution Successfully | ACTION_SUCCESS     |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | disk-partition  | 0              | Started execution               | ACTION_IN_PROGRESS |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | disk-partition  | 4              | Finished Execution Successfully | ACTION_SUCCESS     |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | install-root-fs | 0              | Started execution               | ACTION_IN_PROGRESS |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | install-root-fs | 22             | Finished Execution Successfully | ACTION_SUCCESS     |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | install-grub    | 0              | Started execution               | ACTION_IN_PROGRESS |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | install-grub    | 13             | Finished Execution Successfully | ACTION_SUCCESS     |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | cloud-init      | 0              | Started execution               | ACTION_IN_PROGRESS |
| 0eba0bf8-3772-4b4a-ab9f-6ebe93b90a94 | os-installation | cloud-init      | 0              | Finished Execution Successfully | ACTION_SUCCESS     |
+--------------------------------------+-----------------+-----------------+----------------+---------------------------------+--------------------+
```
Then to access the worker I had to do:
```
vagrant halt worker
# set GUI to true in Vagrantfile
vagrant up worker
```
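That Vagrantfile edit can also be scripted; a small hedged sketch, assuming the worker's VirtualBox settings expose a `vb.gui = false` line (the exact variable name may differ in the actual Vagrantfile):

```bash
# Flip the worker's GUI setting in place instead of editing the file by hand,
# then restart the VM so the new setting takes effect.
sed -i 's/vb\.gui = false/vb.gui = true/' Vagrantfile
vagrant reload worker
```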
The idea is to have smoke or E2E tests that:
Initial tests may cover the following workflows: