Contributing simple Linux CI based on GitHub Actions via pull request welcome?

hartwork commented 1 year ago

Hi!

I noticed that this repository does not have any living CI — there seems to be dead Travis CI config but I see no success-fail-markers on recent commits anywhere — so that currently pull requests could in theory break compilation unnoticed any time even with careful review. Use of GitHub Actions would be free of cost for you, works well, and only requires addition of a single YAML file to activate. I'm thinking of an approach similar to https://github.com/projectM-visualizer/frontend-libvisual-plug-in/blob/master/.github/workflows/linux.yml , aiming to cover at least GCC 13 and Clang 17 here. What do you think?

Best, Sebastian

mpartel commented 1 year ago

Good point - I had actually forgotten about Travis going away.

Currently I run bindfs tests on multiple platfroms with Vagrant. This has upsides and downsides:

:heavy_plus_sign: Runs locally :heavy_plus_sign: Tests some older platforms too (people seem to use them) :heavy_plus_sign: Reasonably quick and parallelized :heavy_minus_sign: No MacOS :heavy_minus_sign: Probably not easy to run in a CI (though at least GitHub Actions seems to have nested virtualization now) :heavy_minus_sign: Still has a surprisingly gnarly issue where Vagrant corrupts the terminal's state somehow

I'm happy to take CI patches, but I'd like to retain the above upsides. Some possible approaches:

Option 1: Port the test boxes from Vagrant to GitHub Actions and find a way to run GitHub Actions locally (would act work?). Option 2: Get the Vagrant test runner to work under GitHub Actions. Option 3: Keep Vagrant for testing locally and have some GitHub Actions on top of that. The duplication feels bad, but it's probably not a huge issue in practice.

These are in descending order of preference, but I think even option 3 would add a bit of value, especially if it could easily give us MacOS support. (The tests may well be broken on MacOS currently, but that's a separate issue.)

hartwork commented 1 year ago

Hi @mpartel thanks for your interest and quick and detailed reply!

There are tons of options in this space and the more you invest the more you can cover, speaking of developer time mostly because in a public repository the CI runs will be free of cost.

Yes, there is some support for nested virtualization (you linked https://github.com/actions/runner-images/issues/183) but it lacked acceleration when I last tried it in April 2023 which is also noted in that very ticket further down. My scenario back then was testing an Ansible playbook against a QEMU VM in CI and it had more than 90 minutes runtime which were a no-go to other members of the team; I'm trying to verify/falsify now that it still takes this long today but I won't know that before it's finished… probably 90 minutes from now, same link. If it's still status quo it would mean that (a) you would need so-called large GitHub Actions runners that actually have acceleration (potentially not for free) or (b) you would need to use Vagrant with containers rather than VMs which would lose non-Linux support and hence not be the same value or (c) you accept 60+ minutes of runtime for all jobs involving Vagrant. The integration with GitHub Actions itself would probably not more than:

Install Vagrant
Run https://github.com/mpartel/bindfs/blob/master/vagrant/test.rb with a single VM name
Wrap (2) by an explicit matrix with all the VM names for full parallelism on GitHub Actions level
Potentially add another check for that matrix being 100% matching the set of Vagrant files in Git so that it's impossible to miss out on adding a new entry there when a new machine is added.

I could try making that work, but I'd want to be sure that runtime of that magnitude is even acceptable to this project before trying it out.

It would be worth noting that covering (a) error-free compilation, (b) warning-free compilation and (c) test suite execution are very different tasks with different gains and costs. Something that you can have in literally 15 minutes is covering error-free compilation on Linux, and gives significant value over no CI already.

Regarding act, my understanding is that it uses different images than the original GitHub Actions and if it's not using the same images it will break in different places than the original and then in my world lots of its promise becomes marketing and a dream but not reality, but that's arguably a rather pessimistic view.

With all that said, what do you think so far?

mpartel commented 1 year ago

Thanks for the analysis!

The more likely it is that I can reproduce any CI failures locally, the less I worry about a long runtime. If the CI just runs the vagrant tester then reproducability should be good (apart from any environmental concurrency/timing bugs, hopefully rare, but I've seen some).

The implementation steps sound reasonable.

In step 2, the GitHub Action should probably print the machine's log on completion (at least on failure). Alternatively test.rb could even have a flag for streaming/tailing it, unless this issue makes that problematic (I tried many things to resolve that weirdness, but didn't keep notes unfortunately).

Step 4: good idea. Could e.g. make it part of make check, which is fast enough that I run it regularly (could well be a pre-commit hook).

(a) error-free compilation (b) warning-free compilation and (c) test suite execution

I think having (a) for many platforms provides most of the value, followed closely by (c) for many platforms.

(b) would be nice too. Probably not too difficult to add CFLAGS=-Werror or something when running on the CI or Vagrant.

About act - yeah, I should have taken an extra minute to look at it. Docker therefore Linux-only is too much of a downgrade from Vagrant.

Happy to take a PR for any version of this :+1:, if you still feel like it.

hartwork commented 1 year ago

Hi @mpartel,

here's a few things I learned since my last post:

Acceleration for virtualization is still missing in GitHub Actions, so QEMU runs in turtle-speed software emulation mode meaning that:
- CI still took 100 minutes for https://github.com/gentoo-ev/public-infra/actions/runs/6827349263
- /dev/kvm still does not exist in the file system
Some of the Vagrant boxes in here use images that are only avalbale for provider "virtualbox" e.g. https://app.vagrantup.com/ubuntu/boxes/bionic64 so we'd need VirtualBox for virtualization anyway or the existing boxes need to be changed to QEMU/KVM
VirtualBox on Linux GitHub Actions fails command "vagrant up" saying things like "Stderr: VBoxManage: error: VT-x is not available (VERR_VMX_NO_VMX)". That's not much of a surprise.
vagrant/test.rb exits with code 0 on error which is not ideal for any kind of CI, so I'll create a pull request making it exit with code 1 on error.
My try with image macos-12 and pre-installed VirtualBox ended with error The hosted runner encountered an error while running your job. (Error Type: Disconnect). after 20+ minutes of no output. Which made me put this experiment on pause for today.

So for now I would like to put Vagrant inside Actions on the backburner and only do these two smaller pull requests regarding CI:

145
146

I should note that adding CFLAGS=-Werror to CI is a good, but it will likely uncover unfixed warnings and would therefore be nice to be separate in a second round when we alread have basic CI.

What do you think?

mpartel commented 1 year ago

Looks good - many thanks!

Acceleration for virtualization is still missing in GitHub Actions, so QEMU runs in turtle-speed software emulation mode meaning that:

:turtle: :checkered_flag:

CI still took 100 minutes for https://github.com/gentoo-ev/public-infra/actions/runs/6827349263 /dev/kvm still does not exist in the file system

What would be the expected time for that one with acceleration?

Some of the Vagrant boxes in here use images that are only avalbale for provider "virtualbox" e.g. https://app.vagrantup.com/ubuntu/boxes/bionic64 so we'd need VirtualBox for virtualization anyway or the existing boxes need to be changed to QEMU/KVM

I think this should be doable for all the boxes. Probably not terribly difficult, but yet another yak to shave of course.

VirtualBox on Linux GitHub Actions fails command "vagrant up" saying things like "Stderr: VBoxManage: error: VT-x is not available (VERR_VMX_NO_VMX)". That's not much of a surprise.

VirtualBox used to have an unaccelerated mode too, but I guess they removed it :/

Also VirtualBox has often been buggy for me. It has caused instability in the past, and even today it printed scary memory errors to dmesg, so switching to libvirt would definitely be a positive.

My try with image macos-12 and pre-installed VirtualBox ended with error The hosted runner encountered an error while running your job. (Error Type: Disconnect). after 20+ minutes of no output. Which made me put this experiment on pause for today.

We could try a Windows runner too :trollface:

I should note that adding CFLAGS=-Werror to CI is a good, but it will likely uncover unfixed warnings and would therefore be nice to be separate in a second round when we alread have basic CI.

:+1:

hartwork commented 1 year ago

CI still took 100 minutes for https://github.com/gentoo-ev/public-infra/actions/runs/6827349263 /dev/kvm still does not exist in the file system

What would be the expected time for that one with acceleration?

@mpartel a lot less, but I forgot the details. A fair and important question. I hope I can spin up the same setup locally with KVM on my notebook later today and get you a number (and a few words on the involved hardware).

Some of the Vagrant boxes in here use images that are only avalbale for provider "virtualbox" e.g. https://app.vagrantup.com/ubuntu/boxes/bionic64 so we'd need VirtualBox for virtualization anyway or the existing boxes need to be changed to QEMU/KVM

I think this should be doable for all the boxes. Probably not terribly difficult, but yet another yak to shave of course.

I still have my experiment code. Is any of the existing boxes availabe for KVM/QEMU? If so, I could give that single box a try again and we'd get a runtime duration number.

My try with image macos-12 and pre-installed VirtualBox ended with error The hosted runner encountered an error while running your job. (Error Type: Disconnect). after 20+ minutes of no output. Which made me put this experiment on pause for today.

We could try a Windows runner too :trollface:

True, but that's not a job I would volunteer for :smile:

Maybe for nested virtualization with GitHub integration there are better free options than GitHub Actions. From a bit of Googling around:

So maybe AppVeyor or Cirrus could work to get Vagrant in here, if wanted. Would be beyond what I can volunteer my own time for though, for the moment.

hartwork commented 1 year ago

CI still took 100 minutes for https://github.com/gentoo-ev/public-infra/actions/runs/6827349263 /dev/kvm still does not exist in the file system

What would be the expected time for that one with acceleration?

@mpartel a lot less, but I forgot the details. A fair and important question. I hope I can spin up the same setup locally with KVM on my notebook later today and get you a number (and a few words on the involved hardware).

@mpartel I'm afraid I'll be blocked from a proper answer because I when run locally, accessing the Docker registry fails for some reasons and then Docker images cannot be built inside the VM and so the script exits after ~8 minutes. I can only guess it would take 15 to 50 minutes rather than 100, at the moment, sorry.

mpartel commented 1 year ago

Thanks for the CI googling. Good to know.

A 2-4x slowdown due to unaccelerated virtualization doesn't sound like the end of the world.

I still have my experiment code. Is any of the existing boxes availabe for KVM/QEMU? If so, I could give that single box a try again and we'd get a runtime duration number.

All boxes by "roboxes" a.k.a. "generic" seem to have libvirt variants, so e.g. roboxes/debian11 and roboxes/ubuntu2204

mpartel commented 1 year ago

I'm working on getting Vagrant tests to run in the CI.

Turns out some GitHub Actions runners have KVM, and some don't: https://github.com/mpartel/bindfs/actions/runs/6842967533

mpartel commented 1 year ago

On my machine, testing a repeat run without initial VM setup:

KVM: 46s
No KVM: 267s i.e. a 5.8x difference

hartwork commented 1 year ago

I'm working on getting Vagrant tests to run in the CI.

@mpartel cool! Also thanks for mentioning it, avoiding duplication.

Turns out some GitHub Actions runners have KVM, and some don't: https://github.com/mpartel/bindfs/actions/runs/6842967533

You got to be kidding! :smiley: Interesting find!

Glad to see you embracing this and taking over :+1:

hartwork commented 1 year ago

A 2-4x slowdown due to unaccelerated virtualization doesn't sound like the end of the world.

@mpartel I'm having a hard time to agree: 100 minutes of CI is different game than 25 minutes, at least to me.

All boxes by "roboxes" a.k.a. "generic" seem to have libvirt variants, so e.g. roboxes/debian11 and roboxes/ubuntu2204

Good to know, thanks!

mpartel commented 1 year ago

Anything above a few minutes is too long for me not to context switch :man_shrugging:

First fully passing Vagrant run: https://github.com/mpartel/bindfs/actions/runs/6843186193 :tada:

Except that one happened to run freebsd12 on a KVM runner. The previous two runs didn't, and they apparently timed out waiting for some part of the VM to boot up.

Oh well, I'll leave this until next weekend. I'll probably disable freebsd12 in CI until we hopefully get KVM in all runners. If you feel especially nerdsniped, the branch is vagrant-ci

Anyway, thanks for kicking this CI thing off!

hartwork commented 1 year ago

@mpartel congrats on how far you've come with this! I don't have any experience with FreeBSD myself yet but I'm optimistic you'll figure it out, good luck!

hartwork commented 1 year ago

@mpartel PS: My vote for adding vagrant tests in a new file vagrant.yml rather than adding them to linux.yml, e.g. because freebsd12 is not Linux and at least so far these tests are not too connected.

mpartel commented 1 year ago

I decided to rename the whole thing to tests.yml.

Closing this as completed, with freebsd12 disabled for now. Thanks again!

hartwork commented 1 year ago

@mpartel cool, thanks for embracing CI! :+1:

mpartel / bindfs

Contributing simple Linux CI based on GitHub Actions via pull request welcome? #144

145

146