kdave closed this issue 1 year ago.
CircleCI does not seem to provide a kernel newer than 4.15 either.
@kdave Possible options.... https://en.wikipedia.org/wiki/Comparison_of_continuous_integration_software
What do you think about running an arbitrary kernel in qemu? It would be slower but would allow being independent of the CI host's kernel version.
The CI environment is already virtualized; I'm not sure that nested virtualization would work, but if it does, it would solve the problem.
I've checked SemaphoreCI and AppVeyor - same 4.15 kernel. It's sometimes hard to dig up the details, but if it mentions Ubuntu 18 LTS, then it means 4.15.
Same for gitlab CI.
@kdave I will explore Gitlab CI, have some experience with it.
I agree with using nested virtualization. Previously I tried something like btrfsqa, but it uses an AWS instance.
Something like GitLab docker/qemu would be a good alternative. We can utilize the GitLab dind service and launch QEMU. QEMU with a custom kernel built from btrfs-devel, running the btrfs-progs tests, would be nice. In the future, when a failure occurs, we can set up an automated git bisect to identify which commit (on btrfs-devel or btrfs-progs) caused the failure. Let me know your thoughts. If I understand GitLab's terms correctly, GitLab CI is free for OSS projects (e.g. no time limits on CI runs).
GitLab doesn't allow a forked project to have CI runs, so I imported btrfs-progs from GitHub and added the GitLab CI scripts: https://gitlab.com/giis/btrfs-progs/commits/gitlabci
gitlab-ci flow:
(wip) Sample ci run: https://gitlab.com/giis/btrfs-progs/pipelines/65974411
Also working on reporting back errors (or forcing GitLab CI to fail when required).
Successful job running `make test-cli` with the above setup: https://gitlab.com/giis/btrfs-progs/-/jobs/230278586
This looks very promising, thanks! The overall time seems to be long for the simple test-cli run (25 minutes). I think we don't need to build the kernel each time, though it's great that this is possible, and maybe I'll use that for kernel testing. If there's an external repository with some recent kernel, then downloading it would probably be faster.
Okay. I'll create a GitLab repo to host a recent kernel bzImage file. We can fetch it from there and use it with the btrfs-progs testing. Is that fine?
BTW, why not use distro kernels, which are mostly vanilla? E.g. kernels from Arch or openSUSE Tumbleweed should be good enough for most btrfs-progs tests, and it saves tons of kernel compile time.
Hi @adam900710, that's a good idea. Let me check whether I can extract the kernel from the openSUSE Tumbleweed Docker image and then use that with QEMU.
The Tumbleweed kernel uses an initrd that will be required to run QEMU, and the initrd is usually created at kernel installation time. For the CI run it can be pre-generated or possibly generated on the fly, but that's still some complication. We can start from what you have and improve it later.
Hi! Is anyone working on this? If not, I would like to help.
Also, would it be possible to run xfstests on Travis or such? Building a minimal kernel for btrfs/xfstests (esp. with ccache) shouldn't take that long, but I don't know how long xfstests takes to run. The idea would be that contributors could test their kernel patches simply by pushing them to a branch on GitHub / opening a mock pull request on https://github.com/kdave/btrfs-devel.
Hi @CyberShadow .
I was busy with work for the last few weeks and just this Friday I started looking into this again (https://gitlab.com/giis/btrfs-progs/pipelines). The plan is to skip the everyday kernel build, as suggested by @kdave and @adam900710.
Some stats from this pipeline: https://gitlab.com/giis/btrfs-progs/pipelines/76463083 Time taken for each stage:
Image build - 7 minutes 3 seconds
Kernel build - 31 minutes 54 seconds
Cli tests - 25 minutes 28 seconds (delay due to btrfs-progs build)
convert tests - 180 minutes 0 seconds (rebuilding btrfs-progs)
fsck tests - 29 minutes 34 seconds (rebuilding btrfs-progs)
fuzz tests - 36 minutes 38 seconds (rebuilding btrfs-progs)
misc tests - 24 minutes 46 seconds (rebuilding btrfs-progs)
mkfs tests - 28 minutes 6 seconds (rebuilding btrfs-progs)
results - 1 minute
Total duration: 7 + 31 + 25 + 180 + 29 + 36 + 24 + 28 + 1 = 361 minutes (~6 hrs)
Suggested flow:
Image build - Skip
Kernel build - Skip
btrfs-prog build - 20 minutes
Cli tests - 5 mins
convert tests - 150 mins+ (this going to be little longer)
fsck tests - 9 mins
fuzz tests - 16 mins
misc tests - 4 mins
mkfs tests - 8 mins
Total duration: 20 + 5 + 150 + 9 + 16 + 4 + 8 + 1 = 213 minutes (~3.5 hrs)
Avoiding the image build, kernel build, and btrfs-progs build in each stage speeds up the CI by about 2.5 hrs. If we skip the convert tests or keep them to a minimal run (say 15 minutes), the entire CI time comes to about 1:18 hrs instead of the current 6 hrs.
Basically, we can make use of the kernel and images from previously built GitLab artifacts. I already did that simply using curl (https://gitlab.com/giis/btrfs-progs/blob/fsci_stages_faster2/.gitlab-ci.yml#L33), which reduced the time by ~30 minutes. Now I'm trying to avoid the btrfs-progs build for each stage. It should be ready either today or tomorrow.
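For reference, reusing a previous pipeline's artifacts can be sketched roughly like this (a hypothetical `.gitlab-ci.yml` fragment; `PROJECT_ID` and the `kernel-build` job name are placeholder values, not necessarily the names used in the branch above):

```yaml
test-cli:
  stage: test
  script:
    # GitLab serves the latest successful artifacts of a ref via its API,
    # so each test stage can download instead of rebuilding:
    - curl --location --output artifacts.zip "https://gitlab.com/api/v4/projects/${PROJECT_ID}/jobs/artifacts/devel/download?job=kernel-build"
    - unzip artifacts.zip
```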
TBH, I was also thinking of setting up an xfstests run (from older scripts: https://github.com/Lakshmipathi/btrfsqa/blob/master/setup/scripts/003_xfstests) after completing this. You can make use of the existing `.gitlab-ci.yml` and the `gitlab-ci` directory from https://gitlab.com/giis/btrfs-progs/tree/fsci_stages_faster2/ and start working on xfstests. Let me know if you need further details. Thanks!
Hi!
Those timings are very strange. On my machine, these things don't take nearly as long.
Building btrfs-progs (`./autogen.sh && ./configure && make -j8`) takes 16 seconds.
Running the CLI tests (`sudo make TEST_DEV=/tmp/test.img test-cli`) takes 18 seconds.
Why is there such a dramatic difference when running them on CI? Are the tools built in some debug mode that makes them several orders of magnitude slower?
Some notes:
I noticed the test suite creates the test image in the local test directory by default. As such, syncs will propagate to the host device. Putting the test image in tmpfs should be much faster.
Travis allows running parts of the test suite in parallel, by specifying the test part as an environment variable. If things cannot be sped up directly then this is an option that will allow fitting in Travis' one hour limit, but also, improve iteration time (less waiting until everything passes and much less waiting until something fails).
I understand that this is the "canonical" mirror. What is the plan with regards to using GitLab CI on GitHub?
convert tests - 180 minutes 0 seconds (rebuilding btrfs-progs)
I've disabled the convert tests for `devel` testing and enabled them only for the pre-release tests, so you can scratch that too; this will decrease the runtime to something sane.
Building btrfs-progs (`./autogen.sh && ./configure && make -j8`) takes 16 seconds. Running the CLI tests (`sudo make TEST_DEV=/tmp/test.img test-cli`) takes 18 seconds. Why is there such a dramatic difference when running them on CI? Are the tools built in some debug mode that makes them several orders of magnitude slower?
Isn't it due to the qemu emulation? I don't know the exact setup Lakshmipathi used but that would be my first guess, besides that the CI instances are slow or don't have enough CPUs.
I noticed the test suite creates the test image in the local test directory by default. As such, syncs will propagate to the host device. Putting the test image in tmpfs should be much faster.
This would speed things up, though it costs memory, and I can't find right now how much e.g. the Travis instance gets. The minimum is 2G for the scratch device.
Travis allows running parts of the test suite in parallel, by specifying the test part as an environment variable. If things cannot be sped up directly then this is an option that will allow fitting in Travis' one hour limit, but also, improve iteration time (less waiting until everything passes and much less waiting until something fails).
This sounds easy
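For illustration, the parallel split could look roughly like this in a `.travis.yml` (a hedged sketch; the target names follow the `make test-*` convention seen elsewhere in this thread, and the actual file may differ):

```yaml
# Each env entry becomes a separate parallel Travis job.
env:
  - TEST_PART=test-cli
  - TEST_PART=test-fsck
  - TEST_PART=test-misc
  - TEST_PART=test-mkfs
script:
  - ./autogen.sh && ./configure && make -j8
  - sudo make TEST_DEV=/tmp/test.img "$TEST_PART"
```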
I understand that this is the "canonical" mirror. What is the plan with regards to using GitLab CI on GitHub?
I push to github and gitlab at the same time, so the connection to CI is independent and each host picks the CI configuration. I can test anything on gitlab/github "as if it were the final version" without disturbing other development eg. by putting it to another branch than devel.
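For anyone wanting a similar mirror setup, one way to update both hosts with a single command is a remote with two push URLs (a sketch using the public mirror URLs; adjust remotes and branch names as needed):

```shell
# Create a remote whose push list covers both hosts. Note: adding the first
# explicit push URL replaces the implicit one, so we re-add GitHub explicitly.
git remote add both git@github.com:kdave/btrfs-progs.git
git remote set-url --add --push both git@github.com:kdave/btrfs-progs.git
git remote set-url --add --push both git@gitlab.com:kdave/btrfs-progs.git
# Now one push updates both mirrors, and each host triggers its own CI:
git push both devel
```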
Isn't it due to the qemu emulation?
Why would you build btrfs-progs inside the VM? Also, with KVM the overhead should be negligible. Compiling anything in qemu without KVM should probably be avoided.
This would speed things up, though it costs memory and I can't find now how much eg. the travis instance gets.
According to https://docs.travis-ci.com/user/reference/overview/ it's 7.5G. Should be enough. Might not be enough for xfstests?
I push to github and gitlab at the same time, so the connection to CI is independent and each host picks the CI configuration.
OK, I'm thinking about how to make it easier for contributors. Probably more people have a GitHub account than a GitLab account, but I guess it doesn't matter that much as long as it's discoverable.
According to https://docs.travis-ci.com/user/reference/overview/ it's 7.5G. Should be enough. Might not be enough for xfstests?
7.5G is more than enough, so the tmpfs for the scratch image is OK; we just need to make it configurable so it does not explode on random users' machines.
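As a sketch of the configurable-tmpfs idea (the paths and the `TEST_TMPFS` knob are assumptions for illustration, not the suite's actual interface):

```shell
# Back the scratch image with tmpfs when requested; /dev/shm is a tmpfs
# mount available on most Linux systems without root. TEST_TMPFS=1 opts in.
IMG_DIR="${TEST_TMPFS:+/dev/shm}"
IMG="${IMG_DIR:-/tmp}/btrfs-test.img"
truncate -s 2G "$IMG"    # sparse: RAM is consumed only as pages are written
echo "scratch image at $IMG, size $(stat -c %s "$IMG") bytes"
rm -f "$IMG"
```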
OK, I'm thinking about how to make it easier for contributors. Probably more people have a GitHub account than a GitLab account, but I guess it doesn't matter that much as long as it's discoverable.
The point of gitlab regarding CI was better options than travis CI, so far the tests were post-merge. Extending that to the pull-request time checks makes sense, assuming most of them will come from github.
According to https://docs.travis-ci.com/user/reference/overview/ it's 7.5G. Should be enough. Might not be enough for xfstests?
Regarding fstests, I'm using VM instances with various memory sizes and 2G works too, but the storage requirements are quite a bit bigger to run the full suite. One example for all: 6x independent block devices of at least 10G in size.
Why is there such a dramatic difference when running them on CI?
That's because it's run inside Docker with QEMU. Nested virtualization slows things down.
I've disabled convert tests from the devel testing and enabled only for pre-release tests, so you can scratch that too and this will decrease the runtime to something sane.
Okay, I will disable the convert tests. How much time does the current CI take without the convert tests?
Parallel runs are a good option. So far I'm running two stages in parallel but the remaining tests in sequence; let me run them in parallel and check the difference.
This pipeline with parallel tests took less than 30 mins. https://gitlab.com/giis/btrfs-progs/pipelines/77289785
Great!
I have also been experimenting: https://github.com/CyberShadow/btrfs-ci
It looks like neither Travis nor GitLab CI supports KVM in the test environment. This means that running under QEMU will be terribly slow, as it will have to emulate the CPU in software. A possibly better option is to use UML. Userspace in UML is quite a bit slower than KVM, but still much faster than QEMU without KVM. Kernel-space code in UML (i.e. `fs/btrfs/`) shouldn't be much slower.
Also, it looks like it is barely possible to run all tests without Docker or root; the approaches I found are hacky at best. I had hoped that, without requiring either, it could be run on Travis' non-root infrastructure, which has more availability. But using Docker is more practical here.
That's nice. Yes, UML is another alternative, sitting in between the bare-metal and full-VM approaches. I'm under the assumption that using QEMU inside Docker may be helpful for future testing with different architectures.
I still need to clean up the above pipeline a bit (need to investigate whether it reports failures correctly).
I'm under the assumption that using qemu inside Docker may be helpful in future testing with different arch.
I think it's good to have that option, but I'm not sure it's necessary for daily CI runs.
Here's some more thoughts about testing:
The goal I'm aiming for is to make it feasible to run the xfstests as part of CI, so that btrfs contributors can test their code simply by opening a PR on GitHub or GitLab. But testing btrfs-progs using the same approach is not difficult as an additional task, so we can reuse the same code to test either.
Building a UML kernel doesn't take long, so we can do it when testing btrfs-progs as well as when testing kernel patches. I think there are still more opportunities to speed up the UML kernel build (e.g. we may get away with not building ext4 or networking support).
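A rough sketch of such a minimal UML build, assuming a kernel source checkout (which config options can safely be trimmed would need experimentation):

```shell
# Inside a kernel tree: configure and build a User-Mode Linux kernel
# with btrfs enabled and ext4/networking trimmed out (assumed safe here).
make ARCH=um defconfig
./scripts/config --enable CONFIG_BTRFS_FS \
                 --disable CONFIG_EXT4_FS \
                 --disable CONFIG_NET
make ARCH=um olddefconfig
make ARCH=um -j"$(nproc)"
# The result is a normal userspace binary; boot it against a root image
# (the rootfs.img path is a placeholder):
./linux ubd0=rootfs.img mem=1G
```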
If the goal is to maintain mirrors and accept contributions via either GitHub or GitLab, then we should make the testing infrastructure work on either. I don't think we can use GitLab CI on GitHub, which means that if we wanted to do this, we would need to find an alternative to the GitLab CI pipelines in your approach. Perhaps more direct optimization methods, such as not using QEMU software CPU emulation, caching, or building less code, can make up for it.
There is the question of where to put the CI metadata when testing the kernel. We obviously can't put `.travis.yml` etc. in the kernel source tree. Contributors could be asked to temporarily rebase their patches onto a `ci` branch we maintain, but that's cumbersome. Some CI services, such as AppVeyor, allow storing the CI configuration out-of-tree, but that severely limits which CI services can be used. I was thinking that a small proxy CI service could be used instead, which would push references to commits to a dummy CI repository and then copy the CI status to the kernel repository. I have some experience in this area (DAutoTest, GHDaemon) so I wouldn't mind implementing and hosting it.
What do you think? And, what are your further plans? Would be good to avoid wastefully working on the same thing if possible.
Also, it looks like it is barely possible to run all tests without Docker or root; the approaches I found are hacky at best. I had hoped that, without requiring either, it could be run on Travis' non-root infrastructure, which has more availability. But using Docker is more practical here.
Docker is used only for the build tests with musl libc, mainly to catch accidental build breakages. Making this step optional or pre-release only is OK.
The `root` requirement is hard though. There's no easy way to mount/umount a fs, create/delete loop devices, or access block devices.
The goal I'm aiming for is to make it feasible to run the xfstests as part of CI, so that btrfs contributors can test their code simply by opening a PR on GitHub or GitLab. But testing btrfs-progs using the same approach is not difficult as an additional task, so we can reuse the same code to test either.
Well, getting to successfully configure and run fstests is not trivial; I documented that on the wiki page and still find odd cases. Running a subset of fstests is doable, but then there's the question of how useful that is. Most problems I catch are when the full suite is run.
Building an UML kernel doesn't take long, so we can do it when testing btrfs-progs as well as when testing kernel patches. I think there are still more opportunities to speed up the UML kernel build (e.g. we may get away with not building ext4 or networking support).
I've experimented with UML in the past but then resorted to VMs. UML does not have SMP support and I vaguely remember there were some other problems. But for the progs testing it could work.
The `root` requirement is hard though. There's no easy way to mount/umount a fs, create/delete loop devices, or access block devices.
That doesn't matter if the actual tests are run in a VM. I was trying to build the VM without root.
Well, getting to successfuly configure and run fstests is not trivial
That's why I think it would be very valuable to have something fully automated!
I've experimented with UML in the past but then resorted to VMs. UML does not have SMP support and I vaguely remember there were some other problems. But for the progs testing it could work.
OK, I'll give it a go and see how far I get.
If UML doesn't work out, maybe we can ask someone from the community or company using btrfs to donate a machine that we can run the tests on, as that will allow using KVM. We can use Buildkite for the CI API stuff and scheduling.
UML does not have SMP support
It does from what I can see, and in theory it should be possible to simulate SMP using host threads.
and I vaguely remember there were some other problems.
Ran into this specimen:
http://lists.infradead.org/pipermail/linux-um/2019-August/001896.html
If nothing comes out of this, will have to give up on it too. Even if it's a kernel bug worth fixing, it is probably beyond me.
Hi @kdave
Some info about the setup:
The gitlab-ci file has options to enable/disable the kernel and QEMU image builds through the BUILD_KERNEL and BUILD_IMAGE variables. https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L37 https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L57
After the image is built, disable the kernel/image builds (BUILD_KERNEL=0, BUILD_IMAGE=0) and re-use the artifacts via the PREBUILT_KERNEL_ID and PREBUILT_IMAGE_ID variables.
https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L38 https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L57
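Schematically, the toggle described above could look like this (a hypothetical fragment; the helper script and the example artifact ID are placeholders - see the linked `.gitlab-ci.yml` for the real implementation):

```yaml
variables:
  BUILD_KERNEL: "0"            # skip the ~30 min kernel build
  BUILD_IMAGE: "0"             # skip the QEMU image build
  PREBUILT_KERNEL_ID: "12345"  # job ID of an earlier kernel build (example value)

kernel-build:
  stage: build
  script:
    - |
      if [ "$BUILD_KERNEL" = "1" ]; then
        ./gitlab-ci/build-kernel.sh   # hypothetical helper
      else
        # Download the artifacts of the previously built job instead:
        curl --location --output kernel.zip "https://gitlab.com/api/v4/projects/${CI_PROJECT_ID}/jobs/${PREBUILT_KERNEL_ID}/artifacts"
      fi
```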
Let me know your thoughts or suggestions. If you'd like to perform a dry run, simply copy the `.gitlab-ci.yml` file and the `gitlab-ci` directory from the gitlab-ci branch: https://gitlab.com/giis/btrfs-progs/tree/gitlab-ci
If this looks good, I'll go ahead and send it as a patch to the mailing list.
I don't think we can use GitLab CI on GitHub, which means that if we wanted to do this, we would need to find an alternative to the GitLab CI pipelines in your approach.
Hi @CyberShadow, if I'm not wrong, we should be able to trigger a GitLab CI job for a GitHub-hosted project using something like https://docs.gitlab.com/ee/ci/ci_cd_for_external_repos/github_integration.html
Running a subset of fstests is doable but then there's the question how useful is that. Most problems I catch are when the full suite is run
@kdave, approximately, how much time does full suite run usually take?
Removed the hard-coded GitLab project id from `.gitlab-ci.yml` and pushed it to the `gitlab-ci` branch. The GitLab CI files (`.gitlab-ci.yml` and `gitlab-ci/`) are now completely generic and should work from any GitLab repo.
Submitted patch for review https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg91391.html
@kdave, is the GitLab CI/CD option enabled as public on the https://gitlab.com/kdave/btrfs-progs repo? From the left menu: Settings->General->Visibility->Pipelines (below merge requests) - this one, I guess. I'm unable to view the CI/CD pipelines on the above repo.
It's enabled now, thanks for letting me know.
I've started one pipeline manually and it failed at the beginning, probably unable to download some files, or a configuration problem. @Lakshmipathi can you please have a look?
Hi @kdave, the Docker build checks whether the specific Docker image exists. If it exists, it pulls and uses it; otherwise it builds the image and pushes it into the registry.
As you were running it for the first time, it built the image but failed to push it into the GitLab registry (registry.gitlab.com/kdave/btrfs-progs:gitlab-ci). Can you try enabling Settings->Visibility->Container registry (just below Pipelines)? I think that should solve the issue.
Successfully built 023f51e9b485
Successfully tagged registry.gitlab.com/kdave/btrfs-progs:gitlab-ci
The push refers to repository [registry.gitlab.com/kdave/btrfs-progs]
bcf1ff67ca70: Preparing
67ecfc9591c8: Preparing
denied: requested access to the resource is denied
I think these config steps need to be documented.
Wondering whether some other access needed (like access token)
Container registry enabled in the settings
The Docker build phase succeeded, but the kernel and image builds did not. Is there some ordering needed? From the logs it looks like they both depend on docker (i.e. pulling gitlab-ci from the registry).
Yes, I think it's an issue with ordering. Both the kernel and image builds need the Docker image. I will send a fix along with any other issues we encounter.
I can see image under https://gitlab.com/kdave/btrfs-progs/container_registry . Can you try re-starting it?
I tried to reproduce the same issue with my repo (by deleting the Docker image). The first time, the docker build succeeded but the kernel/image builds failed. The second time, all of them are running now. https://gitlab.com/giis/btrfs-progs/pipelines/90580172 I think docker-build needs to be placed under a separate `docker-build` stage instead of the current `build` stage.
I've pushed devel branch, that triggered another job and now it seems to work. So the ordering is probably required but the future jobs will work as long as the docker image is cached.
Okay. Yes, since the Docker image is already available on your registry, it will be pulled from there. Did the btrfs-progs build job fail on the devel branch?
`ERROR: Job failed: execution took longer than 1h0m0s seconds` - not enough time to run the actual tests.
Oops, time for another config: set Settings->CI/CD->General pipelines->Timeout to 3 hrs. Shared pipelines have a max limit of 3 hrs. If we used our own GitLab instance, the timeout could be much higher.
Travis CI does not provide an updated system image and tests fail for that reason. We can't use that for development testing; look for better CI hosting.
Requirements: