kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0
557 stars 242 forks

Find another CI #171

Closed — kdave closed this 1 year ago

kdave commented 5 years ago

Travis CI does not provide an updated system image and tests fail for that reason. We can't use that for development testing. Look for better CI hosting.

Requirements:

kdave commented 5 years ago

CircleCI does not seem to provide kernel newer than 4.15 either.

unquietwiki commented 5 years ago

@kdave Possible options.... https://en.wikipedia.org/wiki/Comparison_of_continuous_integration_software

CyberShadow commented 5 years ago

What do you think about running an arbitrary kernel in qemu? It would be slower but would allow being independent of the CI host's kernel version.

kdave commented 5 years ago

The CI environment is already virtualized, I'm not sure that nested virtualization would work, but if yes then it'd solve the problem.

kdave commented 5 years ago

I've checked SemaphoreCI and AppVeyor, same 4.15 kernel. It's sometimes hard to dig up the details, but if it mentions Ubuntu 18.04 LTS, then it means 4.15.

kdave commented 5 years ago

Same for GitLab CI.

Lakshmipathi commented 5 years ago

@kdave I will explore GitLab CI; I have some experience with it.

Lakshmipathi commented 5 years ago

I agree with using nested virtualization. Previously I tried something like btrfsqa, but it uses an AWS instance.

Something like GitLab docker/qemu would be a good alternative. We can use the GitLab dind service and launch qemu. Qemu with a custom kernel built from btrfs-devel, running the btrfs-progs tests, would be nice. In the future, when a failure occurs, we can set up an automated git bisect to identify which commit (on btrfs-devel or btrfs-progs) caused the failure. Let me know your thoughts. If I understand GitLab's terms correctly, GitLab CI is free for OSS projects (e.g. no time limits on CI runs).
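Wiring dind into a GitLab CI job is mostly a matter of declaring the service. A minimal sketch, assuming the dind setup described above (the job name and test image name are illustrative, not real):

```yaml
# Sketch: enable Docker-in-Docker for a job so it can run containers
# (and, inside them, qemu). The job and image names are hypothetical.
test-in-qemu:
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker run --rm our-qemu-test-image   # hypothetical image name
```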

Lakshmipathi commented 5 years ago

GitLab doesn't allow forked projects to have CI runs, so I imported btrfs-progs from GitHub and added the gitlab-ci scripts: https://gitlab.com/giis/btrfs-progs/commits/gitlabci

gitlab-ci flow:

  1. Create a qemu image using debootstrap with the appropriate packages.
  2. Ensure the qemu image has a systemd service to kick-start the btrfs-progs build and tests.
  3. Build a custom kernel from btrfs-devel.
  4. Launch qemu with the kernel and image prepared above.
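As a CI job script, the four steps above might look roughly like this. This is only a sketch under assumptions — the package list, paths, unit name, and image sizes are illustrative, not the actual scripts from the branch:

```shell
#!/bin/sh -e
# 1. Build a root filesystem with debootstrap (illustrative package list).
debootstrap --include=systemd,btrfs-progs,make,gcc stable rootfs
# 2. Install a (hypothetical) systemd unit that runs the tests on boot.
cp run-btrfs-tests.service rootfs/etc/systemd/system/
ln -s ../run-btrfs-tests.service \
    rootfs/etc/systemd/system/multi-user.target.wants/run-btrfs-tests.service
# Pack the rootfs into a raw disk image (virt-make-fs is one option).
virt-make-fs --format=raw --size=+512M rootfs rootfs.img
# 3. The kernel is built separately from the btrfs-devel tree.
# 4. Boot the image with the custom kernel; console output goes to the CI log.
qemu-system-x86_64 -m 2G -nographic \
    -kernel bzImage -append "root=/dev/sda rw console=ttyS0" \
    -drive file=rootfs.img,format=raw,if=ide
```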

(wip) Sample ci run: https://gitlab.com/giis/btrfs-progs/pipelines/65974411

Also working on reporting errors back (or forcing the GitLab CI job to fail when required).

Lakshmipathi commented 5 years ago

A successful job running make test-cli with the above setup: https://gitlab.com/giis/btrfs-progs/-/jobs/230278586

kdave commented 5 years ago

This looks very promising, thanks! The overall time seems long for the simple test-cli run (25 minutes). I think we don't need to build the kernel each time, though it's great that it's possible and maybe I'll use that for kernel testing. If there's an external repository with a recent kernel, downloading it would probably be faster.

Lakshmipathi commented 5 years ago

Okay. I'll create a GitLab repo to host a recent kernel bzImage file. We can fetch it from there and use it for btrfs-progs testing. Is that fine?

adam900710 commented 5 years ago

BTW, why not use distro kernels, which are mostly vanilla? E.g. kernels from Arch or openSUSE Tumbleweed should be good enough for most btrfs-progs tests, and it saves tons of kernel compile time.

Lakshmipathi commented 5 years ago

Hi @adam900710, that's a good idea. Let me check whether I can extract the kernel from the openSUSE Tumbleweed docker image and then use that with QEMU.

kdave commented 5 years ago

The Tumbleweed kernel uses an initrd that will be required to run qemu, and the initrd is usually created at kernel installation time. For the CI run it can be pre-generated, or possibly generated on the fly, but that's still some complication. We can start from what you have and improve it later.

CyberShadow commented 5 years ago

Hi! Is anyone working on this? If not, I would like to help.

Also, would it be possible to run xfstests on Travis or similar? Building a minimal kernel for btrfs/xfstests (esp. with ccache) shouldn't take that long, but I don't know how long xfstests takes to run. The idea is that contributors could test their kernel patches simply by pushing them to a branch on GitHub / opening a mock pull request on https://github.com/kdave/btrfs-devel.
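The "minimal kernel plus ccache" idea sketched above could look something like this. A sketch only — the config options shown are the obvious btrfs/loop ones, not a tested minimal set:

```shell
# Sketch: trim the kernel config down and wrap gcc in ccache so CI
# rebuilds stay cheap (run inside a Linux kernel source tree).
make defconfig
./scripts/config --enable BTRFS_FS --enable BLK_DEV_LOOP
make olddefconfig
make CC="ccache gcc" -j"$(nproc)" bzImage
```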

Lakshmipathi commented 5 years ago

Hi @CyberShadow .

I was busy with work for the last few weeks and just this Friday started looking into this again (https://gitlab.com/giis/btrfs-progs/pipelines). The plan is to skip the daily kernel build, as suggested by @kdave and @adam900710.

Some stats from this pipeline: https://gitlab.com/giis/btrfs-progs/pipelines/76463083 Time taken for each stage:

Image build - 7 minutes 3 seconds 
Kernel build - 31 minutes 54 seconds  
Cli tests -  25 minutes 28 seconds  (delay due to btrfs-progs build)
convert tests - 180 minutes 0 seconds  (rebuilding btrfs-progs)
fsck tests -  29 minutes 34 seconds    (rebuilding btrfs-progs)
fuzz tests - 36 minutes 38 seconds     (rebuilding btrfs-progs)
misc tests - 24 minutes 46 seconds     (rebuilding btrfs-progs)
mkfs tests - 28 minutes 6 seconds     (rebuilding btrfs-progs)
results - 1 minute 

Total duration: 7 + 31 + 25 + 180 + 29 + 36 + 24 + 28 + 1 = 361 minutes (~6 hrs)

Suggested flow:
Image build - Skip 
Kernel build - Skip 
btrfs-progs build - 20 minutes
Cli tests - 5 mins
convert tests - 150 mins+ (this is going to be a little longer)
fsck tests - 9 mins
fuzz tests - 16 mins
misc tests - 4 mins
mkfs tests - 8 mins
results - 1 min

Total duration: 20 + 5 + 150 + 9 + 16 + 4 + 8 + 1 = 213 minutes (~3.5 hrs)

Avoiding the image build, kernel build and btrfs-progs build on each stage speeds up the CI by about 2:30 hrs. If we skip the convert tests or keep them to a minimal run (say 15 mins), the entire CI time comes down to about 1:18 hrs instead of the current 6 hrs.

Basically, we can make use of the kernel and images from previously built GitLab artifacts. I already did that simply using curl (https://gitlab.com/giis/btrfs-progs/blob/fsci_stages_faster2/.gitlab-ci.yml#L33), which reduced the time by ~30 mins. Now I'm trying to avoid the btrfs-progs build for each stage. It should be ready either today or tomorrow.
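Fetching a previous job's artifacts with curl boils down to the GitLab job-artifacts API. A sketch, assuming the project and job ids are known (the variable names and archive contents are placeholders):

```shell
# Download the artifacts archive of a previous job via the GitLab API
# (PROJECT_ID and KERNEL_JOB_ID are placeholders for real ids).
curl --location --output artifacts.zip \
  "https://gitlab.com/api/v4/projects/${PROJECT_ID}/jobs/${KERNEL_JOB_ID}/artifacts"
unzip artifacts.zip   # would contain e.g. bzImage and rootfs.img from the earlier stage
```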

TBH, I was also thinking of setting up an xfstests run (from older scripts: https://github.com/Lakshmipathi/btrfsqa/blob/master/setup/scripts/003_xfstests) after completing this. You can make use of the existing .gitlab-ci.yml and the gitlab-ci directory from https://gitlab.com/giis/btrfs-progs/tree/fsci_stages_faster2/ and start working on xfstests. Let me know if you need further details. Thanks!

CyberShadow commented 5 years ago

Hi!

Those timings are very strange. On my machine, these things don't take nearly as long.

Building btrfs-progs (./autogen.sh && ./configure && make -j8) takes 16 seconds.

Running the CLI tests (sudo make TEST_DEV=/tmp/test.img test-cli) takes 18 seconds.

Why is there such a dramatic difference when running them on CI? Are the tools built in some debug mode that makes them several orders of magnitude slower?

Some notes:

kdave commented 5 years ago

convert tests - 180 minutes 0 seconds (rebuilding btrfs-prog)

I've disabled convert tests from the devel testing and enabled only for pre-release tests, so you can scratch that too and this will decrease the runtime to something sane.

kdave commented 5 years ago

Building btrfs-progs (./autogen.sh && ./configure && make -j8) takes 16 seconds. Running the CLI tests (sudo make TEST_DEV=/tmp/test.img test-cli) takes 18 seconds.

Why is there such a dramatic difference when running them on CI? Are the tools built in some debug mode that makes them several orders of magnitude slower?

Isn't it due to the qemu emulation? I don't know the exact setup Lakshmipathi used, but that would be my first guess; besides that, the CI instances may be slow or not have enough CPUs.

I noticed the test suite creates the test image in the local test directory by default. As such, syncs will propagate to the host device. Putting the test image in tmpfs should be much faster.

This would speed things up, though it costs memory, and I can't find how much e.g. the Travis instance gets. The minimum is 2G for the scratch device.
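For example, since /dev/shm is tmpfs on most Linux distributions, placing the test image there keeps syncs off the host disk without even needing a separate mount. A sketch (the image path is illustrative; TEST_DEV is the variable the test suite already accepts, the 2G size matches the scratch-device minimum mentioned above):

```shell
# Create a sparse 2G test image on tmpfs (/dev/shm is tmpfs on most
# distros) instead of the local test directory, so syncs never touch
# the host disk.
truncate -s 2G /dev/shm/btrfs-test.img
ls -l /dev/shm/btrfs-test.img
# Then point the suite at it, e.g.:
#   sudo make TEST_DEV=/dev/shm/btrfs-test.img test-cli
```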

Travis allows running parts of the test suite in parallel, by specifying the test part as an environment variable. If things cannot be sped up directly then this is an option that will allow fitting in Travis' one hour limit, but also, improve iteration time (less waiting until everything passes and much less waiting until something fails).

This sounds easy

I understand that this is the "canonical" mirror. What is the plan with regards to using GitLab CI on GitHub?

I push to github and gitlab at the same time, so the connection to CI is independent and each host picks the CI configuration. I can test anything on gitlab/github "as if it were the final version" without disturbing other development eg. by putting it to another branch than devel.

CyberShadow commented 5 years ago

Isn't it due to the qemu emulation?

Why would you build btrfs-progs inside the VM? Also, with KVM the overhead should be negligible. Compiling anything in qemu without KVM should probably be avoided.
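Whether a given CI runner can use KVM at all is easy to probe; the guess above can be checked in the job itself with something like:

```shell
# Probe for KVM: qemu can only use hardware acceleration if /dev/kvm
# exists and is writable by the current user.
if [ -w /dev/kvm ]; then
    echo "KVM available: qemu -enable-kvm will work"
else
    echo "No usable /dev/kvm: qemu falls back to slow TCG emulation"
fi
```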

This would speed things up, though it costs memory and I can't find now how much eg. the travis instance gets.

According to https://docs.travis-ci.com/user/reference/overview/ it's 7.5G. Should be enough. Might not be enough for xfstests?

I push to github and gitlab at the same time, so the connection to CI is independent and each host picks the CI configuration.

OK, I'm thinking about how to make it easier for contributors. Probably more people have a GitHub account than a GitLab account, but I guess it doesn't matter that much as long as it's discoverable.

kdave commented 5 years ago

According to https://docs.travis-ci.com/user/reference/overview/ it's 7.5G. Should be enough. Might not be enough for xfstests?

7.5 is more than enough, so the tmpfs for the scratch image is OK; we just need to make it configurable so it does not explode on random users' machines.
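A common pattern for this kind of knob is an environment override with a safe default — a sketch only; TEST_TMPDIR is a hypothetical variable name, not one the test suite currently defines:

```shell
# Hypothetical knob: default the image location to the local test
# directory, and let CI (which has plenty of RAM) override it to tmpfs.
TEST_TMPDIR="${TEST_TMPDIR:-$PWD/tests}"
IMG="$TEST_TMPDIR/test.img"
echo "test image will be created at: $IMG"
# CI would export TEST_TMPDIR=/dev/shm before running the suite;
# ordinary users get the on-disk default and nothing explodes.
```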

OK, I'm thinking about how to make it easier for contributors. Probably more people have a GitHub account than a GitLab account, but I guess it doesn't matter that much as long as it's discoverable.

The point of GitLab regarding CI was better options than Travis CI; so far the tests were post-merge. Extending that to pull-request-time checks makes sense, assuming most of them will come from GitHub.

kdave commented 5 years ago

According to https://docs.travis-ci.com/user/reference/overview/ it's 7.5G. Should be enough. Might not be enough for xfstests?

Regarding fstests, I'm using VM instances with various memory sizes and 2G works too; the storage requirements to run the full suite are quite a bit bigger, though. One example for all: 6 independent block devices of at least 10G in size.

Lakshmipathi commented 5 years ago

Why is there such a dramatic difference when running them on CI?

That's because it's run inside docker with qemu. Nested virtualization is slowing things down.

I've disabled convert tests from the devel testing and enabled only for pre-release tests, so you can scratch that too and this will decrease the runtime to something sane.

Okay, I will disable the convert tests. How much time does the current CI take without the convert tests?

Parallel build is a good option. So far I'm running two stages in parallel but the remaining tests in sequence; let me run them in parallel and check the difference.

Lakshmipathi commented 5 years ago

This pipeline with parallel tests took less than 30 mins. https://gitlab.com/giis/btrfs-progs/pipelines/77289785

CyberShadow commented 5 years ago

Great!

I have also been experimenting: https://github.com/CyberShadow/btrfs-ci

It looks like neither Travis nor GitLab CI support KVM in the test environment. This means that running under qemu will be terribly slow, as it will have to emulate the CPU in software. A possibly better option is to use UML. Userspace in UML is quite a bit slower than KVM, but still much faster than qemu without KVM. Kernel-space code in UML (i.e. fs/btrfs/) shouldn't be much slower.
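For reference, building and booting a UML kernel is short — a sketch, assuming a root filesystem image is already at hand (`ARCH=um`, `ubda=` and `mem=` are standard UML conventions; the image name is illustrative):

```shell
# Build the kernel as an ordinary Linux userspace binary (ARCH=um).
make ARCH=um defconfig
make ARCH=um -j"$(nproc)"
# The result is an executable named ./linux; boot it with a root
# filesystem image attached as the ubda block device.
./linux ubda=rootfs.img rw mem=1G init=/sbin/init
```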

Also, it looks like it is barely possible to run all the tests without Docker or root; the approaches I found are hacky at best. I had hoped that, without requiring either, it could be run on Travis' non-root infrastructure, which has more availability. But using Docker is more practical here.

Lakshmipathi commented 5 years ago

That's nice. Yes, UML is another alternative, in between the bare-metal and full-VM approaches. I'm under the assumption that using qemu inside Docker may be helpful for future testing with different architectures.

I still need to clean up the above pipeline a bit (need to investigate whether it reports failures correctly).

CyberShadow commented 5 years ago

I'm under the assumption that using qemu inside Docker may be helpful in future testing with different arch.

I think it's good to have that option, but I'm not sure it's necessary for daily CI runs.

Here's some more thoughts about testing:

What do you think? And, what are your further plans? Would be good to avoid wastefully working on the same thing if possible.

kdave commented 5 years ago

Also it looks like it is barely possible to run all tests without Docker or root, but the approaches I found are hacky at best. I had hoped that without requiring either, it could be ran on Travis' non-root infrastructure, which has more availability. But using Docker is more practical here.

Docker is used only for the build tests with musl libc, mainly to catch accidental build breakages. Making this step optional or pre-release only is OK.

The root requirement is hard though. There's no easy way to mount/umount a fs, create/delete loop devices or access block devices.

kdave commented 5 years ago

The goal I'm aiming for is to make it feasible to run the xfstests as part of CI, so that btrfs contributors can test their code simply by opening a PR on GitHub or GitLab. But testing btrfs-progs using the same approach is not difficult as an additional task, so we can reuse the same code to test either.

Well, getting fstests successfully configured and running is not trivial; I documented that on the wiki page and still find odd cases. Running a subset of fstests is doable, but then there's the question of how useful that is. Most problems I catch are when the full suite is run.

Building an UML kernel doesn't take long, so we can do it when testing btrfs-progs as well as when testing kernel patches. I think there are still more opportunities to speed up the UML kernel build (e.g. we may get away with not building ext4 or networking support).

I've experimented with UML in the past but then resorted to VMs. UML does not have SMP support, and I vaguely remember there were some other problems. But for the progs testing it could work.

CyberShadow commented 5 years ago

The root requirement is hard though. There's no easy way to mount/umount a fs, create/delete loop devices or access block devices.

That doesn't matter if the actual tests are run in a VM. I was trying to build the VM without root.

Well, getting to successfuly configure and run fstests is not trivial

That's why I think it would be very valuable to have something fully automated!

I've experimented with UML in the past but then resorted to VMs. UML does not have SMP support and I vaguely remember there were some other problems. But for the progs testing it could work.

OK, I'll give it a go and see how far I get.

If UML doesn't work out, maybe we can ask someone from the community or a company using btrfs to donate a machine that we can run the tests on, as that will allow using KVM. We can use Buildkite for the CI API stuff and scheduling.

CyberShadow commented 5 years ago

UML does not have SMP support

It does from what I can see, and in theory it should be possible to simulate SMP using host threads.

and I vaguely remember there were some other problems.

Ran into this specimen:

http://lists.infradead.org/pipermail/linux-um/2019-August/001896.html

If nothing comes of this, I will have to give up on it too. Even if it's a kernel bug worth fixing, it is probably beyond me.

Lakshmipathi commented 5 years ago

Hi @kdave

Some info about the setup:

The gitlab-ci file has options to enable/disable the kernel and qemu image builds through the BUILD_KERNEL and BUILD_IMAGE variables. https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L37 https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L57

After the image is built, disable the kernel/image builds (BUILD_KERNEL=0, BUILD_IMAGE=0) and re-use the artifacts via the PREBUILT_KERNEL_ID and PREBUILT_IMAGE_ID variables.

https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L38 https://gitlab.com/giis/btrfs-progs/blob/gitlab-ci/.gitlab-ci.yml#L57
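Inside the job script, the enable/disable logic presumably boils down to something like this — a simplified sketch using the variable names above, not the actual script from the branch:

```shell
# Skip the expensive kernel build when BUILD_KERNEL=0 and fall back to
# a previously built artifact instead (the job id is a placeholder).
if [ "${BUILD_KERNEL:-1}" = "1" ]; then
    echo "building kernel from btrfs-devel"
else
    echo "reusing kernel from job ${PREBUILT_KERNEL_ID:-<unset>}"
fi
```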

Let me know your thoughts or suggestions. If you'd like to perform a dry run, simply copy .gitlab-ci.yml and the gitlab-ci directory from the gitlab-ci branch: https://gitlab.com/giis/btrfs-progs/tree/gitlab-ci If this looks good, I'll go ahead and send it as a patch on the mailing list.

Lakshmipathi commented 5 years ago

I don't think we can use GitLab CI on GitHub, which means that if we wanted to do this, we would need to find an alternative to the GitLab CI pipelines in your approach.

Hi @CyberShadow, if I'm not wrong, we should be able to trigger a GitLab CI job for a GitHub-hosted project using something like https://docs.gitlab.com/ee/ci/ci_cd_for_external_repos/github_integration.html

Running a subset of fstests is doable but then there's the question how useful is that. Most problems I catch are when the full suite is run

@kdave, approximately, how much time does full suite run usually take?

Lakshmipathi commented 5 years ago

Removed the hard-coded GitLab project id from .gitlab-ci.yml and pushed it to the gitlab-ci branch. Now the GitLab CI files (.gitlab-ci.yml and gitlab-ci/) are completely generic and should work from any GitLab repo.

Lakshmipathi commented 5 years ago

Submitted patch for review https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg91391.html

Lakshmipathi commented 5 years ago

@kdave, is the GitLab CI/CD option enabled as public on the https://gitlab.com/kdave/btrfs-progs repo? From the left menu: Settings->General->Visibility->Pipelines (below merge requests) - this one, I guess. I'm unable to view the CI/CD pipelines from the above repo.

kdave commented 5 years ago

It's enabled now, thanks for letting me know.

kdave commented 5 years ago

I've started one pipeline manually and it failed at the beginning, probably due to a failure to download some files or a configuration problem. @Lakshmipathi, can you please have a look?

Lakshmipathi commented 5 years ago

Hi @kdave, the Docker build checks whether the specific docker image exists. If it exists, it will pull and use it; otherwise it will build it and push it into the registry.

As you are running it for the first time, it built the image but failed to push it into the GitLab registry (registry.gitlab.com/kdave/btrfs-progs:gitlab-ci). Can you try enabling Settings->Visibility->Container registry (just below Pipelines)? I think that should solve the issue.

Successfully built 023f51e9b485
Successfully tagged registry.gitlab.com/kdave/btrfs-progs:gitlab-ci
The push refers to repository [registry.gitlab.com/kdave/btrfs-progs]
bcf1ff67ca70: Preparing
67ecfc9591c8: Preparing
denied: requested access to the resource is denied

I think these config steps need to be documented.
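If enabling registry visibility is not enough, a GitLab CI job normally has to authenticate to the registry before pushing, using GitLab's predefined CI variables. A sketch of that step (the `:gitlab-ci` tag follows the one in the log above):

```shell
# Log the job in to the project's container registry using GitLab's
# predefined CI variables, then push the freshly built image.
docker login -u gitlab-ci-token -p "$CI_JOB_TOKEN" "$CI_REGISTRY"
docker push "$CI_REGISTRY_IMAGE:gitlab-ci"
```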

Lakshmipathi commented 5 years ago

Wondering whether some other access is needed (like an access token).

kdave commented 5 years ago

Container registry enabled in the settings

kdave commented 5 years ago

The Docker build phase succeeded, but the kernel and image builds did not. Is there some ordering needed? From the logs it looks like they both depend on the docker stage (i.e. pulling gitlab-ci from the registry).

Lakshmipathi commented 5 years ago

Yes, I think it's an issue with ordering. Both the kernel and image builds need the docker image. I will send a fix, along with fixes for any other issues we encounter.

Lakshmipathi commented 5 years ago

I can see the image under https://gitlab.com/kdave/btrfs-progs/container_registry . Can you try re-starting it?

Lakshmipathi commented 5 years ago

I tried to reproduce the same issue with my repo (by deleting the docker image). The first time, the docker build succeeded but the kernel/image builds failed. The second time, all of them are running: https://gitlab.com/giis/btrfs-progs/pipelines/90580172 I think docker-build needs to be placed under a different stage (docker-build) instead of the current build stage.
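In `.gitlab-ci.yml` terms, the proposed fix is to give the docker image its own stage, since GitLab runs stages strictly in order. A sketch (the job names and script paths are illustrative):

```yaml
# Give the docker image build its own stage; GitLab runs stages in
# order, so the kernel/image jobs only start once the image is in the
# registry. (Job names and script paths here are illustrative.)
stages:
  - docker-build   # must finish first
  - build          # kernel and qemu image builds
  - test

docker-build:
  stage: docker-build
  script: ./gitlab-ci/docker-build.sh

kernel-build:
  stage: build
  script: ./gitlab-ci/kernel-build.sh
```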

kdave commented 5 years ago

I've pushed the devel branch, which triggered another job, and now it seems to work. So the ordering is probably required, but future jobs will work as long as the docker image is cached.

Lakshmipathi commented 5 years ago

Okay. Yes, since the docker image is already available in your registry, it will be pulled from there. The btrfs-progs build job failed on the devel branch?

kdave commented 4 years ago

ERROR: Job failed: execution took longer than 1h0m0s seconds, not enough time to run the actual tests.

Lakshmipathi commented 4 years ago

Oops, time for another config: Settings->CI/CD->General pipelines->Timeout, set to 3 hrs. Shared runners can have a max limit of 3 hrs. If we used our own GitLab instance, the timeout could be much higher.