openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

fully automated testsuite #1534

Closed maci0 closed 8 years ago

maci0 commented 11 years ago

I would like to automate building of kmod- packages for Fedora.

But instead of just building the packages and pushing them into an internal repo, I would like to run some tests.

I've seen there are some in the zfs-tests package, but IIRC nothing that would be a full-scale test of most features in an automated fashion.

I would like to test it in a VM, of course, so I can add dozens of drives with different kinds of emulated controllers, etc.

In my opinion there should be an automated test suite to run a regression test on each commit, possibly via Jenkins (which is also able to start/stop VMs, etc.).

behlendorf commented 11 years ago

I agree completely! Locally, I have some scripts which do automated testing on every commit (xfstests, filebench, ...), but that testing largely just stresses the POSIX layer and none of the zfs commands themselves. What I'd like to see happen is for the existing ZFS test suite to be updated so it can be run easily on Linux. That work was originally described in issue #6, but thus far no one's had the time to do it. However, it does need to happen. Would you be interested in taking a crack at it? The Delphix guys have a newer version of the test suite which they run, here: https://github.com/delphix/zfstest

maci0 commented 11 years ago

I think the first step would be to set up some CI infrastructure. If I happen to have some free time on my hands, I will set up Jenkins.

If it detects a new commit in spl/zfs it should:

Which brings up a question: is it possible to build the kmod RPMs for a kernel other than the currently running one, given of course that headers etc. are installed for the target kernel version?

Edit: of course, all the configuration steps should be documented and reproducible.

behlendorf commented 11 years ago

@maci0 Locally I use buildbot to test all patches before they get merged into the master branch. I have 20 or so long-running VMs with various distributions installed; the buildbot builders apply a proposed patch, build packages, install the packages, and run the test suite.

That said, I'm certainly not opposed to something more sophisticated. For example, I'd love to be able to quickly clone a clean VM for each round of testing to install the packages in.

Building the packages against kernels other than the running one is supported by the kmod packages. All that needs to be installed are the development headers for the kernel in question. You can request packages for that kernel with the 'kernels' rpmbuild define. See the upstream kmods2 packaging guidelines for directions.
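As a hedged sketch (the kernel version string and SRPM name below are placeholders, not taken from this thread), building against a non-running kernel with the 'kernels' define could look like this:

```shell
# Hypothetical sketch: build the kmod packages against a kernel other
# than the running one. Requires the matching kernel-devel headers to
# be installed; the version string and SRPM name are placeholders.
target_kernel="3.10.0-123.el7.x86_64"

if command -v rpmbuild >/dev/null 2>&1; then
    rpmbuild --rebuild --define "kernels ${target_kernel}" zfs-kmod-*.src.rpm \
        || echo "rpmbuild failed (no SRPM present in this sketch)"
else
    # Dry-run fallback so the sketch is runnable without rpmbuild installed.
    echo "would run: rpmbuild --rebuild --define \"kernels ${target_kernel}\" zfs-kmod-*.src.rpm"
fi
```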

Putting something together that all the developers could easily use to test their patches would be very helpful.

FransUrbo commented 10 years ago

What would be the best way to start this? Clone the illumos-gate code, copy the test dir to the zfs clone and then 'git add' them? Or is there a smarter way?

behlendorf commented 10 years ago

@FransUrbo Step one in porting the Illumos test suite (test runner) is probably to clone the tree and get the lay of the land, so to speak: see how portable the existing code is, what would be required to get it running on Linux, etc. Then we can strike up a meaningful conversation with the FreeBSD and Illumos folks about a reasonable way forward.

FransUrbo commented 10 years ago

Status update:

I've been going back and forth about this, do or don't, do or don't... Part of it is Python (which makes me sick and nauseated :), much is BSD Makefiles (which don't work very well with GNU make, it seems; they are at least way too complicated to use straight off), and the rest is Korn shell.

I spent four or five hours trying to get the make procedure to work, until I gave up and attacked the actual test scripts, which, luckily, are Korn shell. That gave a lot more and better results!

All of them might not be possible to port cleanly (they use a lot of Solaris-specific commands such as svcadm, ufsdump, ufsrestore and many, many others).

maci0 commented 10 years ago

bmake, Korn shell, and Solaris-specific tools. Sounds like software archaeology. In that case, I think we are better off thinking of something more innovative.

FransUrbo commented 10 years ago

Oh, yes, I feel a little like a paleontologist!! :)

But I'm actually making some progress... Once I realized that I could just ignore everything and 'hit' the actual tests directly, I managed to get the test procedure to run.

They all fail or are skipped, but it's definitely progress :)

FransUrbo commented 10 years ago

More progress:

Results Summary
FAIL     214
SKIP     639
PASS      20

Weee, this might actually be doable! :)

behlendorf commented 10 years ago

@FransUrbo Sounds like good progress!

FransUrbo commented 10 years ago

Yeah, just let me bitch and whine a while and I'll get right on it :D

FransUrbo commented 10 years ago

Question: Is it important to be able to run a pool on the same host the test is run on?

This seems to be disallowed in the Illumos code (some tests try to destroy ALL pools they can find after completion).

behlendorf commented 10 years ago

It would be nice if you could safely run it on a system with an existing pool, or at least somehow restrict it to certain block devices. We don't want to accidentally destroy someone's pool. For the automated testing this probably won't be an issue, though.

FransUrbo commented 10 years ago

Well, it turns out I'll probably have to do as much of the initial port as I can and then let someone else do the final work.

There's a lot of Solaris stuff in there that I can't remember the workings of, and I don't have the space to set up an OpenIndiana VM to test with.

FransUrbo commented 10 years ago

Can anyone with OpenIndiana/Illumos run

zpool status -v <pool>

for me? There are a couple of snippets that seem to need it, but I'm not sure the code works as intended on Linux; I suspect a difference in output...

b333z commented 10 years ago

This good @FransUrbo ?

# uname -a
SunOS b34 5.11 omnios-8c08411 i86pc i386 i86pc

# zpool status -v rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c4d0s0    ONLINE       0     0     0

errors: No known data errors

FransUrbo commented 10 years ago

Perfect, thanx! Turns out, that's identical. So that's not the part that fails... Need to keep looking then :)

FransUrbo commented 10 years ago

Just wanted to let everyone know that I've finished "round one" of this at https://github.com/zfsonlinux/zfs-test/tree/issue-1534.

The two important things to know:

1. The main run script - this is where everything starts
    https://github.com/zfsonlinux/zfs-test/blob/issue-1534/test/zfs-tests/cmd/scripts/zfstest.ksh

2. The main test run file with definitions of what to do
    https://github.com/zfsonlinux/zfs-test/blob/issue-1534/test/zfs-tests/runfiles/linux.run

What's left to do:

A. Go through the test groups again (especially the disabled ones) and see if there's something I can do (see point two above).

B. Create an automake system.

C. Documentation (I've done some, but it needs to be better).

FransUrbo commented 10 years ago

How would someone translate

https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/initpkg/swapadd.sh

Isn't this just 'swapon -a'?

behlendorf commented 10 years ago

@FransUrbo Nice work on this. As for swapadd, it sure looks like that would be equivalent to swapon -a.
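A minimal Linux stand-in, assuming the script's only job here is "activate all configured swap", might be (hypothetical, untested against the suite):

```shell
#!/bin/sh
# Hypothetical Linux replacement for Illumos' swapadd: activate every
# swap area listed in /etc/fstab. Errors (already-active areas, missing
# privileges) are deliberately ignored in this sketch.
swapon -a 2>/dev/null || true
echo "swap activation attempted"
```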

FransUrbo commented 10 years ago

I could use some help explaining https://github.com/zfsonlinux/zfs-test/blob/issue-1534/test/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_007_pos.ksh#L125

This is basically:

#!/bin/bash                                                                                                                                                   

get_prop() {
        zfs get -pH -o value $1 $2
}

rm -f /var/tmp/zfs_test-1
truncate -s 25000m /var/tmp/zfs_test-1
zpool create test /var/tmp/zfs_test-1

zfs create test/test1
zfs set mountpoint=/var/tmp/test/0/1/2 test/test1

orig_val=$(get_prop atime test/test1)
zfs mount -o remount,noatime test/test1
cur_val=$(get_prop atime test/test1)

echo "Original value (atime): '$orig_val', Current value: '$cur_val'"

And the result is:

# /tmp/testme.sh 
Original value (atime): 'on', Current value: 'on'

But this is correct, it SHOULDN'T be changed, so the test "if [[ $orig_val == $cur_val ]]; then" should really use "!=", or??
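Whether "==" or "!=" is right depends on whether a remount is expected to change the stored property. If the value must stay unchanged, as the Linux output above suggests, the check would be inverted; a hedged sketch with stand-in values, using plain test syntax rather than the suite's log_* helpers:

```shell
#!/bin/sh
# Hedged sketch: treat "stored atime value unchanged after remount" as
# the pass case. The values are stand-ins for the get_prop results above.
orig_val="on"
cur_val="on"

if [ "$orig_val" != "$cur_val" ]; then
    echo "FAIL: atime changed from '$orig_val' to '$cur_val' by remount"
    exit 1
fi
echo "PASS: atime property unchanged by remount"
```

Run as-is with the stand-in values, this prints the PASS line.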

FransUrbo commented 10 years ago

I could also use some help taking a look at https://github.com/zfsonlinux/zfs-test/tree/issue-1534/test/zfs-tests/cmd/file_write.

I get:

Test: /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_001_pos (run as root) [00:06] [FAIL]
18:16:34.66 ASSERTION: Verify 'zpool clear' can clear errors of a storage pool.
18:16:34.70 SUCCESS: /usr/src/zfs-test/scripts/mkfile -s 100m /var/tmp/testdir8049/file.0
18:16:34.74 SUCCESS: /usr/src/zfs-test/scripts/mkfile -s 100m /var/tmp/testdir8049/file.1
18:16:34.76 SUCCESS: /usr/src/zfs-test/scripts/mkfile -s 100m /var/tmp/testdir8049/file.2
18:16:34.77 NOTE: 'zpool clear' clears leaf-device error.
18:16:34.96 SUCCESS: /usr/sbin/zpool create -f testpool1.8049 mirror /var/tmp/testdir8049/file.0 /var/tmp/testdir8049/file.1 /var/tmp/testdir8049/file.2
18:16:35.02 SUCCESS: /usr/sbin/zfs create testpool1.8049/fs
18:16:35.02 DEBUG: cmd='/opt/zfs-tests/bin/file_write -o create -f /testpool1.8049/fs/f.0 -b 1048576 -c 40'
18:16:36.67 DEBUG: cmd='/opt/zfs-tests/bin/file_write -o create -f /testpool1.8049/fs/f.1 -b 1048576 -c 40'
18:16:40.41 /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_001_pos.ksh[211]: do_testing: line 151: 1334: Memory fault
18:16:40.41 NOTE: Performing local cleanup via log_onexit (cleanup)
[....]

But unfortunately I have been unable to trigger this outside the test suite...

FransUrbo commented 10 years ago

Real progress! No tests ignored, and a large part of them succeed...

Results Summary
FAIL     113
PASS     589

Running Time:   01:23:54
Percent passed: 83.9%

Have to check how many of the failures are genuine, though.

FransUrbo commented 10 years ago

Ok, I think this is as far as I'm going to take it...

Results Summary
FAIL      96
PASS     600

I'm now going to take step three: create an automake system...

behlendorf commented 10 years ago

Nice! 600 tests passing is a great addition to our test coverage. I think moving on to the autoconf bits is a good idea.

FransUrbo commented 10 years ago

After going through almost all the scripts, I see that there's room for improvement; some testing isn't done... But I'll add that to the list of things to do (at the very end :).

FransUrbo commented 10 years ago

Ok, initial attempt at an automake system pushed. Since I don't have anything but Linux, I haven't tested it anywhere else so far...

./autogen.sh && ./configure && make test

!!! WARNING !!!

I still haven't gone through all the scripts to make sure they don't willy-nilly destroy any and all pools and filesystems they find. That is next on the list...

Also, the setup in README.md is still needed. I need to fix that as well...

NOTE: ZoL needs to be built in one of the usual locations (or specified with --with-zfs-obj=.....). This is because one of the test binaries needs libspl.a.

This probably needs to be added to one of the ZoL devel packages, installed together with the other .a and .la libraries...

FransUrbo commented 10 years ago

I think I'm mostly done now. At least ready for a peer review :).

The 'destroy everything found' part, the 'culprit' is default_cleanup_noexit() - https://github.com/zfsonlinux/zfs-test/blob/issue-1534/test/zfs-tests/include/libtest.shlib#L379.

I could rewrite that, but I'm not sure how. Currently, it recognizes the KEEP environment variable set/used in the main test script - https://github.com/zfsonlinux/zfs-test/blob/issue-1534/test/zfs-tests/cmd/scripts/zfstest.ksh#L131.

My solution was to simply set this in the Makefile - https://github.com/zfsonlinux/zfs-test/blob/issue-1534/Makefile.am#L28.

This limits the changes to the test suite to a minimum (my intention). If someone has a better idea, I'm all for it...

I also found a neat trick to make sure zfs mount -a (and unmount -a) doesn't work on anything but the pool in question:

        export __ZFS_POOL_RESTRICT="$TESTPOOL"
        log_must $ZFS mount -a
        unset __ZFS_POOL_RESTRICT

Heh, that was cool! It seems to be working...

FransUrbo commented 10 years ago

Results Summary
FAIL      88
PASS     598

Running Time:   01:33:00
Percent passed: 87.2%

I apparently got rid of some FAILs just by compiling with blkid :). This is as far as I can take it at the moment, though...

behlendorf commented 10 years ago

OK. I'll add it to my review list.

b333z commented 10 years ago

Great work getting this going @FransUrbo. I have been playing with this a bit and had to change a few bits to get it working on my system; they might be good to add to autoconf etc. What's best: want me to add them as issues against the zfs-test repo, or would you prefer comments here?

FransUrbo commented 10 years ago

@b333z If it's a lot, then a PR, but if it's just a few things here and there, comments here will be fine.

b333z commented 10 years ago

Sounds good @FransUrbo. A PR might be more productive; I'll try to clean it up a bit so it's PR-worthy.

behlendorf commented 10 years ago

I've started looking at this as well and run into a few issues. Perhaps we should enable the issue tracker for the zfs-tests repository; then we can move these issues over there.

FransUrbo commented 10 years ago

@behlendorf agreed.

@b333z did you manage to get a PR going?

b333z commented 10 years ago

@FransUrbo I did not make as much progress on this as I'd wanted to. I found that some of the tests that were modified to run against loopback devices seemed to make my test system hang (an extra level of nesting, perhaps? Is there more to "this directory cannot be run on raw files" than just not being able to partition them? Not an issue on your system?), so that caught me up for a bit. I then messily hacked away to get it working against real disks again, to see if the extra nesting was the cause. That did resolve my system hangs, but I have since been unable to form it into any sort of cleanup. I've left what I have running against the zfsonlinux repo for a few months, and about 100 tests fail due to cleanup issues (getting busy on destroy of the test pool), which I haven't looked into further...

Personally, where I'd like to see this progress to next is providing the ability to choose whether to run against disks or loopback devices when running on Linux. I was sort of thinking along the lines of abstracting more of the stuff that was introduced in 28f3386c0 out to functions in libtest.shlib. Having a single set of functions that hide away the details of device naming and the creation/cleanup of partitions (the if [ -z $LINUX ] stuff, etc.) would allow easy switching between partitioning tools, cleanup, and naming schemes. For example (from memory), I was thinking of more stuff like this: adding a get_partition_device <disk> <partition (1-based)> function could hide away the complexity of figuring out a device name from a disk and partition number, hiding away the issues with the slice/partition separator and whether it's 0- or 1-based, etc.
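A sketch of that idea follows. The function name echoes b333z's suggestion; the naming rules are my assumption about common Linux conventions (nvme/loop/mmcblk devices insert a 'p' before the partition number, sd/vd devices don't):

```shell
#!/bin/sh
# Hypothetical helper: map a disk and a 1-based partition number to a
# Linux partition device name, hiding the separator differences between
# e.g. /dev/sdb1 and /dev/nvme0n1p1.
get_partition_device() {
    disk="$1"
    part="$2"
    case "$disk" in
        *[0-9]) echo "${disk}p${part}" ;;  # nvme0n1, loop0, mmcblk0 style
        *)      echo "${disk}${part}"  ;;  # sda, vdb style
    esac
}

get_partition_device /dev/sdb 1      # prints /dev/sdb1
get_partition_device /dev/nvme0n1 2  # prints /dev/nvme0n1p2
```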

Thanks again for your work on this; it's been great to have a large selection of tests to run. I apologize if my lack of progress has held you up at all.

I did manage to hack zfs-test's test-runner to output results in JUnit XML format, and knocked up something to translate xfstests results to JUnit XML format too, if that's of use to anyone.
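That kind of translation can be sketched in a few lines of shell. The "PASS <name>" / "FAIL <name>" input format below is an assumption for illustration, not test-runner's actual log format:

```shell
#!/bin/sh
# Hypothetical sketch: turn one-result-per-line output into minimal
# JUnit XML. The input format and output path are assumptions.
results="PASS zfs_mount_001_pos
FAIL zpool_clear_001_pos"
out="${TMPDIR:-/tmp}/zfs-tests-junit.xml"

{
    echo '<testsuite name="zfs-tests">'
    echo "$results" | while read -r status name; do
        case "$status" in
            PASS) echo "  <testcase name=\"$name\"/>" ;;
            FAIL) echo "  <testcase name=\"$name\"><failure/></testcase>" ;;
        esac
    done
    echo '</testsuite>'
} > "$out"

cat "$out"
```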

eses2014 commented 9 years ago

@FransUrbo: I am trying to install and run the ZFS test suite on a CentOS box. I am very new to ZFS. What steps should I follow to install and run the test suite? Please forgive me if this is not the right place to ask this question.

FransUrbo commented 9 years ago

> @FransUrbo: I am trying to install and run the ZFS test suite on a CentOS box. I am very new to ZFS. What steps should I follow to install and run the test suite? Please forgive me if this is not the right place to ask this question.

If you don't know what you're doing, DO NOT INSTALL OR USE THIS!!!

It will very (most) likely destroy your system if you're not careful!! This is, at the moment, ONLY (!!) for VERY experienced developers with in-depth knowledge of ZFS and ZoL.

If you insist, and your system and/or pool is destroyed without any trace or possibility to recover/retrieve, please do not come here and complain and/or create issues.

But what you're asking for is in README.md. It IS enough to get it to destroy your system; just be warned!

thegreatgazoo commented 8 years ago

It seems this PR added a new empty file, tests/zfs-tests/tests/stress/Makefile.am, but make distclean will remove this file:

% git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
% make distclean
% git status    
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        deleted:    tests/zfs-tests/tests/stress/Makefile.am

no changes added to commit (use "git add" and/or "git commit -a")

If this file is needed and should be under version control, then make distclean should not remove it.

behlendorf commented 8 years ago

@thegreatgazoo Yup, you're right, that's definitely a bug. If you already have a fix for this, please open a PR.