ubuntu / zsys

ZSys daemon and client for zfs systems
GNU General Public License v3.0

enable system zfs tests #250

Closed xnox closed 2 months ago

xnox commented 1 year ago

@didrocks have the system tests not been running for a very long time, so that it is not possible to tell why there are regressions in them? Also, it seems autopkgtest is not running the system tests either, which makes it sort of pointless too? I cannot tell whether the failures in the system tests are real failures, a mismatch of expectations, or regressions that have cropped up over time (i.e. clone not working right?)

I would appreciate a comment from you on the above before I proceed further.

didrocks commented 1 year ago

The system tests were running, indeed, but on a very old distro, because this project lacks maintenance capacity: https://github.com/ubuntu/zsys/actions/runs/5134228853/jobs/9237922322 passes, for instance.

So, in terms of code, it is still working as expected there. The tests with mocks are all passing too. So if the tests fail on a newer distro, there is a behaviour change between the older ZFS in that distro and the current one. This is really the goal of those tests: they rerun the exact same test suite as with the mock, but with the real zfs on the system. So they are valuable, and the failures probably show that something changed on the ZFS side.

The reason the zfs system tests are not running in autopkgtests was (still is?) the inability to load the zfs kernel module on the autopkgtest VMs at the time, which made it impossible to run with real zfs datasets. If this limitation is gone, it is completely worth running them as autopkgtests, as that will help catch any potential zfs behaviour change.

Does it make sense?

xnox commented 1 year ago

> The system tests were running, indeed, but on a very old distro, because this project lacks maintenance capacity: https://github.com/ubuntu/zsys/actions/runs/5134228853/jobs/9237922322 passes, for instance.

True. Horum.

> So, in terms of code, it is still working as expected there. The tests with mocks are all passing too. So if the tests fail on a newer distro, there is a behaviour change between the older ZFS in that distro and the current one. This is really the goal of those tests: they rerun the exact same test suite as with the mock, but with the real zfs on the system. So they are valuable, and the failures probably show that something changed on the ZFS side.

Ack.

> The reason the zfs system tests are not running in autopkgtests was (still is?) the inability to load the zfs kernel module on the autopkgtest VMs at the time, which made it impossible to run with real zfs datasets. If this limitation is gone, it is completely worth running them as autopkgtests, as that will help catch any potential zfs behaviour change.

All our kernels have zfs built in, so zfs has always been available in autopkgtest VMs. We also verify that the dkms-built zfs can be rebuilt and loaded, and we run smoke tests against the upgraded zfs from proposed. So it is possible. See https://autopkgtest.ubuntu.com/packages/zfs-linux (note the lunar failures; there is an SRU in proposed that fixes them).

I.e. GitHub runners have zfs loaded because they run Ubuntu, the same as autopkgtests.

> Does it make sense?

Thanks, will dig deeper.

xnox commented 1 year ago

> The system tests were running, indeed, but on a very old distro, because this project lacks maintenance capacity: https://github.com/ubuntu/zsys/actions/runs/5134228853/jobs/9237922322 passes, for instance.

True. Horum.

That job run is for the merge commit. The CI run on the PR itself, before the merge commit, did fail at https://github.com/ubuntu/zsys/actions/runs/5133567596/jobs/9237612045?pr=248 with the system tests reporting FAIL:. I wonder if it is racy or something =/

xnox commented 1 year ago

Test runs on kernel 1037-azure seem to be bad, and those on 1038-azure look good. Investigating. I wonder if we should have a weekly cron job running to ensure changes to Ubuntu are not regressing zsys.

xnox commented 1 year ago

It seems very racy: on every rerun with the 1038 kernel I get different tests failing.
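
One way to put numbers on "different tests failing on every rerun" is to collect the log of each rerun and split the failures into those that fail every time and those that fail only sometimes. The following is a minimal sketch, not something from the zsys repository: it assumes the logs contain the standard `go test -v` failure lines (`--- FAIL: <name> (<time>)`), and the test names in the example (`TestCreate`, `TestClone`, `TestPromote`) are hypothetical placeholders.

```python
import re

# Matches failure lines in `go test -v` output, e.g. "--- FAIL: TestClone (0.12s)".
# Subtest failures are indented and named "Parent/sub"; both forms are captured.
FAIL_LINE = re.compile(r"^\s*--- FAIL: (\S+)", re.MULTILINE)


def failing_tests(output: str) -> set[str]:
    """Return the names of the tests reported as failed in one run's log."""
    return set(FAIL_LINE.findall(output))


def classify(runs: list[str]) -> tuple[set[str], set[str]]:
    """Split failures across several reruns into (consistent, flaky).

    consistent: failed in every run (more likely a real regression).
    flaky: failed in at least one run but not all (more likely a race).
    """
    per_run = [failing_tests(out) for out in runs]
    if not per_run:
        return set(), set()
    ever = set().union(*per_run)
    always = set.intersection(*per_run)
    return always, ever - always


if __name__ == "__main__":
    # Hypothetical logs from two reruns of the same job.
    run1 = "--- FAIL: TestCreate (0.10s)\n--- FAIL: TestClone (0.20s)\nFAIL\n"
    run2 = "--- FAIL: TestCreate (0.10s)\n--- FAIL: TestPromote (0.30s)\nFAIL\n"
    consistent, flaky = classify([run1, run2])
    print("consistent:", sorted(consistent))  # consistent: ['TestCreate']
    print("flaky:", sorted(flaky))            # flaky: ['TestClone', 'TestPromote']
```

A test that lands in the flaky set across reruns on the same kernel points at a race rather than a ZFS behaviour change; a test in the consistent set is a better candidate for a real regression.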