veeam / blksnap

Nonpersistent block device snapshot with block-level change-tracking capabilities.
GNU General Public License v2.0

Blksnap commands in tests don't wait for completion before proceeding #34

Closed: Fantu closed this issue 1 year ago

Fantu commented 1 year ago
 * Blksnap version: seen in several versions, with both the module and the upstream tests
 * Distribution: tried both Debian unstable and Ubuntu 20.04
 * Architecture: amd64

Describe the bug: In the tests, some blksnap commands seem to still be running when the test script proceeds to its next commands. This causes unexpected issues, mainly when trying to unload the blksnap module. I tried adding some sleeps as a workaround (https://github.com/Fantu/blksnap/commit/ae8aa53f11568018d628ab0fc633dc1050c79122), but that doesn't solve it, and it is better to avoid things like this anyway. I first mentioned this here: https://github.com/veeam/blksnap/issues/23#issuecomment-1331256858

Steps to reproduce: execute the all.sh tests.
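A fixed sleep only guesses how long the previous command needs. A minimal sketch of a bounded retry is shown below, assuming a POSIX-compatible shell that supports `local` (bash, dash, busybox sh); `retry_until` is a hypothetical helper, not part of the blksnap test scripts. Note that retrying only helps with a timing race: if the module stays in use for minutes, as reported later in this thread, the reference is being held by the module or a leftover process, and no script-side wait will fix it.

```shell
#!/bin/sh
# Hypothetical helper (not from the blksnap repo): run a command repeatedly
# until it succeeds or the attempts run out, instead of a single fixed sleep.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        echo "Attempt $i/$attempts failed: $*" >&2
        # Only sleep if another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # Every attempt failed.
    return 1
}
```

In a test script this could wrap the unload step, for example `retry_until 10 1 modprobe -r blksnap || lsmod | grep blksnap`, so a failure prints the module's remaining "Used by" count instead of silently moving on.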

Fantu commented 1 year ago

@SergeiShtepa I did some quick tests with the blksnap_interface_v2 branch and a kernel based on https://github.com/SergeiShtepa/linux/commit/4c610654701621156367dedaf9a431b674d6fc87. Thanks for the improvements you made to the tests. When I run them, they stop at the end of the first test because the module unload fails: the module is still reported as in use, even when I try to unload it manually after some minutes. Looking at the logs related to the test, I didn't see any errors. Could this be an unexpected case in the kernel module that is already fixed in the latest commits? (The kernel I was running was built before them.) If not, is there anything I can do to help find the cause?

EDIT: I built a kernel based on the latest commit https://github.com/SergeiShtepa/linux/commit/6e21eb94f26d8351ece319f49fcb24cd8283e818 and also rebased it on the latest upstream https://github.com/torvalds/linux/commit/0136d86b78522bbd5755f8194c97a987f0586ba5. The issue is still present: after any use, it is impossible to unload the blksnap module.

Fantu commented 1 year ago

@SergeiShtepa I did another kernel build, rebased on the updated upstream and with blksnap built in. I ran some quick tests with blksnap built in (skipping the module unload, which fails because the module is in use), and they completed. However, there are still some errors like these:

Failed to get event from snapshot.: No such process
Stretch snapshot service failed.

I ran the tests with the output logged to a file and also printed on the terminal, using:
sudo ./all.sh 2>&1 | tee -a /tmp/blksnap_test_$(date -u '+%Y-%m-%d_%H-%M-%S').log
Thanks to your improvements, all the output now seems to be saved to the file. Here is one full log, in case it is useful: https://paste.debian.net/1270288/

I also tested launching just sudo ./all.sh, and there are still errors related to the "stretch_snapshot" command, though not always in the same place, so I suppose they are not caused by redirecting the command's output. Here is the log of another run without output redirection: https://paste.debian.net/1270290/

SergeiShtepa commented 1 year ago

Thank you, Fabio. I am sure I will get to this problem in the test.

SergeiShtepa commented 1 year ago

Fixed in blksnap_interface_v2 (Commit). Not merged into master yet.

Fantu commented 1 year ago

@SergeiShtepa thanks, I did a quick test. It failed with a different error, but I don't know whether that is expected because I didn't rebuild the kernel at the latest commit (the kernel build I used goes up to commit "fix - deleted forgotten down_killable()"), or whether there is a bug. Here is the log if you want to take a look: https://paste.debian.net/hidden/2f5692a7/ If an updated kernel build is needed, ignore it; I'll redo the test with an updated kernel. The error reports "no space left on device", but the filesystem was not full. Could it be related to the diff storage? A small question: how much free disk space do these tests require?

Fantu commented 1 year ago

I rebuilt the kernel with the latest changes and the tests completed. With blksnap as a module, I saw that unloading still fails because the module is in use, but https://github.com/veeam/blksnap/commit/395989a0f32905ec8a4ddc8d9c8596da6e4a39d0 seems to work around it. Here is the test log, in case it is useful: https://paste.debian.net/hidden/8dede2c5/

EDIT: sorry, I didn't look carefully: it still fails on the module unload and doesn't run the last tests.

SergeiShtepa commented 1 year ago

Yes. In "Diff storage test":

Destroy snapshot
Destroy snapshot 26691c0b-6385-4dae-abee-c992219cc04e
Waiting for streach process terminate
Unload module
modprobe: FATAL: Module blksnap is in use.
modprobe: FATAL: Module blksnap is in use.
The snapshot no longer exists.
Stretch snapshot service finished.

I'll check it out. Thanks.
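In the log above, the script prints "Waiting for streach process terminate" but tries to unload the module before "Stretch snapshot service finished." appears, so it did not actually wait for the stretch process. A minimal sketch of the usual shell pattern is shown below: record the background job's PID with `$!` and block on `wait PID` before the unload. `stretch_service` is a hypothetical stand-in for the real stretch_snapshot invocation used by the tests, whose exact arguments are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical stand-in for the real "stretch_snapshot" command: here it just
# runs for a moment and then exits, as the real service does once the
# snapshot no longer exists.
stretch_service() {
    sleep 1
    echo "Stretch snapshot service finished."
}

# Start the service in the background and remember its PID.
stretch_service &
STRETCH_PID=$!

# ... test body: create/use the snapshot, then destroy it here ...

echo "Waiting for stretch process to terminate"
# 'wait PID' blocks until that child has exited and returns its exit status,
# unlike a fixed sleep, which only guesses how long the job needs.
wait "$STRETCH_PID"
status=$?
echo "Stretch process exited with status $status"

# Only at this point would the script attempt 'modprobe -r blksnap'.
```

Because `wait` also returns the child's exit status, the test can report a failing stretch service explicitly instead of discovering it later as "Module blksnap is in use".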

Fantu commented 1 year ago

@SergeiShtepa good, after https://github.com/veeam/blksnap/commit/5b410f175cd96c07fe140de12538e6bd2c0d2987 the test issue seems solved. I'll run some more tests to be sure before closing this as solved.

Fantu commented 1 year ago

The latest tests also pass, so this can be considered solved in the v2 branch.