TachunLin opened 2 months ago
I had a quick try using ECM raven (v1.2.2), but it fails on:
test_with_snapshot_restore_with_new_vm[NFS]
test_with_snapshot_restore_replace_retain_vols[NFS]
Please help check, thanks.
harvester-runtests/43
@TachunLin Please help check the error and make sure the tests run successfully in Jenkins.
Thanks for the reminder. I checked the test report of harvester-runtests/43.
Most of the S3-related tests failed in the test_connection[S3]::setup step.
The reason is that we had not yet created the backup bucket on our ECM lab MinIO artifact endpoint.
I have created two more buckets, ravens and falcons, for future testing requirements on these machines.
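Creating the missing buckets can also be scripted so a fresh lab endpoint is prepared before the S3 tests run. The sketch below is an assumption, not part of the test suite: `ensure_buckets` is a hypothetical helper that accepts anything shaped like a boto3 S3 client (only `list_buckets()` and `create_bucket(Bucket=...)` are used), for example one created with `boto3.client("s3", endpoint_url=...)` pointing at the MinIO endpoint. `FakeS3Client` is an in-memory stand-in so the sketch runs offline.

```python
from typing import Iterable, List


def ensure_buckets(s3_client, names: Iterable[str]) -> List[str]:
    """Create every bucket in `names` that does not exist yet (hypothetical helper).

    `s3_client` is assumed to behave like a boto3 S3 client.
    Returns the names of the buckets that were newly created.
    """
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    created = []
    for name in names:
        if name not in existing:
            s3_client.create_bucket(Bucket=name)
            created.append(name)
            existing.add(name)  # guard against duplicates in `names`
    return created


class FakeS3Client:
    """In-memory stand-in for offline experiments; not part of boto3."""

    def __init__(self, names=()):
        self.names = set(names)

    def list_buckets(self):
        return {"Buckets": [{"Name": n} for n in sorted(self.names)]}

    def create_bucket(self, Bucket):
        self.names.add(Bucket)
        return {}
```

Running it twice is harmless: the second call finds both buckets and creates nothing.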
Then I used the same config.yml that the harvester_run_test pipeline uses and triggered the same tests from my local machine against the remote ravens cluster.
The result is that when I executed either the TestBackupRestoreWithSnapshot class or the entire test_4_vm_backup_restore.py file, most of the test cases passed.
The TestBackupRestoreWithSnapshot class
The test_4_vm_backup_restore file
Next, I triggered a new test run, harvester-runtests/49, where I got failures like the following:
TestBackupRestore::test_restore_with_new_vm[S3] and [NFS], TestBackupRestoreWithSnapshot::test_with_snapshot_restore_with_new_vm[S3] and [NFS]
E AssertionError: Failed to Start VM(s3-restore-0735071682-09h31m16s956301-07-24) with errors:
E Status: 404
E API Status(404): {'type': 'error', 'links': {}, 'code': 'NotFound', 'message': 'virtualmachines.kubevirt.io "s3-restore-0735071682-09h31m16s956301-07-24" not found', 'status': 404}
E assert False
test_restore_replace_with_delete_vols[S3] and [NFS], TestBackupRestoreWithSnapshot::test_with_snapshot_restore_replace_retain_vols[S3] and [NFS]
E AssertionError: cloud-init writefile failed
E Executed stdout: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC2J8e7zDDo/Mfgg4cmvt4OJYXOuY+LMfNnl6lQzdVhXJTNnnf2ulA+GMnqDsw2o5QCZ/bYkfXIvhnIHYh9PChucUujFMKhz2F3+q8fXQZqt+p6koAj7toMdmpd66rS8+x9Krmk7rS/0iZn13jqyjSIIsZ0/5fEM13jpVpWIUFC2w==
E
E Executed stderr:
E assert '0708196929-09h31m16s956301-07-24' in 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC2J8e7zDDo/Mfgg4cmvt4OJYXOuY+LMfNnl6lQzdVhXJTNnnf2ulA+GMnqDsw2o5QCZ/bYkfXIvhnIHYh9PChucUujFMKhz2F3+q8fXQZqt+p6koAj7toMdmpd66rS8+x9Krmk7rS/0iZn13jqyjSIIsZ0/5fEM13jpVpWIUFC2w==\n'
I will continue to investigate why these tests fail in the Jenkins runtests jobs while they work fine when triggered locally.
Hi @TachunLin, I checked our daily tests and found that 2 basic restore test cases are failing too:
- TestBackupRestore::test_restore_with_new_vm[S3,NFS]
- TestBackupRestore::test_restore_replace_with_delete_vols[S3,NFS]
If they PASS when triggered locally, as you mentioned, then we should suspect an environment issue and investigate the ECM lab.
CC @lanfon72
harvester-install-and-test-e2e-daily#328 (v1.3-head)
harvester-install-and-test-e2e-daily#329 (v1.2-head)
Those test cases are a known issue in the KVM environment, tracked in https://github.com/harvester/harvester/issues/4640; it may not always be reproducible.
Hi @TachunLin,
As discussed, I checked the flaky test case test_restore_with_new_vm and found that the VM is not created immediately after api_client.backups.restore is triggered, so vm_checker.wait_ip_addresses asserts that the VM does not exist and fails the test.
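The fix boils down to polling until the restored VM object is retrievable before waiting for its IP addresses. Below is a minimal sketch of such a poll, under stated assumptions: `get_vm` is any callable returning a `(status_code, body)` pair (for example `api_client.vms.get`, assuming it follows that convention), and this `wait_getable` is a hypothetical stand-alone version, not the actual `vm_checker.wait_getable` from the fix PR.

```python
import time
from typing import Any, Callable, Tuple


def wait_getable(
    get_vm: Callable[[str], Tuple[int, Any]],
    name: str,
    timeout: float = 300.0,
    interval: float = 5.0,
) -> Tuple[bool, Tuple[int, Any]]:
    """Poll get_vm(name) until it returns HTTP 200 or the timeout expires.

    Returns (True, (code, body)) once the VM is retrievable, otherwise
    (False, (last_code, last_body)) so the caller can report the last error.
    """
    deadline = time.monotonic() + timeout
    while True:
        code, body = get_vm(name)
        if code == 200:
            return True, (code, body)
        if time.monotonic() >= deadline:
            return False, (code, body)
        time.sleep(interval)  # a freshly restored VM may still return 404
```

In a test this would run right after the restore call and before any IP-address wait, so a short creation delay no longer surfaces as a 404 failure.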
Just sent a quick fix PR, please refer to
Thank you @albinsun for finding the root cause and creating PR https://github.com/harvester/tests/pull/1419 to fix the flaky test_restore_with_new_vm case. I really appreciate your help.
I also added the vm_checker.wait_getable function to test_with_snapshot_restore_with_new_vm and test_with_snapshot_restore_replace_retain_vols.
Then I triggered a new test run from the main Jenkins VM against the raven cluster, running only the TestBackupRestoreWithSnapshot class.
This fixes the test_with_snapshot_restore_with_new_vm test cases.
I also triggered the entire test_4_vm_backup_restore file; it also PASSES most of the backup and restore cases, except for the flaky cases that failed earlier for other reasons.
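Choosing between running one class and the whole module is done with pytest node IDs (`path::ClassName`), and `-k` narrows by parametrize ID such as `NFS` or `S3`. The helper below is a hypothetical sketch; the module path is an assumption about the repository layout, and the resulting list could be passed to `pytest.main(...)`.

```python
from typing import List, Optional

# Assumed location of the module inside the harvester/tests repository.
TEST_FILE = "harvester_e2e_tests/integrations/test_4_vm_backup_restore.py"


def pytest_args(test_class: Optional[str] = None,
                extra: Optional[List[str]] = None) -> List[str]:
    """Build pytest arguments for the whole module or one class (hypothetical helper)."""
    # pytest node IDs use "file::Class" to select a single test class.
    target = f"{TEST_FILE}::{test_class}" if test_class else TEST_FILE
    return [target, *(extra or [])]
```

For example, `pytest_args("TestBackupRestoreWithSnapshot", ["-k", "NFS"])` selects only the NFS-parametrized cases of the new class.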
Thanks for the check and suggestion.
When I began adding the backup-restore related test cases, I had the same plan to add all of them under the existing TestBackupRestore class.
But when I actually added the new test_with_snapshot_restore_with_new_vm and test_with_snapshot_restore_replace_retain_vols tests to the end of the TestBackupRestore class and triggered the tests, the execution of the two new test cases showed that the existing virtual machine could not perform any restore action, either restoring to a new VM or replacing the existing one.
Execution of test_with_snapshot_restore_with_new_vm
https://github.com/user-attachments/assets/4d5e7760-66b0-432e-ba0e-96b76250a582
Execution of test_with_snapshot_restore_replace_retain_vols
https://github.com/user-attachments/assets/839a32e9-5550-4dec-8059-1bed965e39e3
In the test report, we can see that the two new test cases failed; both of them got a "connection failed" error.
I think this failure may be caused by the fact that the entire TestBackupRestore class uses one and the same virtual machine, and that virtual machine has already gone through the restore-to-new-VM and replace-existing-VM operations in the previous tests, test_restore_with_new_vm and test_restore_replace_with_delete_vols.
Thus I plan to create a separate class, TestBackupRestoreWithSnapshot, so that the restore-with-snapshot tests run on a separate, clean VM without affecting any of the existing test cases.
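The isolation argument can be illustrated with per-class setup: every test inside one class shares that class's VM, while a second class gets a fresh one. This is a simplified sketch, not the suite's code. It uses the standard library's `unittest` `setUpClass` so it runs without a pytest session (pytest's `@pytest.fixture(scope="class")` gives the same per-class lifecycle), and `create_vm` is a hypothetical in-memory stand-in for creating a VM through the Harvester API.

```python
import io
import itertools
import unittest

# Hypothetical in-memory stand-in for Harvester VM creation.
_vm_ids = itertools.count()
created_vms = []


def create_vm():
    vm = {"name": f"vm-{next(_vm_ids)}", "restored": False}
    created_vms.append(vm["name"])
    return vm


class SharedVMSketch(unittest.TestCase):
    """Plays the role of TestBackupRestore: all its tests reuse one VM."""

    @classmethod
    def setUpClass(cls):
        cls.vm = create_vm()  # runs once for the whole class

    def test_restore_with_new_vm(self):
        # Earlier restore tests change the state of the shared VM.
        self.vm["restored"] = True


class SeparateVMSketch(unittest.TestCase):
    """Plays the role of TestBackupRestoreWithSnapshot: gets its own VM."""

    @classmethod
    def setUpClass(cls):
        cls.vm = create_vm()

    def test_starts_from_clean_vm(self):
        # Unaffected by whatever SharedVMSketch did to its VM.
        self.assertFalse(self.vm["restored"])


def run_sketch():
    """Run both classes in order and return the unittest result."""
    created_vms.clear()
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(SharedVMSketch))
    suite.addTests(loader.loadTestsFromTestCase(SeparateVMSketch))
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

The cost of this design is visible too: each class pays for its own VM creation, which is the extra execution time weighed below.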
Admittedly, this will increase the execution time, since the connection test and the take-backup test need to be rerun.
As a trade-off, having a separate class offers the following benefits:
Given this, may I continue to use the separate TestBackupRestoreWithSnapshot class for the new restore-from-backup-with-snapshot test cases? Many thanks.
Which issue(s) this PR fixes:
Issue #1045
What this PR does / why we need it:
According to issue https://github.com/harvester/harvester/issues/4954, we need to add a backend e2e test to the vm_backup_restore integration tests to cover the case where a VM has both a backup and a snapshot: when we restore this VM from the backup, the restore should succeed.
Added the following test script:
Special notes for your reviewer:
Test result (triggered locally, executed against a remote ECM lab machine):
- All test cases in the TestBackupRestoreWithSnapshot class PASS.
- Most of the test cases in the test_4_vm_backup_restore.py file (TestBackupRestore, TestBackupRestoreOnMigration and TestMultipleBackupRestore) PASS.
Considering stability and future test scalability, I created a separate class for the new test scenario.