harvester / tests

Harvester test cases
Apache License 2.0

[e2e][backend] Add test cases for restore backup with snapshot created #1384

Open TachunLin opened 2 months ago

TachunLin commented 2 months ago

Which issue(s) this PR fixes:

Issue #1045

What this PR does / why we need it:

As described in issue https://github.com/harvester/harvester/issues/4954

We need to add backend e2e tests to the vm_backup_restore integration test to cover the case where a VM has both a backup and a snapshot: restoring this VM from the backup should succeed.

Added the following test cases:

  1. test_with_snapshot_restore_with_new_vm
    • Restore a VM that also has a snapshot to a new VM
  2. test_with_snapshot_restore_replace_retain_vols
    • Restore a VM that also has a snapshot by replacing the existing VM and retaining its volumes

Special notes for your reviewer:

Test results (triggered locally and executed on a remote ECM lab machine):

  1. All test cases in the TestBackupRestoreWithSnapshot class PASS.

  2. Most of the test cases in the test_4_vm_backup_restore.py file PASS.

albinsun commented 2 months ago

I gave it a quick try on ECM raven (v1.2.2), but it fails on:

  1. test_with_snapshot_restore_with_new_vm[NFS]
  2. test_with_snapshot_restore_replace_retain_vols[NFS]

Please help check, thanks.

harvester-runtests/43

bk201 commented 2 months ago

@TachunLin Please help check the error and make sure the tests run successfully in Jenkins.

TachunLin commented 2 months ago

Thanks for the reminder. I checked the test report of harvester-runtests/43. Most of the S3-related tests failed in test_connection[S3]::setup.

The reason is that we had not yet created the backup bucket on our ECM lab MinIO artifact endpoint. I have created two more buckets, ravens and falcons, for future testing requirements on these machines.

Then I used the same config.yml as the harvester_run_test pipeline to trigger the same tests from my local machine against the remote ravens cluster.

Whether I execute TestBackupRestoreWithSnapshot alone or the entire test_4_vm_backup_restore.py, most of the test cases pass.

Next I triggered a new run, harvester-runtests/49, where some tests still failed.

I will continue to investigate why these tests fail in the Jenkins runtests jobs while they work fine when triggered locally.

albinsun commented 2 months ago

Hi @TachunLin, I checked our daily tests and found that 2 basic restore test cases fail too.

  1. _TestBackupRestore::test_restore_with_newvm[S3,NFS]
  2. _TestBackupRestore::test_restore_replace_with_deletevols[S3,NFS]

If they PASS when triggered locally, as you mentioned, then we should suspect an environment issue and investigate the ECM lab.

CC @lanfon72

harvester-install-and-test-e2e-daily#328 (v1.3-head)

harvester-install-and-test-e2e-daily#329 (v1.2-head)

lanfon72 commented 2 months ago

Those test cases are a known issue in KVM environments, tracked in https://github.com/harvester/harvester/issues/4640, and they may not always reproduce.

albinsun commented 2 months ago

Hi @TachunLin, as discussed, I checked the flaky test case test_restore_with_new_vm and found that it is caused by the VM not being created immediately after api_client.backups.restore is triggered, so vm_checker.wait_ip_addresses finds no VM and the test fails.

Just sent a quick fix PR, please refer to

TachunLin commented 2 months ago

Thank you @albinsun for finding the root cause and creating PR https://github.com/harvester/tests/pull/1419 to fix the flaky test_restore_with_new_vm case. I really appreciate your help.

I also added vm_checker.wait_getable calls to test_with_snapshot_restore_with_new_vm and test_with_snapshot_restore_replace_retain_vols.

Then I triggered a new run from the main Jenkins VM against the raven cluster, running only the TestBackupRestoreWithSnapshot class.

This fixes the test_with_snapshot_restore_with_new_vm test case.

I also triggered the entire test_4_vm_backup_restore; it also PASSes most of the backup and restore cases, except for the flaky cases that failed before for other reasons.
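The race described above (restore is accepted before the VM object exists) and the "wait until getable" remedy can be sketched as follows. This is a minimal, self-contained illustration, not the actual vm_checker.wait_getable implementation: `wait_until` and `FakeCluster` are hypothetical names, and the fake cluster simply creates the VM on a timer to mimic asynchronous restore.

```python
import threading
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns (True, value) on success and (False, last_value) on timeout.
    """
    deadline = time.monotonic() + timeout
    value = predicate()
    while not value:
        if time.monotonic() >= deadline:
            return False, value
        time.sleep(interval)
        value = predicate()
    return True, value


class FakeCluster:
    """Hypothetical stand-in for the cluster API: restore() is accepted at
    once, but the restored VM only becomes getable after `delay` seconds."""

    def __init__(self, delay):
        self._vms = {}
        self._delay = delay

    def restore(self, backup_name, vm_name):
        # Simulate asynchronous creation of the VM after the call returns.
        vm = {"name": vm_name, "source": backup_name}
        timer = threading.Timer(self._delay, self._vms.__setitem__, (vm_name, vm))
        timer.start()
        return timer

    def get_vm(self, vm_name):
        # Returns None until the simulated controller has created the VM.
        return self._vms.get(vm_name)


cluster = FakeCluster(delay=0.3)
cluster.restore("backup-1", "restored-vm")

# Flaky pattern: inspecting the VM right after restore() usually finds nothing.
immediate = cluster.get_vm("restored-vm")

# Safer pattern: wait until the VM is getable before any further checks.
ok, vm = wait_until(lambda: cluster.get_vm("restored-vm"), timeout=5, interval=0.05)
```

Checking for IP addresses only after this wait succeeds removes the window in which the VM does not exist yet.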

TachunLin commented 1 month ago

Thanks for the review and suggestion. When I started adding the restore-backup-with-snapshot test cases, I had the same plan: to add all of them under the existing TestBackupRestore class.

But when I actually added the new test_with_snapshot_restore_with_new_vm and test_with_snapshot_restore_replace_retain_vols to the end of the TestBackupRestore class and triggered the tests, both new test cases failed: the existing virtual machine could not be restored, either to a new VM or by replacing the existing one.

I think this failure may be related to the fact that the entire TestBackupRestore class uses one and the same virtual machine, and that VM has already been restored to a new VM and replaced in the earlier tests test_restore_with_new_vm and test_restore_replace_with_delete_vols.

Thus I plan to create a separate class, TestBackupRestoreWithSnapshot, so that the restore-with-snapshot tests run on a separate, clean VM without affecting any of the existing test cases.
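The isolation argument can be illustrated with a toy sketch: state created once per class is shared by every test in that class, so a new class starts from a fresh VM. It uses stdlib unittest's `setUpClass` only as an analogue of per-class setup; `create_vm` and the dict-based VM are hypothetical, and how the real harvester/tests suite provisions its VM is not shown here.

```python
import itertools
import unittest

_vm_ids = itertools.count(1)


def create_vm():
    """Hypothetical helper standing in for creating a fresh VM on the cluster."""
    return {"name": f"vm-{next(_vm_ids)}", "restored": False}


class TestBackupRestore(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # One VM shared by every test in this class.
        cls.vm = create_vm()

    def test_restore_replace(self):
        # Mutates the shared VM, so any later test in this class sees the change.
        self.vm["restored"] = True
        self.assertTrue(self.vm["restored"])


class TestBackupRestoreWithSnapshot(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # A separate class provisions its own clean VM.
        cls.vm = create_vm()

    def test_vm_is_clean(self):
        # Unaffected by whatever TestBackupRestore did to its VM.
        self.assertFalse(self.vm["restored"])
```

The cost is the one mentioned in the thread: per-class setup (here `create_vm`, in the real suite the connection and backup steps) runs once more for the new class.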

Admittedly, this increases execution time, since the connection test and the backup test have to run again.

As a trade-off, a separate class brings the following benefits:

  1. The new restore-backup-with-snapshot cases run reliably
  2. The existing backup and restore tests stay stable and unaffected
  3. Future scalability, in case we need to add more restore-related tests later

Given this, may I continue to use the separate TestBackupRestoreWithSnapshot class for the new restore-backup-with-snapshot test cases? Many thanks.