scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

Investigate Azure test failures in master branch #3864

Closed mikliapko closed 1 month ago

mikliapko commented 1 month ago

Azure backup job runs 433 through 436 have been failing consistently over the last 2 months. The problem should be investigated and fixed.

Relevant only for master branch.

Jenkins job - https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/sct-feature-test-backup-azure/

fruch commented 1 month ago

05:53:08  assert "not enough disk space" in full_progress_string.lower(), \
05:53:08  AssertionError: The restore failed as expected when one of the nodes was out of disk space, but with an ill fitting error message: Restore progress
05:53:08  Run:      0537bbde-f2f7-11ee-bc2c-7c1e5201b767
05:53:08  Status:       ERROR (disabling restored tables tombstone_gc)
05:53:08  Cause:        disable keyspace1.standard1 tombstone_gc: Request is aborted by a caller

Backup is failing for the wrong reason. I would guess something in Scylla core changed (maybe the enablement of Raft topology by default?).
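To illustrate what "failing for the wrong reason" means here, below is a minimal sketch of the check behind the assertion in the log. The helper names `assert_out_of_space_error` and `fails_for_wrong_reason` are hypothetical and are not taken from the test code; only the asserted substring and the progress text come from the log above.

```python
def assert_out_of_space_error(full_progress_string: str) -> None:
    """Hypothetical helper mirroring the assertion seen in the log."""
    assert "not enough disk space" in full_progress_string.lower(), (
        "The restore failed as expected when one of the nodes was out of disk "
        f"space, but with an ill fitting error message: {full_progress_string}"
    )


def fails_for_wrong_reason(full_progress_string: str) -> bool:
    """Return True when the restore error does not mention disk space."""
    try:
        assert_out_of_space_error(full_progress_string)
    except AssertionError:
        return True
    return False


# Progress output copied from the failing run above.
progress = (
    "Restore progress\n"
    "Run:      0537bbde-f2f7-11ee-bc2c-7c1e5201b767\n"
    "Status:       ERROR (disabling restored tables tombstone_gc)\n"
    "Cause:        disable keyspace1.standard1 tombstone_gc: "
    "Request is aborted by a caller\n"
)
```

With the progress text from this run, the restore does end in ERROR, but the cause is the aborted tombstone_gc request rather than a disk space message, so the check trips.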

mikliapko commented 1 month ago

But there is one more problem in the latest runs (https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/sct-feature-test-backup-azure/436/):

13:07:13  sdcm.provision.provisioner.ProvisionError: (OperationNotAllowed) The specified disk size 20 GB is smaller than the size of the corresponding disk in the VM image: 30 GB. This is not allowed. Please choose equal or greater size or do not specify an explicit size.
13:07:13  Code: OperationNotAllowed
13:07:13  Message: The specified disk size 20 GB is smaller than the size of the corresponding disk in the VM image: 30 GB. This is not allowed. Please choose equal or greater size or do not specify an explicit size.
13:07:13  Target: osDisk.diskSizeGB

This error occurs when attempting to create the loader node.

I see that the default loader size was changed here recently. That's the reason for the failure.

@soyacz @fruch I can fix it just for the Manager tests by adjusting the configuration YAML, unless you have a better idea?
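For reference, the rule stated in the error message can be sketched as follows. This is only an illustration under assumptions: `resolve_os_disk_size` is a hypothetical helper, not part of SCT or the Azure SDK, and the 20 GB / 30 GB values are the ones from the log.

```python
from typing import Optional


def resolve_os_disk_size(requested_gb: Optional[int], image_gb: int) -> int:
    """Hypothetical check of a requested OS disk size against the image size.

    Per the error message: the size must be equal to or greater than the
    image's disk size, or left unspecified.
    """
    if requested_gb is None:
        # No explicit size given: fall back to the image's own disk size.
        return image_gb
    if requested_gb < image_gb:
        raise ValueError(
            f"The specified disk size {requested_gb} GB is smaller than the "
            f"size of the corresponding disk in the VM image: {image_gb} GB."
        )
    return requested_gb
```

So a loader configured with 20 GB against a 30 GB image is rejected, while 30 GB or more, or no explicit size at all, would be accepted.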

fruch commented 1 month ago

This issue was already fixed; you don't need to change anything for the Manager tests.

mikliapko commented 1 month ago

Great, then I'll try to rerun. The latest run in the manager-3.2 branch was OK, and if the disk size issue is resolved in master, I suppose the job should pass.

https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/sct-feature-test-backup-azure/437/

mikliapko commented 1 month ago

The job has passed: https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/sct-feature-test-backup-azure/437/

So, closing the issue then.