scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

[Azure] mgmt_backup nemesis fails #4698

Closed soyacz closed 1 month ago

soyacz commented 2 years ago

Test details

System under test:

Scylla version: 5.1.dev-0.20220504.b26a3da584cc with build-id ab2a33a30756c1513f4c516cd272291e75acec0e (-)
Instance type: Standard_L8s_v2
Number of scylladb nodes: 6

Restore commands:

Restore Monitor Stack command: $ hydra investigate show-monitor 835fbc85-2bdf-46aa-a87d-04348bbbc1f8
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 835fbc85-2bdf-46aa-a87d-04348bbbc1f8

Logs:

grafana - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120032/grafana-screenshot-overview-20220504_120033-longevity-10gb-3h-master-monitor-node-835fbc85-eastus-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/db-cluster-835fbc85.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/loader-set-835fbc85.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/sct-runner-835fbc85.tar.gz


Description

When running on the Azure backend, `sctool backup` fails because the nodes have no credentials for the S3 backup location. This needs to be investigated and fixed.

Command: 'sudo sctool backup -c 87e71188-0bfd-4112-ba46-fca93f3b2d59 --keyspace keyspace_new_dc,keyspace1  --location s3:manager-backup-tests-us-east-1 '

Exit code: 1

Stdout:

Stderr:

 10.0.0.5: agent [HTTP 404] no put permission: s3 upload: sign request: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors - make sure the location is correct and credentials are set, to debug SSH to 10.0.0.5 and run "scylla-manager-agent check-location -L s3:manager-backup-tests-us-east-1 --debug"
 10.0.0.9: agent [HTTP 404] no put permission: s3 upload: sign request: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors - make sure the location is correct and credentials are set, to debug SSH to 10.0.0.9 and run "scylla-manager-agent check-location -L s3:manager-backup-tests-us-east-1 --debug"
 10.0.0.7: agent [HTTP 404] no put permission: s3 upload: sign request: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors - make sure the location is correct and credentials are set, to debug SSH to 10.0.0.7 and run "scylla-manager-agent check-location -L s3:manager-backup-tests-us-east-1 --debug"
 10.0.0.14: agent [HTTP 404] no put permission: s3 upload: sign request: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors - make sure the location is correct and credentials are set, to debug SSH to 10.0.0.14 and run "scylla-manager-agent check-location -L s3:manager-backup-tests-us-east-1 --debug"
Trace ID: oNs6FNp6RoKdBkMCWTCMRA (grep in scylla-manager logs)

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3337, in wrapper
    result = method(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2056, in disrupt_mgmt_backup_specific_keyspaces
    self._mgmt_backup(backup_specific_tables=True)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2082, in _mgmt_backup
    mgr_task = mgr_cluster.create_backup_task(location_list=[location, ], keyspace_list=non_test_keyspaces)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 537, in create_backup_task
    res = self.sctool.run(cmd=cmd, parse_table_res=False)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1057, in run
    res = self.manager_node.remoter.sudo(f"sctool {cmd}")
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/base.py", line 123, in sudo
    return self.run(cmd=cmd,
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 613, in run
    result = _run()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 64, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 604, in _run
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 537, in _run_execute
    result = connection.run(**command_kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 620, in run
    return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 654, in _complete_run
    raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
fgelcer commented 2 years ago

This is expected: access to the S3 bucket is granted through a profile that cannot be assigned to instances that are not EC2.

We must create a bucket in Azure and add a selector that picks the backup bucket according to the backend in use. @ShlomiBalalis FYI
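A minimal sketch of such a selector follows, assuming the backend name is available as a plain string. The mapping `BACKUP_BUCKETS`, the function `backup_location_for`, the backend keys, and the Azure container name are all hypothetical; only the S3 bucket name is taken from the failing command above. Locations use the `<provider>:<bucket>` form seen in the `--location` argument.

```python
# Hypothetical mapping from test backend to (provider, bucket).
# Only the S3 bucket name comes from this issue; the Azure entry is a placeholder.
BACKUP_BUCKETS = {
    "aws": ("s3", "manager-backup-tests-us-east-1"),
    "azure": ("azure", "example-azure-backup-container"),
}


def backup_location_for(backend: str) -> str:
    """Return a '<provider>:<bucket>' location string for the given backend."""
    try:
        provider, bucket = BACKUP_BUCKETS[backend]
    except KeyError:
        # Fail early with a clear message instead of a credentials error later.
        raise ValueError(f"no backup bucket configured for backend {backend!r}") from None
    return f"{provider}:{bucket}"


if __name__ == "__main__":
    print(backup_location_for("aws"))
```

An unknown backend raises `ValueError` up front, so a missing bucket surfaces before `sctool backup` is run rather than as a `NoCredentialProviders` error from the agents.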

soyacz commented 2 years ago

Thanks @fgelcer for the clarification. Either way, we need to tackle this problem to make this nemesis work properly on Azure. We need a task and a plan for it.

fgelcer commented 2 years ago

> Thanks @fgelcer for clarification. Anyway we need to tackle this problem to make this nemesis work properly. We need a task and a plan for it.

@rayakurl , could you please add a task for it?

fgelcer commented 2 years ago

@soyacz, once we have an answer from @rayakurl, and if it turns out the fix will take time, we can make the nemesis skip itself when the backend is Azure until the fix is implemented. WDYT?

soyacz commented 2 years ago

I think it's a good idea. We can add the skip with a comment pointing to this issue or to the task that targets the problem. The task should also include a reminder to remove the condition once it is fixed.
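A rough sketch of the proposed skip, assuming the backend is available as a string and that raising a dedicated exception is how a nemesis signals it should be skipped. The exception class here is a local stand-in, and `ensure_backup_supported` and the `"azure"` value are hypothetical names used only for illustration.

```python
class UnsupportedNemesis(Exception):
    """Local stand-in for an exception that marks a nemesis as skipped."""


def ensure_backup_supported(cluster_backend: str) -> None:
    """Raise UnsupportedNemesis when the backend has no backup bucket yet."""
    # TODO: remove this condition once issue #4698 is resolved
    # (an Azure bucket exists and the location is selected per backend).
    if cluster_backend == "azure":
        raise UnsupportedNemesis(
            "mgmt backup is skipped on Azure: no backup bucket is available yet "
            "(see issue #4698)"
        )


if __name__ == "__main__":
    ensure_backup_supported("aws")  # passes silently
    try:
        ensure_backup_supported("azure")
    except UnsupportedNemesis as exc:
        print(f"skipped: {exc}")
```

Calling the check at the top of the backup nemesis keeps the condition in one place, and the TODO with the issue number makes it easy to find and remove later.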

soyacz commented 2 years ago

@rayakurl is a fix for this planned?

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 2 years with no activity. Remove stale label or comment or this will be closed in 2 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 2 days with no activity.