scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Configuring spot_low_price and multi-DC in YAML fails with: "The subnet ID 'subnet-5207ee37' does not exist" #671

Closed: yarongilor closed this issue 5 years ago

yarongilor commented 6 years ago

Scenario:

1) The YAML file has:

instance_provision: 'spot_low_price'
...
backends: !mux
    aws: !mux
...
        us_east_1_and_us_west_2:
            subnet_id: 'subnet-ec4a72c4 subnet-5207ee37'

2) Run the SCT test setup. Result: the setup fails with:

2018-10-24 09:46:07,007 cluster_aws      L0180 DEBUG| Cluster yaron-2-3-0-multidc-db-cluster-bfdb858d (AMI: ['ami-0fc423ac17a75570d', 'ami-0da599147b1d9e80d'] Type: i3.large): Passing user_data '--clustername yaron-2-3-0-multidc-db-cluster-bfdb858d --totalnodes 2 --stop-services --seeds 54.91.205.42  --bootstrap false ' to create_instances
2018-10-24 09:46:07,699 tester           L0130 ERROR| Exception in init_resources. Will clean resources
Traceback (most recent call last):
  File "/sct/sdcm/tester.py", line 128, in wrapper
    return method(*args, **kwargs)
  File "/sct/sdcm/tester.py", line 623, in init_resources
    monitor_info=monitor_info)
  File "/sct/sdcm/tester.py", line 454, in get_cluster_aws
    self.db_cluster = create_cluster(db_type)
  File "/sct/sdcm/tester.py", line 445, in create_cluster
    **cl_params)
  File "/sct/sdcm/cluster_aws.py", line 377, in __init__
    params=params)
  File "/sct/sdcm/cluster.py", line 1404, in __init__
    super(BaseScyllaCluster, self).__init__(*args, **kwargs)
  File "/sct/sdcm/cluster_aws.py", line 99, in __init__
    region_names=self.region_names)
  File "/sct/sdcm/cluster.py", line 1223, in __init__
    self.add_nodes(num, dc_idx=dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 406, in add_nodes
    enable_auto_bootstrap=enable_auto_bootstrap)
  File "/sct/sdcm/cluster_aws.py", line 217, in add_nodes
    instances = self._create_instances(count, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 189, in _create_instances
    instances = self._create_spot_instances(count, interfaces, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 131, in _create_spot_instances
    subnet_info = ec2.get_subnet_info(self._ec2_subnet_id[dc_idx])
  File "/sct/sdcm/ec2_client.py", line 306, in get_subnet_info
    resp = self._client.describe_subnets(SubnetIds=[subnet_id])
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist
amoskong commented 6 years ago

https://us-west-2.console.aws.amazon.com/vpc/home?region=us-west-2#subnets:search=subnet-5207ee37;sort=SubnetId

I can find the subnet via the link above.

What's the region name in your SCT config? I'd expect: region_name: 'us-east-1 us-west-2'
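In this multi-DC AWS setup, region_name, subnet_id, security_group_ids and ami_id_db_scylla each hold one space-separated value per datacenter, and the tracebacks index them with dc_idx (self._ec2_subnet_id[dc_idx]). The sketch below illustrates that positional pairing plus a length check. per_dc_settings is a hypothetical helper, not SCT's actual config parser.

```python
def per_dc_settings(params, keys=("region_name", "subnet_id",
                                  "security_group_ids", "ami_id_db_scylla")):
    """Hypothetical helper: split space-separated multi-DC values and group them by dc_idx."""
    split = {key: str(params[key]).split() for key in keys}
    n_dcs = len(split["region_name"])
    for key, values in split.items():
        # Every per-DC setting needs exactly one value per region.
        if len(values) != n_dcs:
            raise ValueError("%s has %d value(s) but region_name has %d"
                             % (key, len(values), n_dcs))
    return [{key: values[dc_idx] for key, values in split.items()}
            for dc_idx in range(n_dcs)]


# Values copied from the us_east_1_and_us_west_2 variant in this issue.
params = {
    "region_name": "us-east-1 us-west-2",
    "subnet_id": "subnet-ec4a72c4 subnet-5207ee37",
    "security_group_ids": "sg-c5e1f7a0 sg-81703ae4",
    "ami_id_db_scylla": "ami-0fc423ac17a75570d ami-0da599147b1d9e80d",
}
for dc_idx, dc in enumerate(per_dc_settings(params)):
    print(dc_idx, dc["region_name"], dc["subnet_id"])
# 0 us-east-1 subnet-ec4a72c4
# 1 us-west-2 subnet-5207ee37
```

Because the pairing is purely positional, reordering one key but not the others would send a subnet to the wrong region without any error; the length check only catches a missing or extra value.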

amoskong commented 6 years ago

Can you provide your full sct config?

bentsi commented 6 years ago

@amoskong I do see the subnet: https://us-west-2.console.aws.amazon.com/vpc/home?region=us-west-2#subnets:search=subnet-5207ee37;sort=SubnetId

bentsi commented 6 years ago

@amoskong I checked it earlier: his code works with on-demand instances but not with spot_low_price. I suspect the issue is in the spot-handling logic.

yarongilor commented 6 years ago

The test YAML is:

#test_duration: 10080
test_duration: 500
stress_cmd: "cassandra-stress write cl=QUORUM duration=1m -schema 'replication(strategy=NetworkTopologyStrategy,us-eastscylla_node_east=1,us-west-2scylla_node_west=1)' -port jmx=6868 -mode cql3 native -rate threads=100 -pop seq=1..10000"
cassandra_stress_duration: 10080
cassandra_stress_threads: 100
cassandra_stress_population_size: 10000
#n_db_nodes: 3
n_db_nodes: '1 1' #'0 0' # '1 1'
n_loaders: 1 #1
n_monitor_nodes: 1
nemesis_class_name: 'MgmtCli'
#nemesis_class_name: 'MdcChaosMonkey'
#nemesis_class_name: 'DrainerMonkey'
nemesis_interval: 5
user_prefix: 'yaron-2-3-0-multidc'
failure_post_behavior: keep
space_node_threshold: 6442
ip_ssh_connections: 'public'

ami_id_db_scylla_desc: '2-3-0'

use_mgmt: true
mgmt_port: 10090
#scylla_repo_m: 'http://repositories.scylladb.com/scylla/repo/f4a2920f80c4bf178217c2553ad65ad7/centos/scylladb-2018.1.repo'
#scylla_repo_m: 'http://repositories.scylladb.com/scylla/repo/7b02fff5-e4d0-4e4d-ad12-e605ca4873c2/centos/scylladb-2018.1.repo'
scylla_repo_m: 'http://repositories.scylladb.com/scylla/repo/7b02fff5-e4d0-4e4d-ad12-e605ca4873c2/centos/scylladb-2018.1.repo'

scylla_mgmt_repo: 'http://downloads.scylladb.com.s3.amazonaws.com/manager/rpm/unstable/centos/branch-1.2/44/scylla-manager.repo'
#scylla_mgmt_repo: 'http://downloads.scylladb.com.s3.amazonaws.com/manager/rpm/unstable/centos/branch-1.2/latest/scylla-manager.repo'

#scylla_mgmt_repo: 'MANAGER_REPO_URL'

#es_url:
#es_user:
#es_password:
#instance_provision: 'spot_low_price'

backends: !mux
    aws: !mux
        # What is the backend that the suite will use to get machines from.
        cluster_backend: 'aws'
        # From 0.19 on, iotune will require bigger disk, so let's use a big
        # loader instance by default.
        instance_type_loader: 'c4.large'
        # Size of AWS monitor instance
        instance_type_monitor: i3.large
        us_east_1_and_us_west_2:
            user_credentials_path: '~/.ssh/scylla-qa-ec2'
            region_name: 'us-east-1 us-west-2'
            security_group_ids: 'sg-c5e1f7a0 sg-81703ae4'
            subnet_id: 'subnet-ec4a72c4 subnet-5207ee37'
            ami_id_db_scylla: 'ami-0fc423ac17a75570d ami-0da599147b1d9e80d'
            ami_db_scylla_user: 'centos'
            ami_id_loader: 'ami-0fc423ac17a75570d'
            ami_loader_user: 'centos'
            ami_id_monitor: 'ami-010f2b2749b78a6c5'
            ami_monitor_user: 'centos'
    gce: !mux
        cluster_backend: 'gce'
        user_credentials_path: '~/.ssh/scylla-test'
        gce_user_credentials: '~/Scylla-c41b78923a54.json'
        gce_service_account_email: 'skilled-adapter-452@appspot.gserviceaccount.com'
        gce_project: 'skilled-adapter-452'
        gce_image: 'https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/family/centos-7'
        gce_image_username: 'scylla-test'
        gce_instance_type_db: 'n1-highmem-8'
        gce_root_disk_type_db: 'pd-ssd'
        gce_root_disk_size_db: 50
        gce_n_local_ssd_disk_db: 1
        gce_instance_type_loader: 'n1-highcpu-4'
        gce_root_disk_type_loader: 'pd-standard'
        gce_root_disk_size_loader: 50
        gce_n_local_ssd_disk_loader: 0
        gce_instance_type_monitor: 'n1-standard-2'
        gce_root_disk_type_monitor: 'pd-standard'
        gce_root_disk_size_monitor: 50
        gce_n_local_ssd_disk_monitor: 0
        scylla_repo: https://s3.amazonaws.com/downloads.scylladb.com/rpm/unstable/centos/branch-1.7/37/scylla.repo
        #us_east_1:
        #  gce_datacenter: 'us-east1-b'
        multi_dcs:
          gce_datacenter: 'us-east1-b us-west1-b us-east4-b'

databases: !mux
    scylla:
        db_type: scylla
        instance_type_db: 'i3.large'
yarongilor commented 5 years ago

Hi @amoskong, can you please advise? I get the same failure, now testing on the master branch:

Cluster yaron-2-3-0-multidc-db-cluster-5a2ada49 (AMI: ['ami-0fc423ac17a75570d', 'ami-0da599147b1d9e80d'] Type: i3.large): Passing user_data '--clustername yaron-2-3-0-multidc-db-cluster-5a2ada49 --totalnodes 2 --stop-services --seeds 54.172.191.59  --bootstrap false ' to create_instances
Exception in init_resources. Will clean resources
Traceback (most recent call last):
  File "/sct/sdcm/tester.py", line 129, in wrapper
    return method(*args, **kwargs)
  File "/sct/sdcm/tester.py", line 636, in init_resources
    monitor_info=monitor_info)
  File "/sct/sdcm/tester.py", line 467, in get_cluster_aws
    self.db_cluster = create_cluster(db_type)
  File "/sct/sdcm/tester.py", line 458, in create_cluster
    **cl_params)
  File "/sct/sdcm/cluster_aws.py", line 405, in __init__
    params=params)
  File "/sct/sdcm/cluster.py", line 1451, in __init__
    super(BaseScyllaCluster, self).__init__(*args, **kwargs)
  File "/sct/sdcm/cluster_aws.py", line 102, in __init__
    region_names=self.region_names)
  File "/sct/sdcm/cluster.py", line 1270, in __init__
    self.add_nodes(num, dc_idx=dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 434, in add_nodes
    enable_auto_bootstrap=enable_auto_bootstrap)
  File "/sct/sdcm/cluster_aws.py", line 220, in add_nodes
    instances = self._create_instances(count, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 192, in _create_instances
    instances = self._create_spot_instances(count, interfaces, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 134, in _create_spot_instances
    subnet_info = ec2.get_subnet_info(self._ec2_subnet_id[dc_idx])
  File "/sct/sdcm/ec2_client.py", line 313, in get_subnet_info
    resp = self._client.describe_subnets(SubnetIds=[subnet_id])
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist
Cleaning up resources used in the test
Exception in setUp. Will clean resources
Traceback (most recent call last):
  File "/sct/sdcm/tester.py", line 129, in wrapper
    return method(*args, **kwargs)
  File "/sct/sdcm/tester.py", line 177, in setUp
    self.init_resources()
  File "/sct/sdcm/tester.py", line 129, in wrapper
    return method(*args, **kwargs)
  File "/sct/sdcm/tester.py", line 636, in init_resources
    monitor_info=monitor_info)
  File "/sct/sdcm/tester.py", line 467, in get_cluster_aws
    self.db_cluster = create_cluster(db_type)
  File "/sct/sdcm/tester.py", line 458, in create_cluster
    **cl_params)
  File "/sct/sdcm/cluster_aws.py", line 405, in __init__
    params=params)
  File "/sct/sdcm/cluster.py", line 1451, in __init__
    super(BaseScyllaCluster, self).__init__(*args, **kwargs)
  File "/sct/sdcm/cluster_aws.py", line 102, in __init__
    region_names=self.region_names)
  File "/sct/sdcm/cluster.py", line 1270, in __init__
    self.add_nodes(num, dc_idx=dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 434, in add_nodes
    enable_auto_bootstrap=enable_auto_bootstrap)
  File "/sct/sdcm/cluster_aws.py", line 220, in add_nodes
    instances = self._create_instances(count, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 192, in _create_instances
    instances = self._create_spot_instances(count, interfaces, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 134, in _create_spot_instances
    subnet_info = ec2.get_subnet_info(self._ec2_subnet_id[dc_idx])
  File "/sct/sdcm/ec2_client.py", line 313, in get_subnet_info
    resp = self._client.describe_subnets(SubnetIds=[subnet_id])
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist
Cleaning up resources used in the test

ERROR 1-mgmt_cli_test.py:MgmtCliTest.test_mgmt_cluster_healthcheck -> TestSetupFail: An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist

Error receiving message from test: <type 'exceptions.TypeError'> -> ('__init__() takes exactly 3 arguments (2 given)', <class 'botocore.exceptions.ClientError'>, (u"An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist",))

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/runner.py:75
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
TypeError: ('__init__() takes exactly 3 arguments (2 given)', <class 'botocore.exceptions.ClientError'>, (u"An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist",))

ERROR 1-mgmt_cli_test.py:MgmtCliTest.test_mgmt_cluster_healthcheck -> TestAbortedError: Test aborted unexpectedly

The YAML has:

instance_provision: 'spot_low_price'

together with:

us_east_1_and_us_west_2:
            user_credentials_path: '~/.ssh/scylla-qa-ec2'
            region_name: 'us-east-1 us-west-2'
            security_group_ids: 'sg-c5e1f7a0 sg-81703ae4'
            subnet_id: 'subnet-ec4a72c4 subnet-5207ee37'
            ami_id_db_scylla: 'ami-0fc423ac17a75570d ami-0da599147b1d9e80d'
roydahan commented 5 years ago

@amoskong, can you please check Yaron's comment?

amoskong commented 5 years ago

Hi @yarongilor, can you clone the latest master into a clean environment and run again?

I copied the YAML from https://github.com/scylladb/scylla-cluster-tests/issues/671#issuecomment-433863706 and added store_results_in_elasticsearch: False.

The setup worked well (I'm also using the latest master).

YAML: y.yaml.txt
Job log: multi-dc-yaron-example.log.txt
Avocado command line:

avocado run longevity_test.py:LongevityTest.test_custom_time --job-results-dir ./ --multiplex tests/y.yaml --filter-only /run/backends/aws /run/databases/scylla --show-job-log


yarongilor commented 5 years ago

The problem still reproduces on the master branch. Failure details:

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:436
Traceback (most recent call last):
  File "/sct/sdcm/tester.py", line 130, in wrapper
    return method(*args, **kwargs)
  File "/sct/sdcm/tester.py", line 178, in setUp
    self.init_resources()
  File "/sct/sdcm/tester.py", line 130, in wrapper
    return method(*args, **kwargs)
  File "/sct/sdcm/tester.py", line 641, in init_resources
    monitor_info=monitor_info)
  File "/sct/sdcm/tester.py", line 472, in get_cluster_aws
    self.db_cluster = create_cluster(db_type)
  File "/sct/sdcm/tester.py", line 463, in create_cluster
    **cl_params)
  File "/sct/sdcm/cluster_aws.py", line 408, in __init__
    params=params)
  File "/sct/sdcm/cluster.py", line 1524, in __init__
    super(BaseScyllaCluster, self).__init__(*args, **kwargs)
  File "/sct/sdcm/cluster_aws.py", line 98, in __init__
    region_names=self.region_names)
  File "/sct/sdcm/cluster.py", line 1343, in __init__
    self.add_nodes(num, dc_idx=dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 437, in add_nodes
    enable_auto_bootstrap=enable_auto_bootstrap)
  File "/sct/sdcm/cluster_aws.py", line 219, in add_nodes
    instances = self._create_instances(count, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 191, in _create_instances
    instances = self._create_spot_instances(count, interfaces, ec2_user_data, dc_idx)
  File "/sct/sdcm/cluster_aws.py", line 133, in _create_spot_instances
    subnet_info = ec2.get_subnet_info(self._ec2_subnet_id[dc_idx])
  File "/sct/sdcm/ec2_client.py", line 314, in get_subnet_info
    resp = self._client.describe_subnets(SubnetIds=[subnet_id])
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist

ERROR 1-mgmt_cli_test.py:MgmtCliTest.test_mgmt_repair_nemesis -> TestSetupFail: An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist

Error receiving message from test: <type 'exceptions.TypeError'> -> ('__init__() takes exactly 3 arguments (2 given)', <class 'botocore.exceptions.ClientError'>, (u"An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist",))

Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/runner.py:75
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
TypeError: ('__init__() takes exactly 3 arguments (2 given)', <class 'botocore.exceptions.ClientError'>, (u"An error occurred (InvalidSubnetID.NotFound) when calling the DescribeSubnets operation: The subnet ID 'subnet-5207ee37' does not exist",))

ERROR 1-mgmt_cli_test.py:MgmtCliTest.test_mgmt_repair_nemesis -> TestAbortedError: Test aborted unexpectedly
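The second error in this log, TypeError: ('__init__() takes exactly 3 arguments (2 given)', <class 'botocore.exceptions.ClientError'>, ...), looks like a follow-on failure rather than a separate bug. Its traceback runs through multiprocessing/queues.py, so the test's exception is pickled on its way back to avocado's runner, and an exception whose __init__ needs more arguments than it stores in self.args cannot be rebuilt on the receiving side. The sketch below reproduces that pattern with a hypothetical stand-in class (FakeClientError), not botocore's real ClientError, and shows one way to make such an exception picklable via __reduce__. It targets Python 3, so the error text differs from the Python 2.7 message above.

```python
import pickle


class FakeClientError(Exception):
    """Hypothetical stand-in for botocore's ClientError: __init__ takes two
    arguments, but only the formatted message reaches Exception.__init__,
    so self.args holds a single item."""

    def __init__(self, error_response, operation_name):
        self.response = error_response
        self.operation_name = operation_name
        msg = "An error occurred (%s) when calling the %s operation: %s" % (
            error_response["Error"]["Code"], operation_name,
            error_response["Error"]["Message"])
        super().__init__(msg)


class PicklableClientError(FakeClientError):
    """Same exception, but tells pickle to rebuild it from both constructor arguments."""

    def __reduce__(self):
        return (type(self), (self.response, self.operation_name))


def roundtrip(exc):
    """Simulate passing an exception through a multiprocessing queue (pickle + unpickle)."""
    return pickle.loads(pickle.dumps(exc))


response = {"Error": {"Code": "InvalidSubnetID.NotFound",
                      "Message": "The subnet ID 'subnet-5207ee37' does not exist"}}

try:
    # Unpickling calls FakeClientError(msg) with one argument and fails.
    roundtrip(FakeClientError(response, "DescribeSubnets"))
except TypeError as err:
    print("unpickling failed:", err)

restored = roundtrip(PicklableClientError(response, "DescribeSubnets"))
print(restored.response["Error"]["Code"])  # InvalidSubnetID.NotFound
```

The original InvalidSubnetID.NotFound error is the real failure here; the TypeError only hides it from the runner.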
yarongilor commented 5 years ago

The command line used is:

avocado --show test run mgmt_cli_test.py:MgmtCliTest.test_mgmt_repair_nemesis --multiplex tests/yg_test.yaml --filter-out /run/backends/gce --filter-only /run/backends/aws /run/databases/scylla

tests/yg_test.yaml is:

test_duration: 500
stress_cmd: "cassandra-stress write cl=QUORUM duration=1m -schema 'replication(strategy=NetworkTopologyStrategy,us-eastscylla_node_east=1,us-west-2scylla_node_west=1)' -port jmx=6868 -mode cql3 native -rate threads=100 -pop seq=1..10000"
cassandra_stress_duration: 10080
cassandra_stress_threads: 100
cassandra_stress_population_size: 10000
n_db_nodes: '1 1' # '1 1'
n_loaders: 1 #1
n_monitor_nodes: 1
monitor_branch: 'master' # Testing with latest monitoring for newest manager Dashboards
nemesis_class_name: 'MgmtCli'
nemesis_interval: 5
user_prefix: 'yaron_manager_multidc'
failure_post_behavior: keep
space_node_threshold: 6442
ip_ssh_connections: 'public'
store_results_in_elasticsearch: False
ami_id_db_scylla_desc: '2-3-0'

use_mgmt: true
mgmt_port: 10090
scylla_repo_m: 'http://repositories.scylladb.com/scylla/repo/7b02fff5-e4d0-4e4d-ad12-e605ca4873c2/centos/scylladb-2018.1.repo'
#scylla_mgmt_repo: 'http://downloads.scylladb.com.s3.amazonaws.com/manager/rpm/unstable/centos/branch-1.2/44/scylla-manager.repo'
scylla_mgmt_repo: 'http://downloads.scylladb.com/manager/rpm/unstable/centos/branch-1.3/6/scylla-manager.repo'
scylla_mgmt_upgrade_to_repo: 'http://downloads.scylladb.com/manager/rpm/unstable/centos/branch-1.3/6/scylla-manager.repo'

# Centos Repos:
# scylla_repo_m: 'http://repositories.scylladb.com/scylla/repo/7b02fff5-e4d0-4e4d-ad12-e605ca4873c2/centos/scylladb-2018.1.repo'
# scylla_repo_m: 'http://repositories.scylladb.com/scylla/repo/f4a2920f80c4bf178217c2553ad65ad7/centos/scylladb-2018.1.repo'
# scylla_mgmt_repo: 'http://downloads.scylladb.com/manager/rpm/unstable/centos/master/218/scylla-manager.repo'
# scylla_mgmt_repo: 'http://downloads.scylladb.com.s3.amazonaws.com/manager/rpm/unstable/centos/branch-1.2/44/scylla-manager.repo'
#
# Debian Repos:
# scylla_repo_m: http://repositories.scylladb.com/scylla/repo/4bafa2b1-9a0c-4008-a8ad-7f6ef9279e58/debian/scylladb-2017.1-jessie.list
# scylla_mgmt_repo: http://downloads.scylladb.com.s3.amazonaws.com/manager/deb/unstable/jessie/branch-1.2/latest/scylla-manager-1.2/scylla-manager.list

# Ubuntu Repos:
#scylla_repo_m: http://repositories.scylladb.com/scylla/repo/4bafa2b1-9a0c-4008-a8ad-7f6ef9279e58/ubuntu/scylladb-2018.1-xenial.list
#scylla_mgmt_repo: http://downloads.scylladb.com.s3.amazonaws.com/manager/deb/unstable/xenial/branch-1.2/latest/scylla-manager-1.2/scylla-manager.list

#scylla_mgmt_repo: 'http://downloads.scylladb.com.s3.amazonaws.com/manager/rpm/unstable/centos/branch-1.3/1/scylla-manager.repo'
#scylla_mgmt_repo: 'http://downloads.scylladb.com.s3.amazonaws.com/manager/rpm/unstable/centos/branch-1.2/44/scylla-manager.repo'

#scylla_mgmt_repo: 'MANAGER_REPO_URL'

#es_url:
#es_user:
#es_password:
instance_provision: 'spot_low_price'

backends: !mux
    aws: !mux
        # What is the backend that the suite will use to get machines from.
        cluster_backend: 'aws'
        # From 0.19 on, iotune will require bigger disk, so let's use a big
        # loader instance by default.
        instance_type_loader: 'c4.large'
        # Size of AWS monitor instance
        instance_type_monitor: i3.large
        us_east_1_and_us_west_2:
            user_credentials_path: '~/.ssh/scylla-qa-ec2'
            region_name: 'us-east-1 us-west-2'
            security_group_ids: 'sg-c5e1f7a0 sg-81703ae4'
            subnet_id: 'subnet-ec4a72c4 subnet-5207ee37'
            ami_id_db_scylla: 'ami-0fc423ac17a75570d ami-0da599147b1d9e80d'
            ami_db_scylla_user: 'centos'
            ami_id_loader: 'ami-0fc423ac17a75570d'
            ami_loader_user: 'centos'
            # ami_id_monitor: 'ami-010f2b2749b78a6c5' # scylla-enterprise ami # 'ami-9887c6e7' # Clean CentOs 7 ami 'ami-1c5cc366' # Clean Ubuntu16.4
            ami_id_monitor: 'ami-9887c6e7'
            ami_monitor_user: 'centos' #'ubuntu' #'centos' #'admin' (for Debian)
    gce: !mux
        cluster_backend: 'gce'
        user_credentials_path: '~/.ssh/scylla-test'
        gce_user_credentials: '~/Scylla-c41b78923a54.json'
        gce_service_account_email: 'skilled-adapter-452@appspot.gserviceaccount.com'
        gce_project: 'skilled-adapter-452'
        gce_image: 'https://www.googleapis.com/compute/v1/projects/centos-cloud/global/images/family/centos-7'
        gce_image_username: 'scylla-test'
        gce_instance_type_db: 'n1-highmem-8'
        gce_root_disk_type_db: 'pd-ssd'
        gce_root_disk_size_db: 50
        gce_n_local_ssd_disk_db: 1
        gce_instance_type_loader: 'n1-highcpu-4'
        gce_root_disk_type_loader: 'pd-standard'
        gce_root_disk_size_loader: 50
        gce_n_local_ssd_disk_loader: 0
        gce_instance_type_monitor: 'n1-standard-2'
        gce_root_disk_type_monitor: 'pd-standard'
        gce_root_disk_size_monitor: 50
        gce_n_local_ssd_disk_monitor: 0
        scylla_repo: https://s3.amazonaws.com/downloads.scylladb.com/rpm/unstable/centos/branch-1.7/37/scylla.repo
        #us_east_1:
        #  gce_datacenter: 'us-east1-b'
        multi_dcs:
          gce_datacenter: 'us-east1-b us-west1-b us-east4-b'

databases: !mux
    scylla:
        db_type: scylla
        instance_type_db: 'i3.large'
amoskong commented 5 years ago

Yaron, can you provide the full job.log?

amoskong commented 5 years ago

@yarongilor, what's the job name? Can this issue be reproduced every time?

amoskong commented 5 years ago

I reproduced this problem locally. The instance in the first region (us-east-1) is created successfully, but the second region fails.

After swapping the parameters of the two regions, it first tried to create the instance in us-west-2, and that failed. So it's a problem on the AWS side (related to the subnet setup, or to the EC2 resources available for spot), not a problem in SCT.

I will try to create a new subnet in us-west-2.

amoskong commented 5 years ago

I created a new VPC, security group and subnet in us-west2-ip; the result is the same as with Yaron's config.

on_demand: works well
spot_fleet & spot_low_price: don't work

amoskong commented 5 years ago

Tested on us-west-1, eu-west-1, us-west-2 and us-east-2:

on_demand: works well
spot_fleet & spot_low_price: don't work

amoskong commented 5 years ago

I didn't find any useful information by googling "InvalidSubnetID.NotFound spot_low_price spot_fleet".

I also failed to open a support case with AWS at https://console.aws.amazon.com/support/cases#/create. Any suggestions, @roydahan?

yarongilor commented 5 years ago

job.log

bentsi commented 5 years ago

We need to add more debug output; this issue is in our spot code. @yarongilor, add debug output of the following at the beginning of _create_spot_instances:
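The debug lines themselves are not preserved in this thread. Judging from the output Yaron posted later (self.region_names[dc_idx] is: ... and self._ec2_subnet_id[dc_idx] is: ...), they logged the region and subnet chosen for each dc_idx. Below is a standalone sketch of equivalent logging; log_spot_target is a hypothetical helper, whereas inside _create_spot_instances these would presumably be two debug calls on the cluster's own logger.

```python
import logging

LOGGER = logging.getLogger("cluster_aws")
LOGGER.setLevel(logging.DEBUG)  # make sure debug records are emitted


def log_spot_target(region_names, ec2_subnet_ids, dc_idx):
    """Hypothetical debug helper: log the region and subnet used for this dc_idx."""
    LOGGER.debug("self.region_names[dc_idx] is: %s", region_names[dc_idx])
    LOGGER.debug("self._ec2_subnet_id[dc_idx] is: %s", ec2_subnet_ids[dc_idx])


logging.basicConfig(format="%(name)s %(levelname)s| %(message)s")
log_spot_target(["us-east-1", "us-west-2"],
                ["subnet-ec4a72c4", "subnet-5207ee37"], dc_idx=1)
```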

I suspect that we create the EC2 client for the wrong region.

slivne commented 5 years ago

Subnet IDs are local to each region; you should use a different subnet ID for each region.

roydahan commented 5 years ago

Of course, but that's not the issue here. It's configured as ami_id_db_scylla: 'ami-0fc423ac17a75570d ami-0da599147b1d9e80d', and it works with on_demand.

@yarongilor, please try to do what Bentsi suggested.

yarongilor commented 5 years ago

With this YAML:

        us_east_1_and_us_west_2:
            user_credentials_path: '~/.ssh/scylla-qa-ec2'
            region_name: 'us-east-1 us-west-2'
            security_group_ids: 'sg-c5e1f7a0 sg-81703ae4'
            subnet_id: 'subnet-ec4a72c4 subnet-5207ee37'

the debug output is:

yarongilor@yaron-pc:~/avocado/job-results/latest$ grep '\[dc_idx\] is:' job.log 
2019-01-17 16:31:02,507 cluster_aws      L0134 DEBUG| self.region_names[dc_idx] is: us-east-1
2019-01-17 16:31:02,508 cluster_aws      L0135 DEBUG| self._ec2_subnet_id[dc_idx] is: subnet-ec4a72c4
2019-01-17 16:31:28,928 cluster_aws      L0134 DEBUG| self.region_names[dc_idx] is: us-west-2
2019-01-17 16:31:28,928 cluster_aws      L0135 DEBUG| self._ec2_subnet_id[dc_idx] is: subnet-5207ee37
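This output shows that region_names and _ec2_subnet_id pair up correctly for each dc_idx. That fits Bentsi's suspicion that the spot path called DescribeSubnets through an EC2 client bound to a different region: subnet IDs are regional, so a us-east-1 client cannot see subnet-5207ee37. The thread does not show what commit 0bfaa1b actually changed. The sketch below assumes an injectable client factory, keeps one client per region, and always looks a subnet up in its own region. RegionalEc2Clients and FakeEc2Client are hypothetical; with boto3 the factory could be `lambda region: boto3.client("ec2", region_name=region)`.

```python
class RegionalEc2Clients:
    """Hypothetical cache of one EC2 client per region (factory is an assumption)."""

    def __init__(self, client_factory):
        self._factory = client_factory
        self._clients = {}

    def get(self, region_name):
        # Create each regional client lazily and reuse it afterwards.
        if region_name not in self._clients:
            self._clients[region_name] = self._factory(region_name)
        return self._clients[region_name]

    def get_subnet_info(self, region_name, subnet_id):
        # Subnet IDs are regional: query through the client of the subnet's own region.
        resp = self.get(region_name).describe_subnets(SubnetIds=[subnet_id])
        return resp["Subnets"][0]


class FakeEc2Client:
    """Offline stand-in for an EC2 client that only knows its own region's subnets."""
    SUBNETS = {"us-east-1": {"subnet-ec4a72c4"}, "us-west-2": {"subnet-5207ee37"}}

    def __init__(self, region_name):
        self.region_name = region_name

    def describe_subnets(self, SubnetIds):
        known = self.SUBNETS.get(self.region_name, set())
        missing = [s for s in SubnetIds if s not in known]
        if missing:
            # The real API raises botocore's ClientError (InvalidSubnetID.NotFound).
            raise LookupError("The subnet ID '%s' does not exist" % missing[0])
        return {"Subnets": [{"SubnetId": s} for s in SubnetIds]}


clients = RegionalEc2Clients(FakeEc2Client)
regions = ["us-east-1", "us-west-2"]
subnets = ["subnet-ec4a72c4", "subnet-5207ee37"]
for dc_idx, region in enumerate(regions):
    print(region, clients.get_subnet_info(region, subnets[dc_idx])["SubnetId"])
# us-east-1 subnet-ec4a72c4
# us-west-2 subnet-5207ee37

# The suspected failure mode: asking the us-east-1 client about the us-west-2 subnet.
try:
    clients.get("us-east-1").describe_subnets(SubnetIds=["subnet-5207ee37"])
except LookupError as err:
    print(err)  # The subnet ID 'subnet-5207ee37' does not exist
```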
amoskong commented 5 years ago

Thanks.

Bentsi (notifications@github.com) wrote on Fri, Jan 18, 2019 at 1:36 AM:

Closed #671 https://github.com/scylladb/scylla-cluster-tests/issues/671 via 0bfaa1b https://github.com/scylladb/scylla-cluster-tests/commit/0bfaa1bcc18b00cabfe6b7a3c847b261349f3a94 .