scylladb / scylla-machine-image

user-data with experimental flag is ignored. #539

Open anand-chandrashekar opened 1 week ago

anand-chandrashekar commented 1 week ago

When I create a Scylla node on an AWS instance with the following user data, the YAML settings are ignored and the cluster comes up with a random name. seed_exp.json:

{
    "scylla_yaml": {
        "cluster_name": "FooBarExperimental",
        "experimental": true,
        "alternator_port" : 8000,
        "alternator_write_isolation" : "only_rmw_uses_lwt"
        },
    "start_scylla_on_first_boot": true
}

Once the experimental field is removed, the node creation goes through.

How was the instance created? I used the Scylla machine image for us-east-1.

aws ec2 run-instances --image-id ami-0bf4f7af3188c4304 --instance-type i4i.xlarge --key-name xyz --count 1 --ebs-optimized --network-interfaces 'AssociatePublicIpAddress=true,DeviceIndex=0, Groups=[abc]' --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=test-1}]' --placement 'AvailabilityZone=us-east-1a'  --user-data file://seed_exp.json/ 

cc: @gcarmin

anand-chandrashekar commented 1 week ago

Scylla version:

 scylla --version
2024.1.5-0.20240606.842c1b55430b

fruch commented 6 days ago

See:

https://forum.scylladb.com/t/experimental-features/61

I don't know exactly from which version, but blanket-enabling all experimental features like that isn't supported.

You need to specify exactly which ones you are enabling.

That doesn't explain why the YAML configuration wasn't passed to Scylla correctly, but it's worth trying experimental-features for what you are doing.
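As a rough illustration of that suggestion, here is a minimal Python sketch that rewrites a user-data document so the blanket "experimental" flag is replaced by an explicit "experimental_features" list. The helper name use_explicit_features is hypothetical, and "alternator-streams" is only an example value; which feature names a given Scylla version accepts should be checked against its documentation.

```python
import json


def use_explicit_features(user_data, features):
    """Return a copy of user_data with the blanket flag swapped for a feature list.

    Hypothetical helper for illustration; not part of scylla-machine-image.
    """
    result = json.loads(json.dumps(user_data))  # deep copy via a JSON round trip
    scylla_yaml = result.setdefault("scylla_yaml", {})
    scylla_yaml.pop("experimental", None)  # drop the blanket flag if present
    scylla_yaml["experimental_features"] = list(features)
    return result


user_data = {
    "scylla_yaml": {
        "cluster_name": "FooBarExperimental",
        "experimental": True,
        "alternator_port": 8000,
        "alternator_write_isolation": "only_rmw_uses_lwt",
    },
    "start_scylla_on_first_boot": True,
}

# "alternator-streams" is an example feature name, not a recommendation.
rewritten = use_explicit_features(user_data, ["alternator-streams"])
print(json.dumps(rewritten, indent=4))
```

The original dictionary is left untouched, so the two variants can be written to separate files and compared on separate instances.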

anand-chandrashekar commented 6 days ago

Thanks @fruch. Setting experimental features as below did not work.

{
    "scylla_yaml": {
        "cluster_name": "FooBar",
        "alternator_port" : 8000,
        "alternator_write_isolation" : "only_rmw_uses_lwt",
        "experimental_features" : ["alternator-streams"]
        },
    "start_scylla_on_first_boot": true
}

fruch commented 6 days ago

Can you share the journalctl output for scylla-machine-image from one of the booted nodes?

fruch commented 3 days ago

@anand-chandrashekar

I've tried the command you supplied, and it works as expected, i.e. all of the data from scylla_yaml was passed on to scylla.yaml:

# See '/etc/scylla/scylla.yaml.example' with the full list of supported configuration
# options and their descriptions.
alternator_port: 8000
alternator_write_isolation: only_rmw_uses_lwt
api_address: 127.0.0.1
api_doc_dir: /opt/scylladb/api/api-doc/
api_port: 10000
api_ui_dir: /opt/scylladb/swagger-ui/dist/
auto_bootstrap: true
batch_size_fail_threshold_in_kb: 1024
batch_size_warn_threshold_in_kb: 128
broadcast_rpc_address: 10.12.1.184
cas_contention_timeout_in_ms: 1000
cluster_name: FooBarExperimental
commitlog_segment_size_in_mb: 32
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_total_space_in_mb: -1
consistent_cluster_management: true
endpoint_snitch: Ec2Snitch
experimental: true
force_schema_commit_log: true
listen_address: 10.12.1.184
murmur3_partitioner_ignore_msb_bits: 12
native_shard_aware_transport_port: 19042
native_transport_port: 9042
num_tokens: 256
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
read_request_timeout_in_ms: 5000
rpc_address: 0.0.0.0
rpc_port: 9160
schema_commitlog_segment_size_in_mb: 128
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: 10.12.1.184
strict_is_not_null_in_views: true
write_request_timeout_in_ms: 2000

But it seems that this happens only after Scylla has already booted:

Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:stre] schema_tables - Schema version changed to c0ffa0b3-68c5-378d-bc5a-07c9237d9d25
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:main] init - starting native transport
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:sl:d] cql_server_controller - Starting listening for CQL clients on 0.0.0.0:9042 (unencrypted, non-shard-aware)
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:sl:d] cql_server_controller - Starting listening for CQL clients on 0.0.0.0:19042 (unencrypted, shard-aware)
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:sl:d] alternator_controller - Alternator server listening on 0.0.0.0, HTTP port 8000, HTTPS port OFF
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:main] init - starting the expiration service
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:stre] alternator_ttl - sleeping 86400 seconds until next period
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 1:stre] alternator_ttl - sleeping 86400 seconds until next period
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 2:stre] alternator_ttl - sleeping 86400 seconds until next period
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 3:stre] alternator_ttl - sleeping 86400 seconds until next period
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:main] init - serving
Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:main] init - Scylla version 2024.1.5-0.20240606.842c1b55430b initialization completed.
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]: 2024-09-15 06:56:15,565 - [user_data] - INFO - Got user-data: {
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:     "scylla_yaml": {
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "cluster_name": "FooBarExperimental",
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "experimental": true,
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "alternator_port" : 8000,
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "alternator_write_isolation" : "only_rmw_uses_lwt"
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         },
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:     "start_scylla_on_first_boot": true
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]: }
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]: Got user-data: {
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:     "scylla_yaml": {
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "cluster_name": "FooBarExperimental",
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "experimental": true,
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "alternator_port" : 8000,
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         "alternator_write_isolation" : "only_rmw_uses_lwt"
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:         },
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]:     "start_scylla_on_first_boot": true
Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]: }

So it seems that "start_scylla_on_first_boot": true isn't doing exactly what is advertised, i.e. it conflicts with passing on the configuration values.
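To make the ordering in the journal explicit, here is a minimal Python sketch that compares the timestamp of the "initialization completed" line with that of the first "Got user-data" line. The two sample lines are copied from the journal output above; the helper first_timestamp is hypothetical, and the year is passed in as an assumption because the short journalctl prefix omits it. The sketch only compares log timestamps, so it shows when the user-data was logged, not when scylla.yaml was written.

```python
from datetime import datetime


def first_timestamp(lines, marker, year=2024):
    """Return the timestamp of the first line containing marker, or None.

    Assumes the short journalctl prefix "Mon DD HH:MM:SS"; the year is an
    assumption supplied by the caller because the prefix does not carry it.
    """
    for line in lines:
        if marker in line:
            stamp = " ".join(line.split()[:3])
            return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return None


journal = [
    "Sep 15 06:56:09 ip-10-12-1-184 scylla[3711]:  [shard 0:main] init - "
    "Scylla version 2024.1.5-0.20240606.842c1b55430b initialization completed.",
    "Sep 15 06:56:15 ip-10-12-1-184 scylla_post_start.py[3810]: "
    "2024-09-15 06:56:15,565 - [user_data] - INFO - Got user-data: {",
]

started = first_timestamp(journal, "initialization completed")
user_data_seen = first_timestamp(journal, "Got user-data")
delay = (user_data_seen - started).total_seconds()
print(f"user-data was logged {delay:.0f}s after Scylla finished initializing")
```

For the two lines above this reports a 6 second gap, with the user-data message appearing after Scylla was already serving.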

Now I'm just wondering whether it was broken like that from day one or got broken at some point. In most automation code (and, I think, also in scylla-cloud), we don't rely on this configuration option for scylla.yaml.
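For checking this across image versions, here is a minimal Python sketch that reports which scalar settings from scylla_yaml in the user-data did not end up in the generated scylla.yaml. Both helper names are hypothetical, and the parser is deliberately naive: it reads only unindented "key: value" lines, so nested blocks and list values (such as experimental_features) are not compared. A real YAML parser would be the better choice where one is available.

```python
import json


def top_level_scalars(yaml_text):
    """Collect unindented `key: value` pairs; nested blocks and lists are skipped."""
    values = {}
    for line in yaml_text.splitlines():
        if not line or line[0] in " #-" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.strip():
            values[key.strip()] = value.strip()
    return values


def missing_or_different(user_data, yaml_text):
    """Return {key: (expected, found)} for scalar settings absent from the yaml."""
    found = top_level_scalars(yaml_text)
    problems = {}
    for key, value in user_data.get("scylla_yaml", {}).items():
        # Render JSON scalars the way they appear in the yaml (true, 8000, bare strings).
        expected = value if isinstance(value, str) else json.dumps(value)
        if found.get(key) != expected:
            problems[key] = (expected, found.get(key))
    return problems


user_data = json.loads("""
{
    "scylla_yaml": {
        "cluster_name": "FooBarExperimental",
        "experimental": true,
        "alternator_port": 8000,
        "alternator_write_isolation": "only_rmw_uses_lwt"
    },
    "start_scylla_on_first_boot": true
}
""")

# Excerpt of the generated scylla.yaml quoted earlier in this comment.
yaml_excerpt = """\
alternator_port: 8000
alternator_write_isolation: only_rmw_uses_lwt
cluster_name: FooBarExperimental
experimental: true
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
"""

print(missing_or_different(user_data, yaml_excerpt))
```

An empty result means every scalar from the user-data was found in the file with the same value. It says nothing about whether Scylla read the file before or after those values were written.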

@yaronkaikov