bentcoder closed this issue 2 years ago
We recently saw this running Scylla as a statefulset in Kubernetes using Synology CSI to provide dynamically allocated volumes. We thought it was a problem with the latter somehow, but we just deleted the pod and it came back up just fine. 🤷♂️
Ran into this while running inside a k3d cluster that uses local-path-provisioner as CSI. I wish there was a more useful error to help figure out what the root cause actually is.
AFAIK 4.4.4 doesn't have ARM support; it was added in 4.6.
my_scylla | 2022-02-24 17:03:11,915 INFO exited: scylla (terminated by SIGKILL; not expected)
(Scylla simply dumps core and won't start.) I am getting the same error with version 4.6 on my M1 Mac when I run it inside Minikube. It works fine with docker-compose.
I remember solving this issue using one of the 4.6.0.rc* images, but I cannot remember which one it was. Given 4.6.0 is now released, it should help you.
@bentcoder 4.6.0 does not solve the issue. Can we reopen?
If anyone else is stuck on this, I'm still using 4.6.dev-0.20210801.3089558f8 where I don't get this problem.
It seems to be fixed with scylla 4.6.1
Not here I'm afraid :(
I was having the same issue with 4.6, but removing platform: linux/amd64 fixes it for me.
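For reference, that platform key sits on the service definition in docker-compose.yml. A minimal sketch of the workaround (service name, image tag, and flags are illustrative, not taken from this thread):

```yaml
services:
  scylla:
    image: scylladb/scylla:4.6.0
    # platform: linux/amd64   # removing this line lets Docker pick the native arm64 variant
    command: --developer-mode 1
```

With the platform key removed, Docker resolves the multi-arch manifest against the host architecture instead of forcing amd64 emulation.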
I have had the same issue with 4.6 using an M1 Mac with Docker Desktop, but not with the latest images. 4.6.6 has an arm64/v8 build in its manifest for M1 users.
5.0.1 works. Comparing the output of skopeo inspect --raw for the bad and good images:
$ diff -u /tmp/4.6.6 /tmp/5.0.1
--- /tmp/4.6.6 2022-08-16 20:17:02.083843551 +0300
+++ /tmp/5.0.1 2022-08-16 20:16:48.799692109 +0300
@@ -1,20 +1,19 @@
{
"schemaVersion": 2,
- "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"manifests": [
{
- "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
- "size": 594,
- "digest": "sha256:f25c24c291f6dec5c96a1a02676e848571c7e63b74aac8fb64501790b3dc9d3b",
+ "mediaType": "application/vnd.oci.image.manifest.v1+json",
+ "digest": "sha256:962a06451a6a6141cf8e3b5b3cc06975b4d5542011c9cc5863117aa0b1e2f7b7",
+ "size": 509,
"platform": {
"architecture": "arm64",
"os": "linux"
}
},
{
- "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
- "size": 594,
- "digest": "sha256:1ae2ec88dcf03e4ad69105f1ef63b4f55cdcf466c8c5715632260012808ba432",
+ "mediaType": "application/vnd.oci.image.manifest.v1+json",
+ "digest": "sha256:81b9320e98f324c7cb48b496bca4ff553e0e22978e6f6ea78f5c143b104bd1b2",
+ "size": 708,
"platform": {
"architecture": "amd64",
"os": "linux"
Maybe the mediaType change confuses some docker implementations.
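To spot this kind of difference without eyeballing a diff, the manifest list can be summarized per platform. A minimal sketch that parses JSON shaped like the skopeo output above (the digests and sizes here are placeholders, not real image digests; in practice you would feed it the output of skopeo inspect --raw):

```python
import json

# Sample manifest list shaped like the `skopeo inspect --raw` output above.
raw = """
{
  "schemaVersion": 2,
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:aaaa", "size": 509,
     "platform": {"architecture": "arm64", "os": "linux"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:bbbb", "size": 708,
     "platform": {"architecture": "amd64", "os": "linux"}}
  ]
}
"""

def manifest_types(raw_json: str) -> dict:
    """Map each platform (os/arch) to the mediaType of its manifest entry."""
    doc = json.loads(raw_json)
    return {
        f"{m['platform']['os']}/{m['platform']['architecture']}": m["mediaType"]
        for m in doc.get("manifests", [])
    }

for platform, media_type in sorted(manifest_types(raw).items()):
    print(platform, media_type)
```

A Docker-format list would show application/vnd.docker.distribution.manifest.v2+json entries instead, which is exactly the change the diff above captures.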
Closing as fixed in 5.0.1. If anyone still sees a problem, please report it here.
@avikivity I got the same problem on 5.0.1, with and without platform: linux/amd64
Edit: it worked when I added platform: linux/arm64
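In compose terms, that means pinning the service to the arm64 variant explicitly (a sketch; the service name and tag are illustrative):

```yaml
services:
  scylla:
    image: scylladb/scylla:5.0.1
    platform: linux/arm64   # force the arm64 image on Apple Silicon
```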
@avikivity
running: (['/opt/scylladb/scripts/scylla_dev_mode_setup', '--developer-mode', '1'],)
running: (['/opt/scylladb/scripts/scylla_io_setup'],)
2023-02-16 06:00:55,956 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2023-02-16 06:00:55,956 INFO Included extra file "/etc/supervisord.conf.d/rsyslog.conf" during parsing
2023-02-16 06:00:55,956 INFO Included extra file "/etc/supervisord.conf.d/scylla-housekeeping.conf" during parsing
2023-02-16 06:00:55,956 INFO Included extra file "/etc/supervisord.conf.d/scylla-jmx.conf" during parsing
2023-02-16 06:00:55,956 INFO Included extra file "/etc/supervisord.conf.d/scylla-node-exporter.conf" during parsing
2023-02-16 06:00:55,956 INFO Included extra file "/etc/supervisord.conf.d/scylla-server.conf" during parsing
2023-02-16 06:00:55,956 INFO Included extra file "/etc/supervisord.conf.d/sshd-server.conf" during parsing
2023-02-16 06:00:55,965 INFO RPC interface 'supervisor' initialized
2023-02-16 06:00:55,966 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2023-02-16 06:00:55,966 INFO supervisord started with pid 26
2023-02-16 06:00:56,969 INFO spawned: 'rsyslog' with pid 28
2023-02-16 06:00:56,973 INFO spawned: 'scylla' with pid 29
2023-02-16 06:00:56,975 INFO spawned: 'scylla-housekeeping' with pid 30
2023-02-16 06:00:56,978 INFO spawned: 'scylla-jmx' with pid 31
2023-02-16 06:00:56,981 INFO spawned: 'scylla-node-exporter' with pid 32
2023-02-16 06:00:56,983 INFO spawned: 'sshd' with pid 34
rsyslogd: imklog: cannot open kernel log (/proc/kmsg): Operation not permitted.
rsyslogd: activation of module imklog failed [v8.2001.0 try https://www.rsyslog.com/e/2145 ]
ts=2023-02-16T06:00:57.396Z caller=node_exporter.go:182 level=info msg="Starting node_exporter" version="(version=1.3.1, branch=HEAD, revision=a2321e7b940ddcff26873612bccdf7cd4c42b6b6)"
ts=2023-02-16T06:00:57.406Z caller=node_exporter.go:183 level=info msg="Build context" build_context="(go=go1.17.3, user=root@243aafa5525c, date=20211205-11:10:22)"
ts=2023-02-16T06:00:57.406Z caller=node_exporter.go:185 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unpriviledged user, root is not required."
ts=2023-02-16T06:00:57.406Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+)($|/)
ts=2023-02-16T06:00:57.406Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:108 level=info msg="Enabled collectors"
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=arp
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=bcache
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=bonding
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=btrfs
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=conntrack
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=cpu
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=cpufreq
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=diskstats
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=dmi
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=edac
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=entropy
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=fibrechannel
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=filefd
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=filesystem
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=hwmon
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=infiniband
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=interrupts
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=ipvs
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=loadavg
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=mdadm
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=meminfo
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=netclass
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=netdev
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=netstat
ts=2023-02-16T06:00:57.407Z caller=node_exporter.go:115 level=info collector=nfs
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=nfsd
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=nvme
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=os
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=powersupplyclass
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=pressure
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=rapl
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=schedstat
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=sockstat
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=softnet
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=stat
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=tapestats
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=textfile
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=thermal_zone
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=time
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=timex
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=udp_queues
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=uname
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=vmstat
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=xfs
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:115 level=info collector=zfs
ts=2023-02-16T06:00:57.408Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
ts=2023-02-16T06:00:57.408Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
WARN 2023-02-16 06:00:58,322 [shard 1] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-02-16 06:00:58,326 [shard 0] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-02-16 06:00:58,322 [shard 7] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-02-16 06:00:58,334 [shard 2] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-02-16 06:00:58,345 [shard 3] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO 2023-02-16 06:00:58,356 [shard 1] seastar - Created fair group io-queue-0, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000
INFO 2023-02-16 06:00:58,362 [shard 1] seastar - IO queue uses 0.75ms latency goal for device 0
INFO 2023-02-16 06:00:58,362 [shard 1] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
INFO 2023-02-16 06:00:58,362 [shard 0] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
WARN 2023-02-16 06:00:58,381 [shard 5] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-02-16 06:00:58,439 [shard 4] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
WARN 2023-02-16 06:00:58,486 [shard 6] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:1, perf_event_open() failed: Operation not permitted)
INFO 2023-02-16 06:00:58,535 [shard 0] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,535 [shard 3] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,536 [shard 4] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,539 [shard 5] seastar - updated: blocked-reactor-notify-ms=1000000
Scylla version 5.1.5-0.20230207.5c9ecd560440 with build-id f1b2f23996e7951ec05e77b47695813bb8e8bb00 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --overprovisioned --listen-address 172.18.0.3 --rpc-address 172.18.0.3 --seed-provider-parameters seeds=172.18.0.3 --alternator-address 172.18.0.3 --blocked-reactor-notify-ms 999999999"
parsed command line options: [log-to-syslog, (positional) 0, log-to-stdout, (positional) 1, default-log-level, (positional) info, network-stack, (positional) posix, developer-mode: 1, overprovisioned, listen-address: 172.18.0.3, rpc-address: 172.18.0.3, seed-provider-parameters: seeds=172.18.0.3, alternator-address: 172.18.0.3, blocked-reactor-notify-ms, (positional) 999999999]
2023-02-16 06:00:58,162 INFO success: rsyslog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-02-16 06:00:58,162 INFO success: scylla entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-02-16 06:00:58,162 INFO success: scylla-housekeeping entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-02-16 06:00:58,162 INFO success: scylla-jmx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-02-16 06:00:58,162 INFO success: scylla-node-exporter entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-02-16 06:00:58,162 INFO success: sshd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Connecting to http://localhost:10000
Starting the JMX server
JMX is enabled to receive remote connections on port: 7199
INFO 2023-02-16 06:00:58,539 [shard 7] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,539 [shard 2] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,539 [shard 1] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,539 [shard 6] seastar - updated: blocked-reactor-notify-ms=1000000
INFO 2023-02-16 06:00:58,562 [shard 0] init - installing SIGHUP handler
INFO 2023-02-16 06:00:58,624 [shard 0] init - Scylla version 5.1.5-0.20230207.5c9ecd560440 with build-id f1b2f23996e7951ec05e77b47695813bb8e8bb00 starting ...
WARN 2023-02-16 06:00:58,625 [shard 0] init - I/O Scheduler is not properly configured! This is a non-supported setup, and performance is expected to be unpredictably bad.
Reason found: none of --max-io-requests, --io-properties and --io-properties-file are set.
To properly configure the I/O Scheduler, run the scylla_io_setup utility shipped with Scylla.
INFO 2023-02-16 06:00:58,637 [shard 0] init - starting prometheus API server
INFO 2023-02-16 06:00:58,644 [shard 0] init - starting tokens manager
INFO 2023-02-16 06:00:58,652 [shard 0] init - starting effective_replication_map factory
INFO 2023-02-16 06:00:58,652 [shard 0] init - starting migration manager notifier
INFO 2023-02-16 06:00:58,653 [shard 0] init - starting lifecycle notifier
INFO 2023-02-16 06:00:58,653 [shard 0] init - creating tracing
INFO 2023-02-16 06:00:58,655 [shard 0] init - starting API server
INFO 2023-02-16 06:00:58,658 [shard 0] init - Scylla API server listening on 127.0.0.1:10000 ...
INFO 2023-02-16 06:00:58,688 [shard 0] service_level_controller - update_from_distributed_data: starting configuration polling loop
INFO 2023-02-16 06:00:58,699 [shard 0] init - starting system keyspace
INFO 2023-02-16 06:00:58,709 [shard 0] init - starting gossiper
INFO 2023-02-16 06:00:58,709 [shard 0] init - seeds={172.18.0.3}, listen_address=172.18.0.3, broadcast_address=172.18.0.3
WARN 2023-02-16 06:00:58,714 [shard 0] init - Using default cluster name is not recommended. Using a unique cluster name will reduce the chance of adding nodes to the wrong cluster by mistake
INFO 2023-02-16 06:00:58,723 [shard 0] init - creating snitch
INFO 2023-02-16 06:00:58,724 [shard 0] init - starting direct failure detector service
INFO 2023-02-16 06:00:58,724 [shard 0] init - initializing storage service
INFO 2023-02-16 06:00:58,724 [shard 0] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 4] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 7] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 5] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 6] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 3] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 1] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,734 [shard 2] storage_service - Started node_ops_abort_thread
INFO 2023-02-16 06:00:58,735 [shard 0] init - starting per-shard database core
INFO 2023-02-16 06:00:58,738 [shard 0] init - creating and verifying directories
Traceback (most recent call last):
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 196, in <module>
args.func(args)
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 122, in check_version
current_version = sanitize_version(get_api('/storage_service/scylla_release_version'))
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 80, in get_api
return get_json_from_url("http://" + api_address + path)
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 75, in get_json_from_url
raise RuntimeError(f'Failed to get "{path}" due to the following error: {retval}')
RuntimeError: Failed to get "http://localhost:10000/storage_service/scylla_release_version" due to the following error: HTTP Error 404: Not Found
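The 404 here looks like scylla-housekeeping querying the REST API before the storage_service endpoints are registered (the API server starts listening early in init, per the log above). A hedged sketch of the kind of retry loop that would tolerate that race; the fetch callable is injected so the loop can be exercised without a live node, and wait_for_api, flaky_fetch, and the retry counts are all hypothetical, not part of the real script:

```python
import time

def wait_for_api(fetch, path, retries=5, delay=1.0):
    """Poll fetch(path) until it stops raising, or give up after `retries` tries.

    `fetch` is any callable that raises RuntimeError on HTTP errors
    (e.g. a thin urllib wrapper); injected so this is testable offline.
    """
    last_err = None
    for _ in range(retries):
        try:
            return fetch(path)
        except RuntimeError as err:  # e.g. HTTP 404 while the API warms up
            last_err = err
            time.sleep(delay)
    raise RuntimeError(f"gave up after {retries} attempts: {last_err}")

# Fake fetch that fails twice (as if the endpoint is not registered yet),
# then succeeds -- stands in for the real REST call.
calls = {"n": 0}
def flaky_fetch(path):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP Error 404: Not Found")
    return "5.1.5"

print(wait_for_api(flaky_fetch, "/storage_service/scylla_release_version", delay=0))
```

This is only a sketch of the pattern; the actual fix would belong in the scylla-housekeeping script itself.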
And then when I try to stop the container:
WARN received SIGTERM indicating exit request
INFO waiting for rsyslog, scylla, scylla-housekeeping, scylla-jmx, scylla-node-exporter, sshd to die
INFO stopped: sshd (terminated by SIGTERM)
INFO stopped: scylla-node-exporter (terminated by SIGTERM)
INFO stopped: scylla-jmx (exit status 143)
INFO stopped: scylla-housekeeping (terminated by SIGTERM)
INFO waiting for rsyslog, scylla to die
INFO waiting for rsyslog, scylla to die
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 122, in check_version
current_version = sanitize_version(get_api('/storage_service/scylla_release_version'))
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 80, in get_api
return get_json_from_url("http://" + api_address + path)
File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 75, in get_json_from_url
raise RuntimeError(f'Failed to get "{path}" due to the following error: {retval}')
RuntimeError: Failed to get "http://localhost:10000/storage_service/scylla_release_version" due to the following error: HTTP Error 404: Not Found
And the container doesn't stop.
Mac M1.
Hi,
The container is running, but I see the error below. Is there a solution to it?
Thanks