scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

Liveness probe constantly fails during resharding #894

Closed zimnx closed 2 years ago

zimnx commented 2 years ago

Describe the bug

To Reproduce

Steps to reproduce the behavior:

  1. Load the cluster with a large enough amount of data.
  2. Run nodetool flush.
  3. Increase the CPU resources in the CRD.

Expected behavior: The node is not ready during the resharding process, but stays alive.

Logs

I1215 12:22:56.761756       1 operator/sidecar.go:149] sidecar version v1.5.0-beta.0-0-gcbb0f53
I1215 12:22:56.761974       1 operator/sidecar.go:150] loglevel is set to "2"
I1215 12:22:56.776150       1 config/config.go:64] Setting up scylla.yaml
I1215 12:22:56.776311       1 config/config.go:96] "no scylla.yaml config map available"
I1215 12:22:56.777973       1 config/config.go:68] Setting up cassandra-rackdc.properties
I1215 12:22:56.778006       1 config/config.go:157] "unable to read properties" file="/mnt/scylla-config/cassandra-rackdc.properties"
I1215 12:22:56.778115       1 config/config.go:72] Setting up entrypoint script
I1215 12:22:56.798163       1 config/config.go:253] "Scylla version detected" version={version:{Major:4 Minor:4 Patch:0 Pre:[] Build:[]} unknown:false}
I1215 12:22:56.798261       1 config/config.go:281] "Scylla entrypoint" Command="/docker-entrypoint.py --seeds=10.79.14.175 --developer-mode=1 --overprovisioned=1 --smp=7 --prometheus-address=0.0.0.0 --listen-address=0.0.0.0 --broadcast-address=10.79.5.38 --broadcast-rpc-address=10.79.5.38"
I1215 12:22:56.799267       1 sidecar/controller.go:171] "Starting controller" Controller="SidecarController"
I1215 12:22:56.799505       1 cache/shared_informer.go:240] Waiting for caches to sync for SidecarController
I1215 12:22:56.799369       1 cache/reflector.go:219] Starting reflector *v1.Service (12h0m0s) from k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167
I1215 12:22:56.799404       1 cache/reflector.go:219] Starting reflector *v1.Secret (12h0m0s) from k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167
I1215 12:22:56.799488       1 cache/shared_informer.go:240] Waiting for caches to sync for Prober
running: (['/opt/scylladb/scripts/scylla_dev_mode_setup', '--developer-mode', '1'],)
I1215 12:22:57.400331       1 cache/shared_informer.go:247] Caches are synced for SidecarController 
I1215 12:22:57.400379       1 cache/shared_informer.go:247] Caches are synced for Prober 
I1215 12:22:57.400690       1 operator/sidecar.go:270] "Starting Prober server"
running: (['/opt/scylladb/scripts/scylla_io_setup'],)
2021-12-15 12:22:57,717 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2021-12-15 12:22:57,718 INFO Included extra file "/etc/supervisord.conf.d/node-exporter.conf" during parsing
2021-12-15 12:22:57,718 INFO Included extra file "/etc/supervisord.conf.d/rsyslog.conf" during parsing
2021-12-15 12:22:57,718 INFO Included extra file "/etc/supervisord.conf.d/scylla-housekeeping.conf" during parsing
2021-12-15 12:22:57,718 INFO Included extra file "/etc/supervisord.conf.d/scylla-jmx.conf" during parsing
2021-12-15 12:22:57,718 INFO Included extra file "/etc/supervisord.conf.d/scylla-server.conf" during parsing
2021-12-15 12:22:57,718 INFO Included extra file "/etc/supervisord.conf.d/sshd-server.conf" during parsing
2021-12-15 12:22:57,742 INFO RPC interface 'supervisor' initialized
2021-12-15 12:22:57,742 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2021-12-15 12:22:57,742 INFO supervisord started with pid 38
2021-12-15 12:22:58,745 INFO spawned: 'scylla' with pid 41
2021-12-15 12:22:58,748 INFO spawned: 'scylla-housekeeping' with pid 42
2021-12-15 12:22:58,750 INFO spawned: 'sshd' with pid 43
2021-12-15 12:22:58,753 INFO spawned: 'scylla-jmx' with pid 45
2021-12-15 12:22:58,756 INFO spawned: 'rsyslog' with pid 47
2021-12-15 12:22:58,758 INFO spawned: 'node-exporter' with pid 51
Generating public/private ed25519 key pair.
Your identification has been saved in /etc/ssh/ssh_host_ed25519_key.
Your public key has been saved in /etc/ssh/ssh_host_ed25519_key.pub.
The key fingerprint is:
SHA256:4DTPl0zplLnotvMFFfJCX4PqWaPRw+hmOSRPHnWRb7M root@scylla-mumbai-asia-south1-asia-south1-a-0
The key's randomart image is:
+--[ED25519 256]--+
|          o ..+o |
|         . Booo. |
|      +   B*+. . |
|     o =.*B=*  .o|
|      . SOBB o .o|
|       . .%.   E |
|        oo ..    |
|       ... .     |
|        .o.      |
+----[SHA256]-----+
time="2021-12-15T12:22:58Z" level=info msg="Starting node_exporter (version=0.17.0, branch=HEAD, revision=f6f6194a436b9a63d0439abc585c76b19a206b21)" source="node_exporter.go:82"
time="2021-12-15T12:22:58Z" level=info msg="Build context (go=go1.11.2, user=root@322511e06ced, date=20181130-15:51:33)" source="node_exporter.go:83"
time="2021-12-15T12:22:58Z" level=info msg="Enabled collectors:" source="node_exporter.go:90"
time="2021-12-15T12:22:58Z" level=info msg=" - arp" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - bcache" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - bonding" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - conntrack" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - cpu" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - diskstats" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - edac" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - entropy" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - filefd" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - filesystem" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - hwmon" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - infiniband" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - interrupts" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - ipvs" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - loadavg" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - mdadm" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - meminfo" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - netclass" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - netdev" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - netstat" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - nfs" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - nfsd" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - sockstat" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - stat" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - textfile" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - time" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - timex" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - uname" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - vmstat" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - xfs" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg=" - zfs" source="node_exporter.go:97"
time="2021-12-15T12:22:58Z" level=info msg="Listening on :9100" source="node_exporter.go:111"
Scylla version 4.4.0-0.20210322.dffbcabbb with build-id 71a06d8290443f3abdd4f67911ab332db44d9a51 starting ...
command used: "/usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --developer-mode=1 --smp 7 --overprovisioned --listen-address 0.0.0.0 --rpc-address 0.0.0.0 --seed-provider-parameters seeds=10.79.14.175 --broadcast-address 10.79.5.38 --broadcast-rpc-address 10.79.5.38 --blocked-reactor-notify-ms 999999999 --prometheus-address=0.0.0.0"
parsed command line options: [log-to-syslog: 0, log-to-stdout: 1, default-log-level: info, network-stack: posix, developer-mode: 1, smp: 7, overprovisioned, listen-address: 0.0.0.0, rpc-address: 0.0.0.0, seed-provider-parameters: seeds=10.79.14.175, broadcast-address: 10.79.5.38, broadcast-rpc-address: 10.79.5.38, blocked-reactor-notify-ms: 999999999, prometheus-address: 0.0.0.0]
INFO  2021-12-15 12:22:59,088 [shard 0] init - installing SIGHUP handler
INFO  2021-12-15 12:22:59,096 [shard 0] init - Scylla version 4.4.0-0.20210322.dffbcabbb with build-id 71a06d8290443f3abdd4f67911ab332db44d9a51 starting ...

INFO  2021-12-15 12:22:59,098 [shard 0] init - starting prometheus API server
INFO  2021-12-15 12:22:59,099 [shard 0] init - starting tokens manager
INFO  2021-12-15 12:22:59,099 [shard 0] init - starting migration manager notifier
INFO  2021-12-15 12:22:59,100 [shard 0] init - creating tracing
INFO  2021-12-15 12:22:59,100 [shard 0] init - creating snitch
INFO  2021-12-15 12:22:59,102 [shard 0] init - determining DNS name
INFO  2021-12-15 12:22:59,102 [shard 0] init - starting API server
INFO  2021-12-15 12:22:59,103 [shard 0] init - Scylla API server listening on 127.0.0.1:10000 ...
INFO  2021-12-15 12:22:59,105 [shard 0] init - initializing storage service
INFO  2021-12-15 12:22:59,106 [shard 0] init - starting per-shard database core
WARN  2021-12-15 12:22:59,111 [shard 0] init - I/O Scheduler is not properly configured! This is a non-supported setup, and performance is expected to be unpredictably bad.
 Reason found: none of --max-io-requests, --io-properties and --io-properties-file are set.
To properly configure the I/O Scheduler, run the scylla_io_setup utility shipped with Scylla.

INFO  2021-12-15 12:22:59,111 [shard 0] init - creating and verifying directories
Generating public/private rsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
SHA256:PO4xjCH9N/No/KFnzni7ghLGyS5hxP6OZ+jCaEJ7rU4 root@scylla-mumbai-asia-south1-asia-south1-a-0
The key's randomart image is:
+---[RSA 4096]----+
|                 |
|                 |
|   .             |
|    o. .         |
|   o.oo.S        |
| .  +.** .       |
|.o.E.*..*o+ .    |
|oo+.oo*..++O+.   |
|o o=+=....+BBo   |
+----[SHA256]-----+
Connecting to http://localhost:10000
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Starting the JMX server
JMX is enabled to receive remote connections on port: 7199
INFO  2021-12-15 12:22:59,790 [shard 0] database - Populating Keyspace system_schema
2021-12-15 12:22:59,791 INFO success: scylla entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-12-15 12:22:59,791 INFO success: scylla-housekeeping entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-12-15 12:22:59,791 INFO success: sshd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-12-15 12:22:59,791 INFO success: scylla-jmx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-12-15 12:22:59,791 INFO success: rsyslog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-12-15 12:22:59,791 INFO success: node-exporter entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF view_virtual_columns id=08843b63-45dc-3be2-9798-a0418295cfaa version=c777531c-15f7-326f-8ebe-39fd0265c8c9
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF computed_columns id=cc7c7069-3740-33c1-92a4-c3de78dbd2c4 version=2b8c4439-de76-31e0-807f-3b7290a975d7
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF aggregates id=924c5587-2e3a-345b-b10c-12f37c1ba895 version=4b53e92c-0368-3d5c-b959-2ec1bfd1a59f
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF functions id=96489b79-80be-3e14-a701-66a0b9159450 version=329ed804-55b3-3eee-ad61-d85317b96097
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF views id=9786ac1c-dd58-3201-a7cd-ad556410c985 version=5b58bb47-96e7-3f57-accf-0bfca4dbbc6e
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF indexes id=0feb57ac-311f-382f-ba6d-9024d305702f version=99c40462-8687-304e-abe3-2bdbef1f25aa
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF types id=5a8b1ca8-6602-3f77-a045-9273d308917a version=de51b2ce-5e4d-3b7d-a75f-2204332ce8d1
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF triggers id=4df70b66-6b05-3251-95a1-32b54005fd48 version=582d7071-1ef0-37c8-adc6-471a13636139
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF dropped_columns id=5e7583b5-f3f4-3af1-9a39-b7e1d6f5f11f version=7426bc6c-4c2f-3200-8ad8-4329610ed59a
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF columns id=24101c25-a2ae-3af7-87c1-b40ee1aca33f version=d33236d4-9bdd-3c09-abf0-a0bc5edc2526
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF scylla_tables id=5d912ff1-f759-3665-b2c8-8042ab5103dd version=38e5e56b-155d-39b5-a0d2-8e5dcad42a26
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF tables id=afddfb9d-bc1e-3068-8056-eed6c302ba09 version=b6240810-eeb7-36d5-9411-43b2d68dddab
INFO  2021-12-15 12:22:59,791 [shard 0] database - Keyspace system_schema: Reading CF keyspaces id=abac5682-dea6-31c5-b535-b3d6cffd0fb6 version=e79ca8ba-6556-3f7d-925a-7f20cf57938c
INFO  2021-12-15 12:22:59,882 [shard 0] database - Populating Keyspace system
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF compaction_history id=b4dbb7b4-dc49-3fb5-b3bf-ce6e434832ca version=25cae56a-8d75-39f3-a146-3756ab4981c7
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF batchlog id=0290003c-977e-397c-ac3e-fdfdc01d626b version=e2a2e804-49e4-3597-9f16-39fd9475835c
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF size_estimates id=618f817b-005f-3678-b8a4-53f3930b8e86 version=6d4dbcff-f05b-3dc3-95ad-f79f7b10504d
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF range_xfers id=55d76438-4e55-3f8b-9f6e-676d4af3976d version=dd15a078-409b-350d-9bef-c5f3520832d8
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF peer_events id=59dfeaea-8db2-3341-91ef-109974d81484 version=7f874a05-72b8-3c21-9acb-baa164fc351a
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF peers id=37f71aca-7dc2-383b-a706-72528af04d4f version=f6f6871f-8c86-3eca-ac0b-2a2e848e395d
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF scylla_local id=2972ec7f-fb20-38dd-aac1-d876f2e3fcbd version=5f0b407d-eedf-3845-a48e-dbb9673d10e1
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF schema_aggregates id=a5fc57fc-9d6c-3bfd-a3fc-01ad54686fea version=d06c934f-9a69-374c-86f1-523ab26fb05c
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF paxos id=b7b7f0c2-fd0a-3410-8c05-3ef614bb7c2d version=6dd372de-72fb-3e1b-9b8c-f97738a67fe9
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF compactions_in_progress id=55080ab0-5d9c-3886-90a4-acb25fe1f77b version=540263f1-40db-3869-8a38-3baadedc222d
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF local id=7ad54392-bcdd-35a6-8417-4e047860b377 version=7fa82c2e-5b67-37dd-8e5e-2079e18f1536
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF schema_columnfamilies id=45f5b360-24bc-3f83-a363-1034ea4fa697 version=7f7a5fea-07b7-304d-a6f8-f7aec72cabdb
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF hints id=2666e205-73ef-38b3-90fe-fecf96e8f0c7 version=0b74fdd1-e96d-309e-a14e-a5bcd7ac885d
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=bdec57a3-b234-334b-be9c-7b1f33113995
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF IndexInfo id=9f5c6374-d485-3229-9a0a-5094af9ad1e3 version=bbb3743b-351f-3023-b4fc-09a9be37d529
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF large_partitions id=8a7fe624-96b0-34b1-b90e-f71bddcdd2d3 version=04fa9920-9369-3a96-be39-6dd9fdc816b6
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF scylla_table_schema_history id=0191a53e-40f0-31d4-9171-b0d19ffb17b4 version=2cc890b8-9ade-3afc-ab63-043c8017608e
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF large_cells id=ead8bbc5-f146-3ae1-9f71-0b11f9a1d296 version=0d4a6937-a781-3f87-a192-b4e4a4a40acf
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF views_builds_in_progress id=b7f2c108-78cd-3c80-9cd5-d609b2bd149c version=b905cce5-1474-3085-819a-7592453e2fb9
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF built_views id=4b3c50a9-ea87-3d76-9101-6dbc9c38494a version=a28aa0b3-6def-30a7-9fbe-ce78b3f3c9b9
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF scylla_views_builds_in_progress id=a04c7bfd-1e13-36c9-a44d-f22da352281d version=e3fb736c-6956-3990-a31d-9a482279e3fc
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF truncated id=38c19fd0-fb86-3310-a4b7-0d0cc66628aa version=6da9a85c-7ae0-3917-aead-54a2f65a57a8
INFO  2021-12-15 12:22:59,882 [shard 0] database - Keyspace system: Reading CF cdc_local id=0bcaffd4-0c83-3ead-ad13-dc1d5015b77c version=0aa4d3a2-ed95-3ecd-aba0-cd75622ad290
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF clients id=ca0f635d-8630-3609-8d93-a1fc06f2a5e5 version=51273ebf-e6f9-3cd5-b455-ea09ad29796f
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF schema_keyspaces id=b0f22357-4458-3cdb-9631-c43e59ce3676 version=84d28cef-59f9-34a9-9be5-31c684997f03
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF schema_columns id=296e9c04-9bec-3085-827d-c17d3df2122a version=53cfbb66-ee0d-37be-97bb-b2afb3746f85
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF schema_triggers id=0359bc71-7123-3ee1-9a4a-b9dfb11fc125 version=75e4ec4c-76da-3f59-984e-66810ba0c62b
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF large_rows id=40550f66-0858-39a0-9430-f27fc08034e9 version=aee3acb0-7926-317b-848e-db7bc3721695
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF schema_usertypes id=3aa75225-4f82-350b-8d5c-430fa221fa0a version=3c53e6bb-56d5-3618-a0b2-4ff5f18127d7
INFO  2021-12-15 12:22:59,883 [shard 0] database - Keyspace system: Reading CF schema_functions id=d1b675fe-2b50-3ca4-8e49-c0f81989dcad version=a50959dd-2be0-3e4e-8bb5-cf40796e256a
INFO  2021-12-15 12:23:00,051 [shard 0] init - starting gossip
INFO  2021-12-15 12:23:00,051 [shard 0] init - seeds={10.79.14.175}, listen_address=0.0.0.0, broadcast_address=10.79.5.38
INFO  2021-12-15 12:23:00,052 [shard 0] init - starting storage proxy
INFO  2021-12-15 12:23:00,065 [shard 0] init - starting migration manager
INFO  2021-12-15 12:23:00,066 [shard 0] init - starting query processor
INFO  2021-12-15 12:23:00,067 [shard 0] init - initializing batchlog manager
INFO  2021-12-15 12:23:00,071 [shard 0] format_selector - Selected md sstables format
INFO  2021-12-15 12:23:00,073 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
INFO  2021-12-15 12:23:00,074 [shard 0] legacy_schema_migrator - Dropping legacy schema tables
INFO  2021-12-15 12:23:00,823 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
INFO  2021-12-15 12:23:00,823 [shard 0] init - loading system sstables
INFO  2021-12-15 12:23:00,883 [shard 0] init - loading non-system sstables
INFO  2021-12-15 12:23:01,016 [shard 0] database - Populating Keyspace system_auth
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_auth: Reading CF roles id=5bc52802-de25-35ed-aeab-188eecebb090 version=fade81d9-e212-3283-9959-c3a65e505d0d
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_auth: Reading CF role_members id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d version=5ee4a0d5-8bb8-39ef-a590-acd1b398c0b6
INFO  2021-12-15 12:23:01,016 [shard 0] database - Populating Keyspace system_traces
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_traces: Reading CF sessions_time_idx id=0ebf001c-c1d1-3693-9a63-c3d96ac53318 version=d4b51df1-366f-3c74-ba10-5caa78d3a43a
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_traces: Reading CF sessions id=c5e99f16-8677-3914-b17e-960613512345 version=7aaa0938-7713-3750-b17f-0169322843be
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_traces: Reading CF node_slow_log_time_idx id=f9706768-aa1e-3d87-9e5c-51a3927c2870 version=2c65475d-9d9d-3393-a0a8-065f7dd310d5
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_traces: Reading CF node_slow_log id=bfcc4e62-5b63-3aa1-a1c3-6f5e47f3325c version=7c904397-01f6-3f95-8f35-8c6d9ed92d2a
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_traces: Reading CF events id=8826e8e9-e16a-3728-8753-3bc1fc713c25 version=471ab656-95c8-38fb-8533-acf853ad7865
INFO  2021-12-15 12:23:01,016 [shard 0] database - Populating Keyspace system_distributed
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_distributed: Reading CF view_build_status id=5582b59f-8e4e-35e1-b913-3acada51eb04 version=e61ea49b-8dbc-3ff6-a69f-896f4d38b7bb
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_distributed: Reading CF cdc_streams_descriptions_v2 id=0bf73fd7-65b2-36b0-85e5-658131d5df36 version=58a18c32-0c5a-34ba-992f-7723dd0265c0
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_distributed: Reading CF cdc_generation_timestamps id=fdf455c4-cfec-3e00-9719-d7a45436c89d version=5777e043-a81a-3324-8803-b497c056849f
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace system_distributed: Reading CF cdc_generation_descriptions id=ae653406-7b3e-34aa-af70-d5b90f837c67 version=95e9e159-839f-39b3-990f-8e8a04817ff8
INFO  2021-12-15 12:23:01,016 [shard 0] database - Populating Keyspace janusgraph
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF txlog id=83a0c050-f740-11eb-a0d8-000000000002 version=c2b299e9-fc99-3b83-bef2-7991715873b2
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF systemlog id=843fe680-f740-11eb-9393-000000000000 version=ed61da6a-b2a1-3234-8184-20fb5d8f75d6
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF system_properties_lock_ id=84c93ac0-f740-11eb-9946-000000000000 version=f0e2d838-fde6-3e09-95c3-4adc440b0f21
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF system_properties id=7c346bf0-f740-11eb-8256-000000000000 version=7a88c300-098f-3d0d-a2e0-56523d97b937
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF janusgraph_ids id=7f983060-f740-11eb-9393-000000000000 version=18019852-e216-3b72-a2a7-1ef07fcd7cbf
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF graphindex_lock_ id=82f9d1f0-f740-11eb-9946-000000000000 version=9eaf4bdd-e4b7-3882-9cca-e29f2b5ee3d5
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF graphindex id=821cdf70-f740-11eb-9393-000000000000 version=9ffe6996-2608-3832-9c1c-8c26a021b8c8
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF edgestore_lock_ id=813c1c60-f740-11eb-a0d8-000000000002 version=8946afd1-f0ba-3805-abd9-438feeab26e5
INFO  2021-12-15 12:23:01,016 [shard 0] database - Keyspace janusgraph: Reading CF edgestore id=8099e8f0-f740-11eb-9946-000000000000 version=f7a080b1-02ed-3c3b-995d-0d2a6394aea6
INFO  2021-12-15 12:23:01,129 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207249.sstable", removing
INFO  2021-12-15 12:23:01,129 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207269.sstable", removing
INFO  2021-12-15 12:23:01,129 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207268.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207265.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207263.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207266.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207253.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207255.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207258.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207251.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207254.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207272.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207261.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207252.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207267.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207256.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207260.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207257.sstable", removing
INFO  2021-12-15 12:23:01,130 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207262.sstable", removing
INFO  2021-12-15 12:23:01,131 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207264.sstable", removing
INFO  2021-12-15 12:23:01,131 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207275.sstable", removing
INFO  2021-12-15 12:23:01,131 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207274.sstable", removing
INFO  2021-12-15 12:23:01,131 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207259.sstable", removing
INFO  2021-12-15 12:23:01,131 [shard 0] database - Found temporary sstable directory: "/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/0000000000207250.sstable", removing
Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 208, in <module>
    args.func(args)
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 134, in check_version
    current_version = sanitize_version(get_api('/storage_service/scylla_release_version'))
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 93, in get_api
    return get_json_from_url("http://" + api_address + path)
  File "/opt/scylladb/scripts/libexec/scylla-housekeeping", line 88, in get_json_from_url
    raise RuntimeError(f'Failed to get "{path}" due to the following error: {retval}')
RuntimeError: Failed to get "http://localhost:10000/storage_service/scylla_release_version" due to the following error: HTTP Error 404: Not Found
INFO  2021-12-15 12:23:04,885 [shard 0] database - Resharding 213GB 
INFO  2021-12-15 12:23:04,888 [shard 2] compaction - [Reshard janusgraph.edgestore c0e34380-5da1-11ec-8a2f-000000000000] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-13807-big-Data.db:level=0, ]
INFO  2021-12-15 12:23:04,888 [shard 6] compaction - [Reshard janusgraph.edgestore c0e34380-5da1-11ec-a4f0-000000000002] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-206387-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-171933-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207213-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207244-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-206828-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207227-big-Data.db:level=0, ]
INFO  2021-12-15 12:23:04,888 [shard 3] compaction - [Reshard janusgraph.edgestore c0e34380-5da1-11ec-9870-000000000001] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-13083-big-Data.db:level=0, ]
INFO  2021-12-15 12:23:04,888 [shard 0] compaction - [Reshard janusgraph.edgestore c0e34380-5da1-11ec-908a-000000000003] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-199245-big-Data.db:level=0, ]
INFO  2021-12-15 12:23:04,888 [shard 5] compaction - [Reshard janusgraph.edgestore c0e34380-5da1-11ec-ac7f-000000000004] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-12686-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207209-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-206876-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207175-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207108-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207228-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207097-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207220-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207211-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207212-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207216-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-207146-big-Data.db:level=0, ]
INFO  2021-12-15 12:23:04,889 [shard 1] compaction - [Reshard janusgraph.edgestore c0e36a90-5da1-11ec-a3e3-000000000005] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-13279-big-Data.db:level=0, ]
INFO  2021-12-15 12:23:04,890 [shard 4] compaction - [Reshard janusgraph.edgestore c0e391a0-5da1-11ec-83a1-000000000006] Resharding [/var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-12691-big-Data.db:level=0, /var/lib/scylla/data/janusgraph/edgestore-8099e8f0f74011eb9946000000000000/md-206930-big-Data.db:level=0, ]
E1215 12:23:05.937699       1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="agent [HTTP 404] Not found" Service="scylla-mumbai/scylla-mumbai-asia-south1-asia-south1-a-0"
E1215 12:23:15.929403       1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="agent [HTTP 404] Not found" Service="scylla-mumbai/scylla-mumbai-asia-south1-asia-south1-a-0"
E1215 12:23:25.930657       1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="agent [HTTP 404] Not found" Service="scylla-mumbai/scylla-mumbai-asia-south1-asia-south1-a-0"
E1215 12:23:35.928204       1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="agent [HTTP 404] Not found" Service="scylla-mumbai/scylla-mumbai-asia-south1-asia-south1-a-0"
E1215 12:23:45.930260       1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="agent [HTTP 404] Not found" Service="scylla-mumbai/scylla-mumbai-asia-south1-asia-south1-a-0"
E1215 12:23:55.929265       1 sidecar/probes.go:172] "healthz probe: can't connect to JMX" err="agent [HTTP 404] Not found" Service="scylla-mumbai/scylla-mumbai-asia-south1-asia-south1-a-0"

Environment:

tnozicka commented 2 years ago

any chance the recent probe fix helps, or is it unrelated?

tnozicka commented 2 years ago

@vponomaryov fyi, this needs QA coverage

zimnx commented 2 years ago

> any chance the recent probe fix helps, or is it unrelated?

No. The Scylla API adds new endpoints incrementally during startup, and it looks like our liveness probe uses an endpoint that is not yet available while the node is resharding.
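One way to picture the expected behavior is a probe that treats any HTTP response from the Scylla API, including a 404 for an endpoint that is not registered yet, as proof that the process is alive, and reports ready only on a 2xx response. The Python sketch below is illustrative only and is not the sidecar's actual code: `fetch_status` and `classify` are hypothetical names, the policy itself is an assumption, and the URL is the one from the scylla-housekeeping traceback in the logs above, not necessarily the endpoint the sidecar queries.

```python
import urllib.error
import urllib.request
from typing import Optional

# Example URL taken from the scylla-housekeeping traceback in the logs.
# Assumption: the real probe may query a different endpoint.
PROBE_URL = "http://localhost:10000/storage_service/scylla_release_version"


def fetch_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (e.g. 404).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: nothing answered.
        return None


def classify(status: Optional[int]) -> dict:
    """Map a probe result to liveness and readiness (assumed policy).

    alive: the API server answered at all, even with 404 or 5xx.
    ready: the endpoint exists and answered with a 2xx code.
    """
    alive = status is not None
    ready = status is not None and 200 <= status < 300
    return {"alive": alive, "ready": ready}


if __name__ == "__main__":
    print(classify(fetch_status(PROBE_URL)))
```

Under this policy, the 404 seen during resharding would map to alive but not ready, so the container would not be restarted while the node is still unavailable to clients. A refused connection or a timeout would still count as not alive.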