Closed vponomaryov closed 5 days ago
node-1 (the one being upgraded)
INFO 2023-01-04 17:37:22,424 [shard 0] cql_server_controller - Starting listening for CQL clients on 0.0.0.0:9042 (unencrypted, non-shard-aware)
INFO 2023-01-04 17:37:22,424 [shard 0] cql_server_controller - Starting listening for CQL clients on 0.0.0.0:19042 (unencrypted, shard-aware)
node-2 updates the schema:
INFO 2023-01-04 17:37:31,453 [shard 0] schema_tables - Schema version changed to 8af28221-bae0-35a1-bd3c-7bb3a7caf720
node-1 notices the other nodes 2 minutes later and gets the new schema from them:
INFO 2023-01-04 17:38:26,725 [shard 0] gossip - InetAddress 10.112.2.191 is now UP, status = NORMAL
INFO 2023-01-04 17:38:26,726 [shard 0] gossip - InetAddress 10.112.8.121 is now UP, status = NORMAL
INFO 2023-01-04 17:38:26,727 [shard 0] storage_service - Node 10.112.2.191 state jump to normal
INFO 2023-01-04 17:38:26,731 [shard 0] storage_service - Node 10.112.8.121 state jump to normal
...
INFO 2023-01-04 17:39:26,726 [shard 0] migration_manager - Requesting schema pull from 10.112.2.191:0
INFO 2023-01-04 17:39:26,726 [shard 0] migration_manager - Pulling schema from 10.112.2.191:0
INFO 2023-01-04 17:39:26,726 [shard 0] migration_manager - Requesting schema pull from 10.112.8.121:0
INFO 2023-01-04 17:39:26,726 [shard 0] migration_manager - Pulling schema from 10.112.8.121:0
INFO 2023-01-04 17:39:26,833 [shard 0] schema_tables - Altering keyspace_fill_db_data.table_options_test id=6e04e400-8c50-11ed-8fbc-394aebb27b6e version=9373d136-8b14-33a9-9d8b-191e567e7e6b
INFO 2023-01-04 17:39:26,834 [shard 0] schema_tables - Altering keyspace_fill_db_data.table_options_test_scylla_cdc_log id=6e04e402-8c50-11ed-8fbc-394aebb27b6e version=e1098738-72f1-347f-805c-454472f91653
...
INFO 2023-01-04 17:39:26,862 [shard 0] schema_tables - Schema version changed to 8af28221-bae0-35a1-bd3c-7bb3a7caf720
INFO 2023-01-04 17:39:26,863 [shard 0] migration_manager - Schema merge with 10.112.2.191:0 completed
INFO 2023-01-04 17:39:27,078 [shard 0] schema_tables - Schema version changed to 8af28221-bae0-35a1-bd3c-7bb3a7caf720
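The logs above show node-1 serving CQL (17:37:22) roughly two minutes before it pulls the new schema (17:39:26), so a client can connect and still hit the old schema. One guard, not something the test currently does, is to check cluster-wide schema agreement before issuing statements. A minimal sketch of the decision logic (the helper name is hypothetical; the versions would come from `SELECT schema_version FROM system.local` on the contact node and `SELECT schema_version FROM system.peers` for the rest):

```python
# Sketch (assumption, not from the issue): all reachable nodes must report
# one identical schema version before the cluster is safe to use.

def schema_in_agreement(local_version: str, peer_versions: list[str]) -> bool:
    """True when every reachable node reports the same schema version."""
    return all(v == local_version for v in peer_versions)

# With the versions from the log above, node-1 disagrees until 17:39:26:
print(schema_in_agreement(
    "9373d136-8b14-33a9-9d8b-191e567e7e6b",        # node-1, old schema
    ["8af28221-bae0-35a1-bd3c-7bb3a7caf720"] * 2,  # node-2 and node-3
))  # -> False
```

Drivers typically expose a variant of this check themselves, so in practice the comparison above would be delegated to the driver rather than hand-rolled.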
@vponomaryov, I think this might be a k8s-related issue, and we'll need @scylladb/team-operator to take a closer look here.
@fruch Since https://github.com/orgs/scylladb/teams/team-operator doesn't have members yet, I need to mention people explicitly: @tnozicka, @zimnx, @rzetelskik, please take a look.
@fruch why is it marked as master/triage?
It was a suspected core issue; it seems that's not the case.
Why do you think it's k8s related?
What's the condition you wait for before you issue an insert?
We are waiting like this:
```python
def wait_till_scylla_is_upgraded_on_all_nodes(self, target_version: str) -> None:
    def _is_cluster_upgraded() -> bool:
        for node in self.db_cluster.nodes:
            node.forget_scylla_version()
            if node.scylla_version != target_version or not node.db_up:
                return False
        return True

    wait.wait_for(
        func=_is_cluster_upgraded,
        step=30,
        text="Waiting until all nodes in the cluster are upgraded",
        timeout=900,
        throw_exc=True,
    )
```
That is: the version is what we expect, and the CQL port is open.
What else should we wait for before using the cluster?
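For readers unfamiliar with SCT's `wait.wait_for`, the call above polls `_is_cluster_upgraded` every 30 seconds for up to 900 seconds and raises on timeout. A minimal stdlib-only sketch of such a polling helper (an illustration of the semantics, not the actual SCT implementation):

```python
import time


def wait_for(func, step: float, timeout: float, text: str = "", throw_exc: bool = True):
    """Poll `func` every `step` seconds until it returns a truthy value
    or `timeout` seconds elapse. Sketch of the helper's semantics only."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = func()
        if result:
            return result
        time.sleep(step)
    if throw_exc:
        raise TimeoutError(text or f"condition not met within {timeout}s")
    return None


# Example: the condition becomes true on the third poll.
calls = {"n": 0}

def ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(ready, step=0.01, timeout=1))  # -> True
```

The key property is that any readiness check plugged into `func` is retried as a whole, so adding more conditions (CQL port, schema, operator status) only means extending the predicate.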
In my view, you should look at ScyllaCluster.Status.Conditions: Available=True, Progressing=False, Degraded=False.
Not keeping quorum through rollouts is a known issue on k8s - https://github.com/scylladb/scylla-operator/issues/1077
> In my view, you should look at ScyllaCluster.Status.Conditions: Available=True, Progressing=False, Degraded=False.

We will look at checking this status as well.
> Not keeping quorum through rollouts is a known issue on k8s - scylladb/scylla-operator#1077
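The suggested check (Available=True, Progressing=False, Degraded=False) could be sketched as a small predicate over the condition list, which on Kubernetes objects is a list of `{"type": ..., "status": ...}` entries. The helper name is hypothetical; the condition names are taken from the comment above:

```python
# Sketch (assumption): evaluate the suggested ScyllaCluster condition set
# on status conditions as returned e.g. by
#   kubectl get scyllacluster <name> -o jsonpath='{.status.conditions}'

def cluster_ready(conditions: list[dict]) -> bool:
    """True when the ScyllaCluster is fully rolled out and healthy."""
    wanted = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    status = {c.get("type"): c.get("status") for c in conditions}
    return all(status.get(t) == s for t, s in wanted.items())


print(cluster_ready([
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
    {"type": "Degraded", "status": "False"},
]))  # -> True

print(cluster_ready([
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "True"},  # still rolling out
    {"type": "Degraded", "status": "False"},
]))  # -> False
```

A predicate like this could be plugged straight into the existing `wait.wait_for` loop alongside the version and CQL-port checks.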
@mykaul if it's agreed it's an operator issue, can you help us move it there?
@zimnx seems like there are some strong arguments about the suggested solution for https://github.com/scylladb/scylla-operator/issues/1077; is it still moving forward?
@fruch #1077 is waiting for input and reviews from the rest of the team in #1108
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
/lifecycle stale
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
/lifecycle rotten
The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
/close not-planned
@scylla-operator-bot[bot]: Closing this issue, marking it as "Not Planned".
Issue description
Impact
User cannot perform some queries.
How frequently does it reproduce?
It was reproduced 2 times out of 2.
Installation details
Kernel Version: 5.15.0-1020-gke
Scylla version (or git commit hash): 5.0.5-20221009.5a97a1060 with build-id 5009658b834aaf68970135bfc84f964b66ea4dee
Relocatable Package: http://downloads.scylladb.com/downloads/scylla/relocatable/scylladb-5.1/scylla-x86_64-package-5.1.2.0.20221225.4c0f7ea09893.tar.gz
Operator Image: scylladb/scylla-operator:1.8.0-rc.0
Operator Helm Version: 1.8.0-rc.0
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest
Cluster size: 3 nodes (n1-standard-8)
Scylla Nodes used in this run: No resources left at the end of the run
OS / Image: N/A (k8s-gke: us-east1-b)
Test: upgrade-major-scylla-k8s-gke
Test id: 207bdbdc-673c-4c52-ac37-44faddabe464
Test name: scylla-operator/operator-1.8/upgrade/upgrade-major-scylla-k8s-gke
Test config file(s):

Running a Scylla upgrade from 5.0.5-0.20221009.5a97a1060 (build-id 5009658b834aaf68970135bfc84f964b66ea4dee) to 5.1.2-0.20221225.4c0f7ea09893 (build-id 4817fe236d57eca203f35b1dbb4bfe43cab72590) on the K8S backend (GKE), we faced the following problem. Logs with the error:
We run lots of commands, but the same one failed in the same place in 2 different test runs.
The second test run was using enterprise Scylla, upgrading from 2021.1.17-0.20221221.5318a7fec (build-id d4378bd13d179b4bbcde7bdc82b92d8cc71c52d8) to 2022.1.3-0.20220922.539a55e35 (build-id d1fb2faafd95058a04aad30b675ff7d2b930278d).

$ hydra investigate show-monitor 207bdbdc-673c-4c52-ac37-44faddabe464
$ hydra investigate show-logs 207bdbdc-673c-4c52-ac37-44faddabe464

Logs:
Jenkins job URL