Closed bhalevy closed 6 months ago
@nyh, @fruch please review
Seems o.k.
But I'll want to take a ride in at least in gating
@bhalevy
giving it a go:
@bhalevy
giving it a go:
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/134/ (OSS)
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/135/ (OSS + raft)
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/136/ (Enterprise)
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/137/ (Enterprise + raft)
@bhalevy seems like we are getting None as host_id all over the place
@bhalevy seems like we are getting None as host_id all over the place
should be fixed now (at least it passes for me locally, for example update_cluster_layout_tests.py::TestUpdateClusterLayout::test_simple_removenode_3[api-True]
)
@bhalevy seems like we are getting None as host_id all over the place
should be fixed now (at least it passes for me locally, for example
update_cluster_layout_tests.py::TestUpdateClusterLayout::test_simple_removenode_3[api-True]
)
running gating again:
so to force reload of the host_id after restart as the node may wake up with a different host_id (after being wiped)
In 54332a9:
giving one last run, this is some that gonna happen on every single test:
@bhalevy also keep in mind it might break non gating tests... we'll need to keep that in mind following this merge
@bhalevy also keep in mind it might break non gating tests... we'll need to keep that in mind following this merge
Right. That's why I added the wait_normal_token_owner
flag.
@bhalevy also keep in mind it might break non gating tests... we'll need to keep that in mind following this merge
Right. That's why I added the
wait_normal_token_owner
flag.
And which tests are gonna use that ? Only one we will be unstable?
@bhalevy also keep in mind it might break non gating tests... we'll need to keep that in mind following this merge
Right. That's why I added the
wait_normal_token_owner
flag.And which tests are gonna use that ? Only one we will be unstable?
Yes. This category of tests monitor starting nodes' log looking for particular events.
In most cases we already call start with wait_other_notice=False, wait_for_binary_protocol=False
, but in cases we didn't do that we'll have to figure out what's the right thing to do.
giving one last run, this is some that gonna happen on every single test:
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/143/ (OSS)
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/144/ (OSS + raft)
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/145/ (Enterprise)
- [ ] https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/146/ (Enterprise + raft)
There's only https://jenkins.scylladb.com/job/scylla-staging/job/dtest-pytest-gating/146/testReport/junit/cql_tests/TestTruncate/FullDtest___full_split000___test_cql_query_filtering_without_indexes/ which seems unrelated. Need to analyse it further
This is a test issue: https://github.com/scylladb/scylla-dtest/issues/3717
@nyh since you're the last to change watch_rest_for_alive
, any comments before we merge this change?
@nyh please review (since you were the last one to change this area)
In 5fb352a:
pass
statementtofind
as nodes are foundfound
set and found_host_id_map
dict for debugging purposeswait_normal_token_owner
param also in watch_rest_for_alive
@nyh please re-review. I addressed your review comments.
@nyh please re-review. I addressed your review comments.
@nyh ping, when do you think you can re-review?
All checks have passed, please merge
@fruch can you please backport to 5.4 and 2024.1?
This kinda reminds a (very) long discussion with the operator team, where they did not know when can they safely do a rolling restart of Scylla pods - and the request was a simple API to let them know 'yes, it's real, this node is really up and serving traffic and is part of the cluster and you can safely move to the next node and restart it'. See https://github.com/scylladb/scylladb/issues/8275
It is not enough for node to know about other nodes' tokens, as they might not be reflected in the token_metadata map.
Instead, check the
/storage_service/host_id
api that provides a list of nodes that are normal token owners and ready to be used by queries.Refs https://github.com/scylladb/scylladb/issues/15146