tgrabiec opened 10 months ago
But with tablets, shard may change in between if tablet is migrated. This can cause std::bad_variant_access to be thrown.
Did you see it happening? Isn't migration supposed to wait for all ongoing requests to complete before moving? And last but not least, do we support moving a tablet from shard to shard on the same node?
But with tablets, shard may change in between if tablet is migrated. This can cause std::bad_variant_access to be thrown.
Did you see it happening?
No.
Isn't migration supposed to wait for all ongoing requests to complete before moving?
This jumping happens in the CQL layer where we don't hold erm around it, so it can escape barriers.
And last but not least, do we support moving a tablet from shard to shard on the same node?
Not yet, but we will. But even now, it's not guaranteed that the CQL coordinator runs on the tablet-owning node, so it can migrate into the coordinator node.
But with tablets, shard may change in between if tablet is migrated. This can cause std::bad_variant_access to be thrown.
Isn't migration supposed to wait for all ongoing requests to complete before moving?
This jumping happens in the CQL layer where we don't hold erm around it, so it can escape barriers.
That's true. But we can fix it. For instance, we can put the erm into the bounce error struct.
And last but not least, do we support moving a tablet from shard to shard on the same node?
Not yet, but we will. But even now, it's not guaranteed that the CQL coordinator runs on the tablet-owning node, so it can migrate into the coordinator node.
We run drain on all nodes.
Unfortunately, Alternator may have exactly the same problem as CQL, because it also uses cas_shard() (see alternator/executor.cc) and forwards the request to that shard. So whatever solution is used for CQL (I see there's an open PR #17309 by @bhalevy that claims to fix this issue), we'll need to do it for Alternator as well. I'll open a separate issue for that.
But with tablets, shard may change in between if tablet is migrated. This can cause std::bad_variant_access to be thrown.
What if not only the shard changes - the tablet was entirely moved off this node, and the node doesn't own it any more? What will happen then? In the past, I believed (maybe I misunderstood?) that it was fine to run an LWT request on a coordinator node that doesn't own a replica of the data, as long as it was run on the correct shard (the "correct" shard was a function of the token, even if this node has no replica for this token). Is this no longer true? Is it now forbidden to run an LWT request on a coordinator node that doesn't contain a replica of the involved tablet?
To emphasize my last question: Is it possible that with tablets, non-topology-aware drivers (which send the request to a random node, not one known to host the relevant tablet) fail with LWT? Opened an Alternator version of this issue in https://github.com/scylladb/scylladb/issues/17399. In Alternator, the question of non-topology-aware drivers is even more critical, because Alternator is never topology-aware: Requests arrive in arbitrary nodes, not necessarily the right nodes (let alone the right shards).
On Mon, Feb 19, 2024 at 1:22 PM nyh @.***> wrote:
To emphasize my last question: Is it possible that with tablets, non-topology-aware drivers (which send the request to a random node, not one known to host the relevant tablet) fail with LWT? Opened an Alternator version of this issue in #17399 https://github.com/scylladb/scylladb/issues/17399. In Alternator, the question of non-topology-aware drivers is even more critical, because Alternator is never topology-aware: Requests arrive in arbitrary nodes, not necessarily the right nodes (let alone the right shards).
We develop with the goal in mind that drivers don't have to be tablet-aware for correctness, only performance.
Whether LWT depends on it for correctness, I'm not sure. Why do we require the LWT coordinator to run on the key-owning shard, @Gleb Natapov?
On Mon, Feb 19, 2024 at 12:03 PM Gleb Natapov @.***> wrote:
But with tablets, shard may change in between if tablet is migrated. This can cause std::bad_variant_access to be thrown.
Isn't migration supposed to wait for all ongoing requests to complete before moving?
This jumping happens in the CQL layer where we don't hold erm around it, so it can escape barriers.
That's true. But we can fix it. For instance, we can put the erm into the bounce error struct.
True, but there's no mechanism for passing erm ptr across shards yet. Maybe Benny's approach to retry in this rare event is good enough?
And last but not least, do we support moving a tablet from shard to shard on the same node?
Not yet, but we will. But even now, it's not guaranteed that the CQL coordinator runs on the tablet-owning node, so it can migrate into the coordinator node.
We run drain on all nodes.
That would help if we held erm ptr in the CQL layer around the whole operation.
Why do we require the LWT coordinator to run on the key-owning shard
We introduced bounce for efficiency, but once we had bounce we could guarantee that the storage proxy code runs on the correct shard. I cannot guarantee that it will run correctly (if less efficiently) when executed on a different shard (hence the check). There was no such requirement when the code was written, and there were no tests running in such a setup. FWIW, I think the thing that should be fixed is moving tablet locking up. Tablets have to try hard to preserve the "same shard" property for data, for performance.
As an alternative to checking that shards don't change, we can hold an effective_replication_map_ptr and so prevent migrations.
Is this still a work item for 6.0?
We decided to not support LWT in 6.0, so this is not necessary.
@kostja ^^^
@tgrabiec the title has no mention of LWT; there is no test case; the code that is in master today has no check for the tablet sharder, so it will quietly misbehave in rare cases. An exception needs to be added to the current code, and a documentation entry needs to be provided about the limitations if we want to ship.
And frankly I don't see how we can ship tablets as the default sharder without LWT :/
And frankly I don't see how we can ship tablets as the default sharder without LWT :/
I thought this was already decided (by @avikivity) and this is why starting with https://github.com/scylladb/scylladb/pull/17318 we warn on every keyspace creation that you may want to re-create the keyspace without tablets if you want to use CDC or LWT. I'm not happy about this decision either - it also means Alternator will not really work correctly with tablets, although it does enable tablets by default.
@tgrabiec is this issue a release blocker with https://github.com/scylladb/scylladb/pull/18026?
Given https://github.com/scylladb/scylladb/issues/18066, which will prevent us from entering this logic altogether, I think this can be moved to 6.1
But https://github.com/scylladb/scylladb/issues/18066 should be a blocker
@tgrabiec is this issue a release blocker with #18026?
@tgrabiec title has no mention of LWT;
The description mentions it though. We also have a label. Isn't that enough?
no test case;
We don't require adding test cases when opening issues. If we did, then opening issues would be too hard. There's value in opening the issue because we can at least track the problem and decide what to do with it. Like we do now.
Shard bouncing (used for LWT requests) is handled like this:
It assumes that the shard we got in the bounce response will still be the correct shard the second time:
But with tablets, shard may change in between if tablet is migrated. This can cause std::bad_variant_access to be thrown.