scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra
https://python-driver.docs.scylladb.com
Apache License 2.0
74 stars 42 forks source link

test_shard_aware.py::test_closing_connections sometimes fails. #221

Open Lorak-mmk opened 1 year ago

Lorak-mmk commented 1 year ago

This test sometimes fails the 'querying locally' assert - Scylla seems to sometimes send query to a different node, despite being the right node to execute the query. In such case 'querying locally' won't be present in tracing messages. This was explained by @kbr-scylla as expected behavior - part of heat weighted load balancing.

This assert most likely needs to be removed, or replaced by some other check - more investigation is required. Java driver also uses the same assert, so it'll probably start failing too when more complicated shard awareness tests are added.

However, after removing offending asserts, the tests still sometimes fails - some tracing entries are from different shard. The cause is unknown to me - it may be expected Scylla behavior, a bug in Scylla or a bug in driver.

Example tracing:

2023-04-13 15:36:55.451 DEBUG [test_shard_aware:120]: Using connection to shard_id=0 on host 127.0.0.2:9042 for routing_key=b'a'
2023-04-13 15:36:55.456 DEBUG [test_shard_aware:60]: Attempting to fetch trace info for trace ID: 41a4adc0-da00-11ed-b5e5-61df36b30ed2
2023-04-13 15:36:55.460 DEBUG [test_shard_aware:60]: Fetched trace info for trace ID: 41a4adc0-da00-11ed-b5e5-61df36b30ed2
2023-04-13 15:36:55.460 DEBUG [test_shard_aware:60]: Attempting to fetch trace events for trace ID: 41a4adc0-da00-11ed-b5e5-61df36b30ed2
2023-04-13 15:36:55.464 DEBUG [test_shard_aware:60]: Fetched trace events for trace ID: 41a4adc0-da00-11ed-b5e5-61df36b30ed2
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 Checking bounds
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 Processing a statement
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 Creating read executor for token -8839064797231613815 with all: {127.0.0.2, 127.0.0.3, 127.0.0.1} targets: {127.0.0.1} repair decision: NONE
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 read_data: sending a message to /127.0.0.1
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.1 shard 1 read_data: message received from /127.0.0.2
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.1 shard 0 Start querying singular range {{-8839064797231613815, pk{000161}}}
2023-04-13 15:36:55.465 INFO [test_shard_aware:63]: 127.0.0.1 shard 0 Querying cache for range {{-8839064797231613815, pk{000161}}} and slice {{ckp{000162}}}
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.1 shard 0 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s)
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.1 shard 0 Querying is done
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 read_data: sending a message to /127.0.0.3
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.1 shard 1 read_data handling is done, sending a response to /127.0.0.2
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.3 shard 1 read_data: message received from /127.0.0.2
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.3 shard 0 Start querying singular range {{-8839064797231613815, pk{000161}}}
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.3 shard 0 Querying cache for range {{-8839064797231613815, pk{000161}}} and slice {{ckp{000162}}}
2023-04-13 15:36:55.466 INFO [test_shard_aware:63]: 127.0.0.3 shard 0 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s)
2023-04-13 15:36:55.467 INFO [test_shard_aware:63]: 127.0.0.3 shard 0 Querying is done
2023-04-13 15:36:55.467 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 read_data: got response from /127.0.0.1
2023-04-13 15:36:55.467 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 Done processing - preparing a result
2023-04-13 15:36:55.467 INFO [test_shard_aware:63]: 127.0.0.3 shard 1 read_data handling is done, sending a response to /127.0.0.2
2023-04-13 15:36:55.467 INFO [test_shard_aware:63]: 127.0.0.2 shard 1 read_data: got response from /127.0.0.3